dlgforge.tools.retrieval
Knowledge retrieval and vector-store utilities.
KnowledgeVectorStore(knowledge_dir, collection_name='dlgforge_knowledge', chunk_size=750, overlap=150, embedding_model_name=None, persist_dir=None, rebuild_index=False, skip_if_unchanged=True)
Vector-store wrapper for indexing and retrieving knowledge passages.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| knowledge_dir | Path | Directory containing the knowledge files to index. | required |
| collection_name | str | Name of the vector-store collection. | 'dlgforge_knowledge' |
| chunk_size | int | Target size of each chunk when documents are split for indexing. | 750 |
| overlap | int | Amount of overlap between consecutive chunks. | 150 |
| embedding_model_name | Optional[str] | Name of the embedding model; None leaves the choice to the implementation's default. | None |
| persist_dir | Optional[Path] | Directory in which the index is persisted, if any. | None |
| rebuild_index | bool | Whether to rebuild the index rather than reuse an existing one. | False |
| skip_if_unchanged | bool | Whether to skip re-indexing when the knowledge files have not changed. | True |
Raises:
| Type | Description |
|---|---|
| Exception | Construction may raise when required dependencies or inputs are invalid. |
Side Effects / I/O:
- May read from or write to local filesystem artifacts.
Preconditions / Invariants:
- Instantiate and use through documented public methods.
Examples:
>>> from dlgforge.tools.retrieval import KnowledgeVectorStore
>>> KnowledgeVectorStore(...)
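The chunk_size and overlap parameters suggest a sliding-window split of each document before indexing. The sketch below illustrates that idea under stated assumptions: sizes are measured in characters, and chunk_text is a hypothetical helper, not part of dlgforge; the real splitter may count tokens or respect sentence boundaries.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 750, overlap: int = 150) -> List[str]:
    """Hypothetical helper: split text into overlapping character windows.

    Sizes are in characters here, which is an assumption; the actual
    KnowledgeVectorStore splitter may measure chunks differently.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the defaults (750 and 150), each window advances 600 characters, so a 2,000-character document yields four chunks, and each pair of neighbours shares 150 characters.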
similarity_search(query, k)
Return the k indexed passages most similar to query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Query text to search for. | required |
| k | int | Number of results to return. | required |
Returns:
| Type | Description |
|---|---|
| List[Tuple[str, Dict[str, Any]]] | Matching passages as (text, metadata) tuples. |
Raises:
| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |
Side Effects / I/O:
- May read from or write to local filesystem artifacts.
Preconditions / Invariants:
- Callers should provide arguments matching annotated types and expected data contracts.
Examples:
>>> from dlgforge.tools.retrieval import KnowledgeVectorStore
>>> instance = KnowledgeVectorStore(...)
>>> instance.similarity_search(...)
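Conceptually, a similarity search ranks indexed chunks by how close their embeddings are to the query embedding and keeps the top k. A minimal sketch, assuming cosine similarity and a toy bag-of-words vector in place of a real embedding model; embed, cosine, and this standalone similarity_search are hypothetical and do not reproduce dlgforge's internals.

```python
import math
from collections import Counter
from typing import Any, Dict, List, Tuple

Passage = Tuple[str, Dict[str, Any]]  # (text, metadata), assumed layout


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_search(query: str, passages: List[Passage], k: int) -> List[Passage]:
    """Hypothetical: return the k passages whose text is closest to query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p[0])), reverse=True)
    return ranked[:k]
```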
similarity_search_with_ids(query, k)
Like similarity_search, but each result also carries the chunk's ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Query text to search for. | required |
| k | int | Number of results to return. | required |
Returns:
| Type | Description |
|---|---|
| List[Tuple[str, Dict[str, Any], str]] | Matching passages as (text, metadata, id) tuples. |
Raises:
| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |
Side Effects / I/O:
- May read from or write to local filesystem artifacts.
Preconditions / Invariants:
- Callers should provide arguments matching annotated types and expected data contracts.
Examples:
>>> from dlgforge.tools.retrieval import KnowledgeVectorStore
>>> instance = KnowledgeVectorStore(...)
>>> instance.similarity_search_with_ids(...)
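Because each result from similarity_search_with_ids carries a chunk ID, results from several queries can be merged without duplicates. A sketch, assuming the (text, metadata, id) tuple layout from the Returns table; merge_unique is a hypothetical helper, not a dlgforge function. The collected IDs form a set[str], the same shape that random_samples and sample_by_sources accept as exclude_ids.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

Result = Tuple[str, Dict[str, Any], str]  # (text, metadata, chunk_id), assumed layout


def merge_unique(result_lists: Iterable[List[Result]]) -> Tuple[List[Result], Set[str]]:
    """Hypothetical: concatenate result lists, keeping the first hit per chunk id.

    Returns the merged results and the set of ids seen.
    """
    seen: Set[str] = set()
    merged: List[Result] = []
    for results in result_lists:
        for text, meta, chunk_id in results:
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append((text, meta, chunk_id))
    return merged, seen
```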
list_sources()
Return the knowledge sources represented in the index.
Returns:
| Type | Description |
|---|---|
| List[str] | Source names. |
Raises:
| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |
Side Effects / I/O:
- May read from or write to local filesystem artifacts.
Preconditions / Invariants:
- Takes no arguments; call it on a constructed KnowledgeVectorStore instance.
Examples:
>>> from dlgforge.tools.retrieval import KnowledgeVectorStore
>>> instance = KnowledgeVectorStore(...)
>>> instance.list_sources()
source_chunk_counts()
Return the number of indexed chunks for each source.
Returns:
| Type | Description |
|---|---|
| Dict[str, int] | Mapping from source name to chunk count. |
Raises:
| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |
Side Effects / I/O:
- May read from or write to local filesystem artifacts.
Preconditions / Invariants:
- Takes no arguments; call it on a constructed KnowledgeVectorStore instance.
Examples:
>>> from dlgforge.tools.retrieval import KnowledgeVectorStore
>>> instance = KnowledgeVectorStore(...)
>>> instance.source_chunk_counts()
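list_sources and source_chunk_counts can be thought of as aggregations over chunk metadata. A sketch, assuming each chunk's metadata records its origin under a "source" key (an assumption, not documented here); both functions below are hypothetical stand-ins that operate on a plain list of metadata dicts rather than on a real index.

```python
from collections import Counter
from typing import Any, Dict, List


def source_chunk_counts(metadatas: List[Dict[str, Any]], key: str = "source") -> Dict[str, int]:
    """Hypothetical: count chunks per source, skipping metadata without `key`."""
    return dict(Counter(m[key] for m in metadatas if key in m))


def list_sources(metadatas: List[Dict[str, Any]], key: str = "source") -> List[str]:
    """Hypothetical: distinct source names, sorted for stable output."""
    return sorted(source_chunk_counts(metadatas, key))
```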
random_samples(n, exclude_ids=None, rng=None)
Randomly sample n indexed chunks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n | int | Number of chunks to sample. | required |
| exclude_ids | Optional[set[str]] | IDs of chunks that must not be sampled. | None |
| rng | Optional[Random] | random.Random instance to draw from, e.g. a seeded one for reproducible samples. | None |
Returns:
| Type | Description |
|---|---|
| List[Tuple[str, Dict[str, Any], str]] | Sampled chunks as (text, metadata, id) tuples. |
Raises:
| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |
Side Effects / I/O:
- May read from or write to local filesystem artifacts.
Preconditions / Invariants:
- Callers should provide arguments matching annotated types and expected data contracts.
Examples:
>>> from dlgforge.tools.retrieval import KnowledgeVectorStore
>>> instance = KnowledgeVectorStore(...)
>>> instance.random_samples(...)
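random_samples combines three ideas: sampling without replacement, skipping excluded IDs, and an injectable random.Random for reproducibility. A minimal sketch over an in-memory list of (text, metadata, id) tuples; the chunks argument and the choice to return fewer than n items when the pool is small are assumptions, not documented dlgforge behavior.

```python
import random
from typing import Any, Dict, List, Optional, Set, Tuple

Chunk = Tuple[str, Dict[str, Any], str]  # (text, metadata, chunk_id), assumed layout


def random_samples(
    chunks: List[Chunk],
    n: int,
    exclude_ids: Optional[Set[str]] = None,
    rng: Optional[random.Random] = None,
) -> List[Chunk]:
    """Hypothetical: sample up to n chunks without replacement, skipping excluded ids."""
    rng = rng if rng is not None else random.Random()
    exclude = exclude_ids or set()
    pool = [c for c in chunks if c[2] not in exclude]
    # Cap n at the pool size so sampling never fails on small pools.
    return rng.sample(pool, min(n, len(pool)))
```

Passing the same seeded random.Random (for example random.Random(42)) makes repeated calls return identical samples.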
sample_by_sources(sources, n, exclude_ids=None, rng=None)
Randomly sample n chunks drawn only from the given sources.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sources | set[str] | Sources to restrict sampling to. | required |
| n | int | Number of chunks to sample. | required |
| exclude_ids | Optional[set[str]] | IDs of chunks that must not be sampled. | None |
| rng | Optional[Random] | random.Random instance to draw from, e.g. a seeded one for reproducible samples. | None |
Returns:
| Type | Description |
|---|---|
| List[Tuple[str, Dict[str, Any], str]] | Sampled chunks as (text, metadata, id) tuples. |
Raises:
| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |
Side Effects / I/O:
- May read from or write to local filesystem artifacts.
Preconditions / Invariants:
- Callers should provide arguments matching annotated types and expected data contracts.
Examples:
>>> from dlgforge.tools.retrieval import KnowledgeVectorStore
>>> instance = KnowledgeVectorStore(...)
>>> instance.sample_by_sources(...)
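sample_by_sources adds a source filter ahead of the sampling step. A sketch, assuming a "source" metadata key and an in-memory chunk list; this standalone sample_by_sources is an illustration, not dlgforge's implementation.

```python
import random
from typing import Any, Dict, List, Optional, Set, Tuple

Chunk = Tuple[str, Dict[str, Any], str]  # (text, metadata, chunk_id), assumed layout


def sample_by_sources(
    chunks: List[Chunk],
    sources: Set[str],
    n: int,
    exclude_ids: Optional[Set[str]] = None,
    rng: Optional[random.Random] = None,
) -> List[Chunk]:
    """Hypothetical: sample up to n chunks whose metadata 'source' is in sources."""
    rng = rng if rng is not None else random.Random()
    exclude = exclude_ids or set()
    pool = [
        c for c in chunks
        if c[1].get("source") in sources and c[2] not in exclude
    ]
    return rng.sample(pool, min(n, len(pool)))
```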
configure_retrieval(cfg, project_root)
Configure retrieval from a configuration mapping.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | Dict[str, Any] | Configuration mapping that controls retrieval behavior. | required |
| project_root | Path | Project root directory. | required |
Returns:
| Type | Description |
|---|---|
| None | No value is returned. |
Raises:
| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |
Side Effects / I/O:
- May read from or write to local filesystem artifacts.
Preconditions / Invariants:
- Callers should provide arguments matching annotated types and expected data contracts.
Examples:
>>> from dlgforge.tools.retrieval import configure_retrieval
>>> configure_retrieval(...)
get_vector_store()
Return the vector store currently in use.
Returns:
| Type | Description |
|---|---|
| KnowledgeVectorStore | The active vector-store instance. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | Raised when validation or runtime requirements are not met. |
Side Effects / I/O:
- May read from or write to local filesystem artifacts.
Preconditions / Invariants:
- Takes no arguments.
Examples:
>>> from dlgforge.tools.retrieval import get_vector_store
>>> get_vector_store()
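The pairing of configure_retrieval (returns None) with get_vector_store (takes no arguments, may raise RuntimeError) is consistent with a module-level store that must be configured before use. A sketch of that pattern, assuming a hypothetical knowledge_dir config key and a _Store placeholder class in place of KnowledgeVectorStore; the real configuration keys are not documented here.

```python
from pathlib import Path
from typing import Any, Dict, Optional


class _Store:
    """Hypothetical placeholder for KnowledgeVectorStore."""

    def __init__(self, knowledge_dir: Path) -> None:
        self.knowledge_dir = knowledge_dir


_STORE: Optional[_Store] = None  # module-level slot, empty until configured


def configure_retrieval(cfg: Dict[str, Any], project_root: Path) -> None:
    """Build the store from cfg; 'knowledge_dir' is a hypothetical config key."""
    global _STORE
    knowledge_dir = Path(cfg.get("knowledge_dir", "knowledge"))
    if not knowledge_dir.is_absolute():
        # Resolve relative paths against the project root.
        knowledge_dir = project_root / knowledge_dir
    _STORE = _Store(knowledge_dir)


def get_vector_store() -> _Store:
    """Return the configured store, or fail loudly if setup was skipped."""
    if _STORE is None:
        raise RuntimeError("retrieval is not configured; call configure_retrieval() first")
    return _STORE
```

Raising instead of silently building a default store makes a missing setup step visible at the first call site.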
vector_db_search(query, k=None, use_reranker=False)
Search the knowledge vector store for passages relevant to query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Query text to search for. | required |
| k | Optional[int] | Optional number of results to return. | None |
| use_reranker | bool | Whether to rerank the retrieved results. | False |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dictionary containing the search results. |
Raises:
| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |
Side Effects / I/O:
- May read from or write to local filesystem artifacts.
Preconditions / Invariants:
- Callers should provide arguments matching annotated types and expected data contracts.
Examples:
>>> from dlgforge.tools.retrieval import vector_db_search
>>> vector_db_search(...)
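vector_db_search wraps a search in an optional reranking step and returns a dictionary. A sketch under several assumptions: the search and rerank callables, the default_k fallback, the over-fetch factor, and the result keys (query, k, results) are all hypothetical, since the actual dictionary layout is not documented here.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Hit = Tuple[str, Dict[str, Any]]  # (text, metadata), assumed layout
SearchFn = Callable[[str, int], List[Hit]]
RerankFn = Callable[[str, List[Hit]], List[Hit]]


def vector_db_search(
    query: str,
    search: SearchFn,
    k: Optional[int] = None,
    use_reranker: bool = False,
    rerank: Optional[RerankFn] = None,
    default_k: int = 4,
) -> Dict[str, Any]:
    """Hypothetical search wrapper with an optional rerank step."""
    k = default_k if k is None else k
    if use_reranker:
        if rerank is None:
            raise ValueError("use_reranker=True requires a rerank function")
        # Over-fetch so the reranker has extra candidates to reorder.
        hits = rerank(query, search(query, k * 2))
    else:
        hits = search(query, k)
    return {
        "query": query,
        "k": k,
        "results": [{"text": t, "metadata": m} for t, m in hits[:k]],
    }
```

Over-fetching before reranking gives the reranker more candidates than it will keep; whether dlgforge does this is not stated in this reference.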