
dlgforge.tools.retrieval

Knowledge retrieval and vector-store utilities.

KnowledgeVectorStore(knowledge_dir, collection_name='dlgforge_knowledge', chunk_size=750, overlap=150, embedding_model_name=None, persist_dir=None, rebuild_index=False, skip_if_unchanged=True)

Vector-store wrapper for indexing and retrieving knowledge passages.

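Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| knowledge_dir | Path | Directory containing the knowledge documents to index. | required |
| collection_name | str | Name of the vector-store collection that holds the indexed chunks. | 'dlgforge_knowledge' |
| chunk_size | int | Size of each text chunk produced when splitting documents. | 750 |
| overlap | int | Amount of text shared by consecutive chunks. | 150 |
| embedding_model_name | Optional[str] | Name of the embedding model; if None, a default is used. | None |
| persist_dir | Optional[Path] | Directory where the index is persisted, if any. | None |
| rebuild_index | bool | If True, rebuild the index from scratch instead of reusing a persisted one. | False |
| skip_if_unchanged | bool | If True, skip re-indexing when the knowledge documents have not changed. | True |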
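The chunk_size and overlap defaults suggest a sliding-window splitter. The sketch below assumes character-based windows in which each chunk starts chunk_size - overlap characters after the previous one; the helper chunk_text is hypothetical and is not necessarily how KnowledgeVectorStore splits documents.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 750, overlap: int = 150) -> List[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # distance between consecutive chunk starts
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


# With the defaults, a 1,000-character document yields two chunks sharing 150 characters.
parts = chunk_text("x" * 1000)
print([len(p) for p in parts])  # [750, 400]
```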

Raises:

| Type | Description |
|---|---|
| Exception | Construction may raise when required dependencies or inputs are invalid. |

Side Effects / I/O:

- May read from or write to local filesystem artifacts.

Preconditions / Invariants:

- Instantiate and use through documented public methods.

Examples:

>>> from dlgforge.tools.retrieval import KnowledgeVectorStore
>>> KnowledgeVectorStore(...)
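One plausible way to honor rebuild_index and skip_if_unchanged is to fingerprint the knowledge directory and compare the result with a fingerprint stored alongside the persisted index. This is a sketch under that assumption; directory_fingerprint and needs_reindex are hypothetical names, and the real class may detect changes differently (for example, by modification times).

```python
import hashlib
from pathlib import Path
from typing import Optional


def directory_fingerprint(root: Path) -> str:
    """Hash every file's relative path and contents under root (hypothetical helper)."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # paths cannot contain NUL, so this separates them cleanly
        # Per-file content digests are fixed-length, keeping file boundaries unambiguous.
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


def needs_reindex(root: Path, stored: Optional[str], rebuild_index: bool = False,
                  skip_if_unchanged: bool = True) -> bool:
    """Decide whether to (re)build the index (hypothetical policy)."""
    if rebuild_index or stored is None:
        return True  # forced rebuild, or no previous index to reuse
    if not skip_if_unchanged:
        return True  # caller asked to re-index every time
    return directory_fingerprint(root) != stored
```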

similarity_search(query, k)

Similarity search: return the k indexed chunks most similar to query.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Query text to search for. | required |
| k | int | Maximum number of results to return. | required |

Returns:

| Type | Description |
|---|---|
| List[Tuple[str, Dict[str, Any]]] | Matching chunks as (text, metadata) pairs. |

Raises:

| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |

Side Effects / I/O:

- May read from or write to local filesystem artifacts.

Preconditions / Invariants:

- Callers should provide arguments matching the annotated types and expected data contracts.

Examples:

>>> from dlgforge.tools.retrieval import KnowledgeVectorStore
>>> instance = KnowledgeVectorStore(...)
>>> instance.similarity_search(...)
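The return shape, a list of (text, metadata) pairs, can be illustrated with a brute-force cosine-similarity search over precomputed embeddings. The index layout and the toy_similarity_search function are assumptions for illustration; a real vector store computes embeddings with its configured model and typically delegates the ranking to its backend.

```python
import heapq
import math
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical index entry: (chunk text, metadata, embedding vector)
Entry = Tuple[str, Dict[str, Any], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_similarity_search(query_vec: Sequence[float], index: List[Entry],
                          k: int) -> List[Tuple[str, Dict[str, Any]]]:
    """Return the k entries most similar to query_vec as (text, metadata) pairs."""
    scored = [(cosine(query_vec, vec), text, meta) for text, meta, vec in index]
    top = heapq.nlargest(k, scored, key=lambda item: item[0])
    return [(text, meta) for _, text, meta in top]


index = [
    ("cats purr", {"source": "a.md"}, [1.0, 0.0]),
    ("dogs bark", {"source": "b.md"}, [0.0, 1.0]),
    ("pets sleep", {"source": "c.md"}, [0.7, 0.7]),
]
print(toy_similarity_search([1.0, 0.1], index, k=2))
# [('cats purr', {'source': 'a.md'}), ('pets sleep', {'source': 'c.md'})]
```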

similarity_search_with_ids(query, k)

Similarity search that also returns the ID of each matching chunk.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Query text to search for. | required |
| k | int | Maximum number of results to return. | required |

Returns:

| Type | Description |
|---|---|
| List[Tuple[str, Dict[str, Any], str]] | Matching chunks as (text, metadata, chunk_id) triples. |

Raises:

| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |

Side Effects / I/O:

- May read from or write to local filesystem artifacts.

Preconditions / Invariants:

- Callers should provide arguments matching the annotated types and expected data contracts.

Examples:

>>> from dlgforge.tools.retrieval import KnowledgeVectorStore
>>> instance = KnowledgeVectorStore(...)
>>> instance.similarity_search_with_ids(...)
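Returning chunk IDs makes it straightforward to combine hits from several queries without duplicates. The sketch assumes each result is a (text, metadata, chunk_id) triple as documented above; merge_unique is a hypothetical helper, not part of dlgforge.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

Hit = Tuple[str, Dict[str, Any], str]  # (text, metadata, chunk_id)


def merge_unique(*result_lists: Iterable[Hit]) -> List[Hit]:
    """Concatenate result lists, keeping only the first hit for each chunk_id."""
    seen: Set[str] = set()
    merged: List[Hit] = []
    for results in result_lists:
        for text, meta, chunk_id in results:
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append((text, meta, chunk_id))
    return merged


hits_a = [("alpha", {}, "id-1"), ("beta", {}, "id-2")]
hits_b = [("beta", {}, "id-2"), ("gamma", {}, "id-3")]
print([chunk_id for _, _, chunk_id in merge_unique(hits_a, hits_b)])  # ['id-1', 'id-2', 'id-3']
```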

list_sources()

List the knowledge sources represented in the index.

Returns:

| Type | Description |
|---|---|
| List[str] | Names of the sources whose chunks are in the index. |

Raises:

| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |

Side Effects / I/O:

- May read from or write to local filesystem artifacts.

Preconditions / Invariants:

- Takes no arguments; call it on a constructed KnowledgeVectorStore instance.

Examples:

>>> from dlgforge.tools.retrieval import KnowledgeVectorStore
>>> instance = KnowledgeVectorStore(...)
>>> instance.list_sources()
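If each chunk's metadata records the document it came from, the source list can be derived by collecting the distinct values. The metadata key "source" and the sorted ordering below are assumptions; distinct_sources is a hypothetical helper, and the actual method may use a different key or ordering.

```python
from typing import Any, Dict, Iterable, List


def distinct_sources(metadatas: Iterable[Dict[str, Any]], key: str = "source") -> List[str]:
    """Collect the distinct source values from chunk metadata (hypothetical helper)."""
    return sorted({meta[key] for meta in metadatas if key in meta})


chunk_metadata = [{"source": "faq.md"}, {"source": "guide.md"}, {"source": "faq.md"}, {}]
print(distinct_sources(chunk_metadata))  # ['faq.md', 'guide.md']
```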

source_chunk_counts()

Count the indexed chunks for each source.

Returns:

| Type | Description |
|---|---|
| Dict[str, int] | Mapping from each source to its number of indexed chunks. |

Raises:

| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |

Side Effects / I/O:

- May read from or write to local filesystem artifacts.

Preconditions / Invariants:

- Takes no arguments; call it on a constructed KnowledgeVectorStore instance.

Examples:

>>> from dlgforge.tools.retrieval import KnowledgeVectorStore
>>> instance = KnowledgeVectorStore(...)
>>> instance.source_chunk_counts()
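Per-source chunk counts can be computed with collections.Counter over chunk metadata, which is also a quick way to spot sources that dominate the index before sampling from it. The sketch assumes each chunk's metadata carries a "source" key; count_chunks_per_source is a hypothetical helper, not the method's actual implementation.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_chunks_per_source(metadatas: Iterable[Dict[str, Any]],
                            key: str = "source") -> Dict[str, int]:
    """Map each source to the number of chunks that reference it (hypothetical helper)."""
    return dict(Counter(meta[key] for meta in metadatas if key in meta))


chunk_metadata = [{"source": "faq.md"}, {"source": "guide.md"}, {"source": "faq.md"}]
counts = count_chunks_per_source(chunk_metadata)
print(counts)  # {'faq.md': 2, 'guide.md': 1}
```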

random_samples(n, exclude_ids=None, rng=None)

Randomly sample n chunks from the index.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| n | int | Number of chunks to sample. | required |
| exclude_ids | Optional[set[str]] | Chunk IDs to exclude from sampling. | None |
| rng | Optional[random.Random] | Random number generator to draw from; pass a seeded instance for reproducible samples. | None |

Returns:

| Type | Description |
|---|---|
| List[Tuple[str, Dict[str, Any], str]] | Sampled chunks as (text, metadata, chunk_id) triples. |

Raises:

| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |

Side Effects / I/O:

- May read from or write to local filesystem artifacts.

Preconditions / Invariants:

- Callers should provide arguments matching the annotated types and expected data contracts.

Examples:

>>> from dlgforge.tools.retrieval import KnowledgeVectorStore
>>> instance = KnowledgeVectorStore(...)
>>> instance.random_samples(...)
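The exclude_ids and rng parameters suggest sampling without replacement from chunks not yet used, with an injectable generator for reproducibility. Below is a minimal sketch under that assumption, operating on an in-memory list of (text, metadata, chunk_id) triples; sample_chunks is hypothetical, and whether the real method returns fewer than n items when the pool is small is not documented.

```python
import random
from typing import Any, Dict, List, Optional, Sequence, Set, Tuple

Hit = Tuple[str, Dict[str, Any], str]  # (text, metadata, chunk_id)


def sample_chunks(chunks: Sequence[Hit], n: int,
                  exclude_ids: Optional[Set[str]] = None,
                  rng: Optional[random.Random] = None) -> List[Hit]:
    """Sample up to n chunks without replacement, skipping excluded IDs."""
    rng = rng if rng is not None else random.Random()
    excluded = exclude_ids or set()
    pool = [chunk for chunk in chunks if chunk[2] not in excluded]
    return rng.sample(pool, min(n, len(pool)))  # cap n so sample() never over-draws


chunks = [(f"text {i}", {"source": "a.md"}, f"c{i}") for i in range(10)]
first = sample_chunks(chunks, 3, exclude_ids={"c0", "c1"}, rng=random.Random(42))
again = sample_chunks(chunks, 3, exclude_ids={"c0", "c1"}, rng=random.Random(42))
print(first == again)  # True: the same seed reproduces the same sample
```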

sample_by_sources(sources, n, exclude_ids=None, rng=None)

Randomly sample n chunks drawn only from the given sources.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sources | set[str] | Sources to sample from. | required |
| n | int | Number of chunks to sample. | required |
| exclude_ids | Optional[set[str]] | Chunk IDs to exclude from sampling. | None |
| rng | Optional[random.Random] | Random number generator to draw from; pass a seeded instance for reproducible samples. | None |

Returns:

| Type | Description |
|---|---|
| List[Tuple[str, Dict[str, Any], str]] | Sampled chunks as (text, metadata, chunk_id) triples. |

Raises:

| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |

Side Effects / I/O:

- May read from or write to local filesystem artifacts.

Preconditions / Invariants:

- Callers should provide arguments matching the annotated types and expected data contracts.

Examples:

>>> from dlgforge.tools.retrieval import KnowledgeVectorStore
>>> instance = KnowledgeVectorStore(...)
>>> instance.sample_by_sources(...)
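Restricting the pool to particular sources amounts to a filter applied before sampling. The sketch below assumes the source name is stored under a "source" metadata key; sample_from_sources is a hypothetical helper that mirrors the documented parameters rather than dlgforge's code.

```python
import random
from typing import Any, Dict, List, Optional, Sequence, Set, Tuple

Hit = Tuple[str, Dict[str, Any], str]  # (text, metadata, chunk_id)


def sample_from_sources(chunks: Sequence[Hit], sources: Set[str], n: int,
                        exclude_ids: Optional[Set[str]] = None,
                        rng: Optional[random.Random] = None) -> List[Hit]:
    """Sample up to n chunks whose metadata 'source' is in sources."""
    rng = rng if rng is not None else random.Random()
    excluded = exclude_ids or set()
    pool = [
        chunk for chunk in chunks
        if chunk[1].get("source") in sources and chunk[2] not in excluded
    ]
    return rng.sample(pool, min(n, len(pool)))


chunks = [(f"text {i}", {"source": "a.md" if i % 2 else "b.md"}, f"c{i}") for i in range(8)]
picked = sample_from_sources(chunks, {"a.md"}, 2, rng=random.Random(0))
print(sorted(meta["source"] for _, meta, _ in picked))  # ['a.md', 'a.md']
```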

configure_retrieval(cfg, project_root)

Configure retrieval from the configuration mapping for the given project.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| cfg | Dict[str, Any] | Configuration mapping that controls runtime behavior. | required |
| project_root | Path | Resolved project root directory. | required |

Returns:

| Type | Description |
|---|---|
| None | No value is returned. |

Raises:

| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |

Side Effects / I/O:

- May read from or write to local filesystem artifacts.

Preconditions / Invariants:

- Callers should provide arguments matching the annotated types and expected data contracts.

Examples:

>>> from dlgforge.tools.retrieval import configure_retrieval
>>> configure_retrieval(...)
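A common reason to pass project_root alongside a configuration mapping is to resolve relative paths in it. The sketch below assumes a hypothetical cfg["retrieval"]["knowledge_dir"] entry with a hypothetical "knowledge" fallback; the configuration keys that configure_retrieval actually reads are not documented here.

```python
from pathlib import Path
from typing import Any, Dict


def resolve_knowledge_dir(cfg: Dict[str, Any], project_root: Path) -> Path:
    """Resolve a (hypothetical) retrieval.knowledge_dir setting against project_root."""
    raw = Path(cfg.get("retrieval", {}).get("knowledge_dir", "knowledge"))
    # Absolute paths are used as-is; relative ones are anchored at the project root.
    return raw if raw.is_absolute() else project_root / raw


root = Path("/srv/project")
print(resolve_knowledge_dir({"retrieval": {"knowledge_dir": "docs"}}, root))
```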

get_vector_store()

Return the vector store used for retrieval.

Returns:

| Type | Description |
|---|---|
| KnowledgeVectorStore | The vector store instance used for retrieval. |

Raises:

| Type | Description |
|---|---|
| RuntimeError | Raised when validation or runtime requirements are not met. |

Side Effects / I/O:

- May read from or write to local filesystem artifacts.

Preconditions / Invariants:

- Takes no arguments.

Examples:

>>> from dlgforge.tools.retrieval import get_vector_store
>>> get_vector_store()
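The documented RuntimeError fits a common pattern in which a configuration call stores a module-level instance and the getter refuses to run before that has happened. This is an assumption about the design, sketched with hypothetical names (_ACTIVE_STORE, set_active_store, get_active_store) rather than dlgforge's internals.

```python
from typing import Optional

_ACTIVE_STORE: Optional[object] = None  # hypothetical module-level slot


def set_active_store(store: object) -> None:
    """Record the store instance (what a configure step might do)."""
    global _ACTIVE_STORE
    _ACTIVE_STORE = store


def get_active_store() -> object:
    """Return the configured store, or fail loudly if none has been set."""
    if _ACTIVE_STORE is None:
        raise RuntimeError("Vector store is not configured; run the configuration step first.")
    return _ACTIVE_STORE
```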

vector_db_search(query, k=None, use_reranker=False)

Vector database search: retrieve passages relevant to query.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Query text to search for. | required |
| k | Optional[int] | Maximum number of results to return; if None, a default is used. | None |
| use_reranker | bool | Whether to rerank the retrieved results before returning them. | False |

Returns:

| Type | Description |
|---|---|
| Dict[str, Any] | Search results as a dictionary. |

Raises:

| Type | Description |
|---|---|
| Exception | Propagates unexpected runtime errors from downstream calls. |

Side Effects / I/O:

- May read from or write to local filesystem artifacts.

Preconditions / Invariants:

- Callers should provide arguments matching the annotated types and expected data contracts.

Examples:

>>> from dlgforge.tools.retrieval import vector_db_search
>>> vector_db_search(...)
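The use_reranker flag implies a two-stage pipeline: retrieve candidates by vector similarity, then optionally reorder them with a second scorer before truncating to k. The sketch below uses a simple word-overlap score as a stand-in for a real reranker (which would more likely be a cross-encoder model). The default_k parameter and the result keys are hypothetical, since the shape of the returned dictionary is not documented.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Hit = Tuple[str, Dict[str, Any]]  # (text, metadata), already ordered by vector similarity


def word_overlap(query: str, text: str) -> float:
    """Fraction of query words that also appear in text (toy reranking score)."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


def search_with_optional_rerank(query: str, candidates: List[Hit], k: Optional[int] = None,
                                use_reranker: bool = False, default_k: int = 3,
                                score: Callable[[str, str], float] = word_overlap) -> Dict[str, Any]:
    k = default_k if k is None else k
    hits = list(candidates)
    if use_reranker:
        # sorted() is stable even with reverse=True, so ties keep first-stage order.
        hits = sorted(hits, key=lambda hit: score(query, hit[0]), reverse=True)
    top = hits[:k]
    return {
        "query": query,
        "reranked": use_reranker,
        "results": [{"text": text, "metadata": meta} for text, meta in top],
    }


candidates = [("the cat sat", {}), ("dogs bark loudly", {}), ("a cat and a dog", {})]
out = search_with_optional_rerank("cat dog", candidates, k=2, use_reranker=True)
print([r["text"] for r in out["results"]])  # ['a cat and a dog', 'the cat sat']
```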