DialogForge API Index (Curated)¶
Purpose¶
This index documents the stable, practical API surface for operators and contributors. It focuses on module responsibilities, public entrypoints, side effects, and call relationships.
External entrypoints¶
CLI (src/dlgforge/cli.py)¶
build_parser() -> argparse.ArgumentParsermain() -> None
Commands routed by main:
- run -> pipeline.runner.run
- judge -> pipeline.runner.run_judge_only
- push -> pipeline.hf_push.run_push
- seeds-migrate -> pipeline.seed_topics_migration.run_seeds_migrate
Package exports (src/dlgforge/__init__.py)¶
runrun_judge_onlyrun_pushrun_seeds_migrateHFPushOptions
Configuration API¶
src/dlgforge/config/defaults.py¶
DEFAULT_CONFIG: canonical default schema baseline.
src/dlgforge/config/loader.py¶
Public resolver contract:
- load_config
- build_base_inputs
- resolve_output_dir
- resolve_output_columns
- resolve_target_languages
- resolve_n_turns, resolve_turn_range, resolve_turn_count_distribution, resolve_turn_count_mean
- resolve_total_samples, resolve_batch_size
- resolve_distributed_enabled
- resolve_retrieval_default_k
- resolve_seed_topics_path, resolve_seed_topics_variant, resolve_seed_topics_probability, resolve_seed_topics_enabled
- resolve_judge_mode, resolve_judge_enabled, resolve_judge_granularity, resolve_judge_reasons
Side effects:
- file read (config.yaml)
- environment override application into in-memory config object
src/dlgforge/config/personas.py¶
Public contract:
- UniformPersonaSampler.sample
- select_personas
- build_uniform_persona_sampler
- load_personas
- persona resolver helpers (resolve_personas_enabled, resolve_personas_path, resolve_question_seed)
Side effects: - persona YAML file reads with fallback path handling
Pipeline API¶
src/dlgforge/pipeline/runner.py¶
Primary orchestration entrypoints:
- run(config_path)
- run_judge_only(config_path)
Advanced orchestration helpers used internally and by tests:
- run_multi_turn
- run_multi_turn_batched_async
- persist_training_sample
- persist_batched_training_samples
Side effects: - network/model calls - retrieval and optional web tool invocations - output artifact writes - run-state checkpoint writes - optional distributed bootstrap handoff
src/dlgforge/pipeline/sampling.py¶
Sampling and memory contracts:
- build_question_inputs
- select_question_mode
- sample_topic_snippets
- update_coverage_ledger
- is_duplicate_question
- update_doc_question_memory
- seed-topic helpers (load_seed_topics, select_seed_candidate, update_seed_memory)
Side effects: - coverage ledger writes through IO layer - seed-topic file reads
src/dlgforge/pipeline/state.py¶
Run-state contract:
- init_run_state, checkpoint_run_state
- init_batched_run_state, checkpoint_batched_run_state
- build_initial_batched_conversations, load_batched_conversations_from_state
Side effects: - run-state JSON writes/reads
src/dlgforge/pipeline/history.py¶
History formatting contract:
- format_history
- build_conversation_history
- build_public_history
- messages_up_to_turn
src/dlgforge/pipeline/prompts.py¶
Prompt loading/rendering contract:
- load_agents_config
- load_tasks_config
- build_agent_system_prompt
- build_task_prompt
Side effects: - prompt YAML file reads with LRU caching
src/dlgforge/pipeline/dedup.py¶
Dedup contract:
- normalize_question
- RunQuestionRegistry.filter_and_commit
src/dlgforge/pipeline/seed_topics_migration.py¶
Seed migration contract:
- run_seeds_migrate
- migrate_seed_topics_file
Side effects: - seed file read/write and destination creation
src/dlgforge/pipeline/hf_push.py¶
Export/push contract:
- dataclasses: HFPushSettings, HFPushOptions
- run_push
- maybe_auto_push_after_run
- resolve_hf_push_settings
- prepare_export
- push_to_hub
Side effects: - export directory file writes - optional Hugging Face network push
LLM API¶
src/dlgforge/llm/settings.py¶
resolve_llm_settingsresolve_agent_used_namerequired_agentsmissing_models
src/dlgforge/llm/client.py¶
- dataclass
ChatResult - client
OpenAIModelClient.complete - client
OpenAIModelClient.acomplete
Side effects: - outbound API calls - routing state mutation for weighted least-inflight logic
Tooling API¶
src/dlgforge/tools/retrieval.py¶
- class
KnowledgeVectorStorewith retrieval/listing/sample methods configure_retrievalget_vector_storevector_db_search
Side effects: - index build/read on filesystem - embedding model initialization
src/dlgforge/tools/web_search.py¶
SerperWebSearchClient.search
Side effects: - outbound HTTP requests to Serper endpoint
IO and utility API¶
src/dlgforge/io/output.py¶
Public output contract:
- OutputPaths
- configure_output_columns
- ensure_output_layout
- load_coverage_ledger, append_coverage_ledger
- save_run_state, load_run_state
- save_training_sample
- append_sharegpt_judged_record
src/dlgforge/utils/*.py¶
- env parsing:
env_flag,env_int,env_float,load_dotenv_files - JSON parsing:
strip_code_fences,extract_json_object,parse_json_object - logging setup:
setup_logging - config/path merges:
deep_merge,resolve_path - text helpers:
hash_text,render_template
Distributed runtime API¶
src/dlgforge/distributed/bootstrap.py¶
- class
RunBootstrap.run - function
run_bootstrap
src/dlgforge/distributed/provisioning.py¶
- protocol
VLLMProvisioner - implementations:
NoopProvisioner,AttachProvisioner,ManagedRayVLLMProvisioner
src/dlgforge/distributed/ray_runtime.py¶
import_raycreate_worker_actorcreate_coordinator_actorcreate_vllm_server_actor
src/dlgforge/distributed/types.py¶
- dataclass
EndpointSpec EndpointSpec.to_routing_dict
Cross-module call relationships¶
cli->pipeline.runner/pipeline.hf_push/pipeline.seed_topics_migrationpipeline.runner->config,pipeline.sampling,pipeline.prompts,llm,tools,io,pipeline.state,pipeline.hf_pushpipeline.sampling->tools.retrieval,io.outputdistributed.bootstrap->distributed.provisioning+distributed.ray_runtime->pipeline.runnerpipeline.hf_push->config+io+ optional Hub API
Package export map (__all__)¶
dlgforge¶
run,run_judge_only,run_push,run_seeds_migrate,HFPushOptions
dlgforge.config¶
DEFAULT_CONFIG,load_config,build_base_inputs, resolver helpers
dlgforge.pipeline¶
run,run_judge_only,run_push,run_seeds_migrate,HFPushOptions
dlgforge.llm¶
ChatResult,OpenAIModelClient, settings helpers
dlgforge.io¶
OutputPathsand output persistence helpers
dlgforge.tools¶
- retrieval config/search helpers and
SerperWebSearchClient
dlgforge.distributed¶
- bootstrap/provisioning types and helpers
dlgforge.utils¶
- env/json/logging/merge/text helper functions