Hugging Face Export¶
Entry points¶
Export/push behavior lives in src/dlgforge/pipeline/hf_push.py:
- run_push(config_path, options) for explicit CLI push flow.
- maybe_auto_push_after_run(cfg, output_paths) for automatic post-run export/push.
Export phases¶
- Resolve effective source/export directories.
- Prepare export bundle from generated artifacts.
- Sanitize rows for Hub compatibility.
- Generate dataset card (and optionally stats/plots).
- Optionally push to Hub via
huggingface_hubAPI.
Key configuration¶
saving.hf_push.* controls:
- enable flags (enabled, auto_push_on_run)
- destination (repo_id, repo_type, private)
- source/export layout (source_file, export_dir, include_run_state)
- metadata and cleanup (commit_message, clean_remote)
- analytics outputs (generate_stats, stats_file, generate_plots, plots_dir)
CLI options override¶
dlgforge push options can override config for one-off runs:
- --repo-id, --repo-type, --source-dir, --export-dir
- --include-run-state, --token, --commit-message
- --no-export, --no-push, --clean-remote
Sanitization behavior¶
Export preparation normalizes content to avoid downstream schema inconsistencies, including: - reasoning-trace shape normalization - null/list normalization for retrieval fields - thinking trace normalization to text-friendly structures
These behaviors are covered by tests in tests/test_hf_push_sanitization.py.
Generated extra assets¶
When enabled, export can include: - dataset stats JSON summary - SVG plots under configured plot directory - dataset card with source file and metadata
Operational recommendations¶
- Use
--no-pushfirst to validate export bundle locally. - Keep source file aligned with desired judge granularity (
conversations_sharegpt_judged.jsonlfor judged exports). - Use explicit token input in CI where environment setup may vary.