recipe-recommender
---
name: recipe-recommender
description: Recommend and customize Megatron Bridge recipes for a user's model, GPU count, and training goal. Indexes library recipes (pretrain/SFT/PEFT) and performance recipes.
when_to_use: User wants a starting recipe or training config; 'which recipe', 'recommend recipe', 'how to train Llama', 'starting config for X GPUs', 'what recipe for SFT'.
---
# Auto Recipe — Recipe Index & Recommendation
This skill indexes every shipped recipe and helps users pick the right starting
config, adjust parallelism, and avoid common pitfalls.
## How to Use This Skill
1. Ask the user for: **model name/size**, **GPU count & type**, **training goal**
   (pretrain / SFT / PEFT), and **sequence length** (if non-default).
2. Look up the best-match recipe in the index below.
3. Recommend the recipe function name and its entry-point command.
4. Provide adjustment advice (parallelism resizing, batch tuning, pitfalls).
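
The lookup in steps 2 and 3 can be sketched as a filter over index rows. This is a minimal illustration, not part of Megatron Bridge: the `Recipe` record and the `pick_recipe` helper are hypothetical, and the three rows are copied from the Llama table in the Library Recipe Index.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Recipe:
    """One row of the recipe index (hypothetical record, not a library type)."""
    name: str
    mode: str      # "pretrain", "sft", or "peft"
    min_gpus: int  # the "GPUs (min)" column
    seq_len: int   # sequence length in tokens


# Rows copied from the Llama table in the Library Recipe Index.
INDEX = [
    Recipe("llama3_8b_pretrain_config", "pretrain", 2, 8 * 1024),
    Recipe("llama3_8b_16k_pretrain_config", "pretrain", 4, 16 * 1024),
    Recipe("llama3_8b_64k_pretrain_config", "pretrain", 8, 64 * 1024),
]


def pick_recipe(prefix: str, mode: str, num_gpus: int, seq_len: int) -> Optional[Recipe]:
    """Return the shortest-context recipe that fits the request, or None."""
    fits = [
        r for r in INDEX
        if r.name.startswith(prefix)
        and r.mode == mode
        and r.min_gpus <= num_gpus
        and r.seq_len >= seq_len
    ]
    return min(fits, key=lambda r: r.seq_len, default=None)


print(pick_recipe("llama3_8b", "pretrain", num_gpus=8, seq_len=16 * 1024))
```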
---
## Entry Points
### Library recipes (functional training)
```bash
# Pretrain with mock data
uv run python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
    --recipe <recipe_function_name> \
    --dataset llm-pretrain-mock

# SFT with SQuAD
uv run python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
    --recipe <recipe_function_name> \
    --dataset llm-finetune

# Override any field via CLI
uv run python -m torch.distributed.run --nproc_per_node=8 scripts/training/run_recipe.py \
    --recipe llama3_8b_pretrain_config \
    --dataset llm-pretrain-mock \
    'model.tensor_model_parallel_size=2' \
    'training.global_batch_size=64'
```
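
When a recommendation has to be handed over as a ready-to-paste command, the override syntax shown above can be assembled programmatically. The sketch below only builds and prints the command string and runs nothing. `build_command` is a hypothetical helper; the script path, flags, and override keys are taken from the examples above.

```python
import shlex
from typing import Mapping, Optional


def build_command(
    recipe: str,
    dataset: str,
    overrides: Optional[Mapping[str, object]] = None,
    nproc: int = 8,
) -> str:
    """Build a run_recipe.py command line (hypothetical helper; runs nothing)."""
    argv = [
        "uv", "run", "python", "-m", "torch.distributed.run",
        f"--nproc_per_node={nproc}",
        "scripts/training/run_recipe.py",
        "--recipe", recipe,
        "--dataset", dataset,
    ]
    # Each override is passed as a single dotted key=value argument.
    for key, value in (overrides or {}).items():
        argv.append(f"{key}={value}")
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return shlex.join(argv)


cmd = build_command(
    "llama3_8b_pretrain_config",
    "llm-pretrain-mock",
    overrides={
        "model.tensor_model_parallel_size": 2,
        "training.global_batch_size": 64,
    },
)
print(cmd)
```

Keeping the overrides in a mapping makes it easy to show the user exactly which fields differ from the recipe defaults.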
### Performance recipes (throughput benchmarks)
```bash
python scripts/performance/run_script.py \
    --recipe <model_family> \
    --gpu_type h100 \
    --num_gpus 64 \
    --data mock
```
> **Perf recipes are NOT fully validated for correctness.** Most conversations
> and testing were on mock data. They are designed for **upper-bound throughput
> measurement**, not production training. Always validate loss curves and
> convergence independently.
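
One inexpensive independent check is to confirm that the logged training loss actually trends down over a run on real data. The sketch below is a smoke test only, assuming the loss values have already been extracted from the training logs into a list. `loss_improved`, the window size, and the threshold are illustrative choices, and passing the check does not demonstrate convergence.

```python
from statistics import mean
from typing import Sequence


def loss_improved(losses: Sequence[float], window: int = 10, min_drop: float = 0.05) -> bool:
    """Return True if the mean of the last `window` losses is at least
    `min_drop` (as a fraction) below the mean of the first `window` losses.

    Assumes positive loss values, as with cross-entropy.
    """
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    first = mean(losses[:window])
    last = mean(losses[-window:])
    return last <= first * (1.0 - min_drop)


# Synthetic values for illustration only, not measurements.
decreasing = [10.0 - 0.1 * i for i in range(40)]
flat = [10.0] * 40
print(loss_improved(decreasing), loss_improved(flat))
```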
---
## Recipe Unification (Coming Soon — PR #2803)
PR [#2803](https://github.com/NVIDIA-NeMo/Megatron-Bridge/pull/2803) is
unifying performance recipes into the same **Python function** format used by
library recipes. Key changes:
- Perf recipes move from `scripts/performance/configs/` → `src/megatron/bridge/recipes/<family>/<model>_perf.py`
- Each perf recipe becomes a **self-contained Python function** (e.g. `llama3_8b_h100_bf16_pretrain_config()`)
- The old `WorkloadBaseConfig` → `set_workload_base_configs` → `get_perf_optimized_recipe` pipeline is removed
- Shared helpers: `_benchmark_common()` (50 iters, timing, TE RNG), `_perf_precision()` (bf16 / fp8_cs / fp8_mx / nvfp4)
**Why Python, not YAML?** Previous YAML-based approaches had problems:
recipe logic was split across multiple indirection layers, configs were not
self-contained, and the two-level pipeline made maintenance and debugging
difficult. Python functions are explicit, greppable, and composable.
After #2803 lands, both library and perf recipes will be invocable through the
same `run_recipe.py` entry point.
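
To make the "explicit, greppable, and composable" point concrete, the sketch below shows the general shape of a self-contained recipe function built from small helpers. Everything in it is a stand-in written for illustration: `SketchConfig`, `base_pretrain_sketch`, `benchmark_common_sketch`, `perf_precision_sketch`, and `perf_recipe_sketch` are hypothetical names, not the functions or fields introduced by #2803. Only the 50-iteration benchmark length and the precision names come from the list above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for a recipe's config object."""
    train_iters: int = 1000
    precision: str = "bf16"
    log_timing: bool = False


def base_pretrain_sketch() -> SketchConfig:
    """Stand-in for a library recipe: returns a complete config."""
    return SketchConfig()


def benchmark_common_sketch(cfg: SketchConfig) -> SketchConfig:
    """Stand-in for shared benchmark settings (short run, timing on)."""
    return replace(cfg, train_iters=50, log_timing=True)


def perf_precision_sketch(cfg: SketchConfig, precision: str) -> SketchConfig:
    """Stand-in for precision selection; rejects unknown names."""
    allowed = {"bf16", "fp8_cs", "fp8_mx", "nvfp4"}
    if precision not in allowed:
        raise ValueError(f"unknown precision: {precision!r}")
    return replace(cfg, precision=precision)


def perf_recipe_sketch() -> SketchConfig:
    """One function holds the whole recipe: base first, then explicit adjustments."""
    cfg = base_pretrain_sketch()
    cfg = benchmark_common_sketch(cfg)
    return perf_precision_sketch(cfg, "fp8_cs")


print(perf_recipe_sketch())
```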
---
## Library Recipe Index
All recipes live under `src/megatron/bridge/recipes/`. Each function returns a
`ConfigContainer` with model, training, optimizer, and data settings.
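
The TP, PP, and CP columns in the tables below determine how many GPUs one model replica occupies; in the rows listed, the "GPUs (min)" value equals TP × PP × CP, with "—" read as 1. The sketch below applies that arithmetic to check whether a GPU count fits a recipe and how many data-parallel replicas result. `data_parallel_size` is a hypothetical helper, and the formula ignores expert parallelism.

```python
def data_parallel_size(num_gpus: int, tp: int, pp: int, cp: int = 1) -> int:
    """Return the data-parallel size implied by a GPU count and TP/PP/CP.

    Raises ValueError when num_gpus is not a positive multiple of tp * pp * cp.
    """
    model_parallel = tp * pp * cp  # GPUs occupied by one model replica
    if num_gpus < model_parallel or num_gpus % model_parallel != 0:
        raise ValueError(
            f"{num_gpus} GPUs is not a positive multiple of TP*PP*CP = {model_parallel}"
        )
    return num_gpus // model_parallel


# llama3_8b_16k_pretrain_config: TP=2, PP=1, CP=2, so 4 GPUs per replica.
print(data_parallel_size(8, tp=2, pp=1, cp=2))
# llama3_70b_pretrain_config: TP=8, PP=4, so 32 GPUs per replica.
print(data_parallel_size(64, tp=8, pp=4))
```

If the user's GPU count is not a multiple of the recipe's TP × PP × CP, either pick a different recipe or resize the parallelism through CLI overrides before launching.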
### Llama
| Recipe | Mode | TP | PP | CP | SP | GPUs (min) | Seq Len |
|--------|------|----|----|----|----|------------|---------|
| `llama2_7b_pretrain_config` | Pretrain | 2 | 1 | — | — | 2 | 4K |
| `llama3_8b_pretrain_config` | Pretrain | 2 | 1 | — | ✓ | 2 | 8K |
| `llama3_8b_16k_pretrain_config` | Pretrain | 2 | 1 | 2 | ✓ | 4 | 16K |
| `llama3_8b_64k_pretrain_config` | Pretrain | 2 | 1 | 4 | ✓ | 8 | 64K |
| `llama3_8b_128k_pretrain_config` | Pretrain | 2 | 1 | 8 | ✓ | 16 | 128K |
| `llama3_70b_pretrain_config` | Pretrain | 8 | 4 | — | ✓ | 32 | 8K |
| `llama3_70b_16k_pretrain_config` | Pretrain | 8 | 4
…