
trtllm-moe-develop

SKILL.md preview

---
name: trtllm-moe-develop
description: >-
  Review, design, and refactor TensorRT-LLM PyTorch MoE code for architecture fit,
  clean code, maintainability, and testability. Always use for any modification,
  review, refactor, or design planning that touches MoE modules, including
  tensorrt_llm/_torch/modules/fused_moe, ConfigurableMoE, MoE backends,
  MoEScheduler/moe_scheduler.py, forward execution/chunking, communication
  strategies, EPLB, quantization/weight handling, routing, factories, MoE docs,
  or MoE tests. Also use when the user asks whether a MoE design follows the
  current architecture or whether a MoE refactor is reasonable.
license: Apache-2.0
metadata:
  author: NVIDIA Corporation
---

# TensorRT-LLM MoE Code Quality

Use this skill to keep MoE changes aligned with the current TensorRT-LLM MoE
architecture. Favor module roles, API boundaries, and testability over local
style cleanup.

## Required Context

Before proposing or editing MoE code, read:

1. `CODING_GUIDELINES.md`
2. `tensorrt_llm/_torch/modules/fused_moe/MOE_DEVELOPER_GUIDE.md`
3. The target files being changed
4. The relevant tests under `tests/unittest/_torch/modules/moe/`

Also inspect these files when the area is relevant:

- Forward execution/chunking: `moe_scheduler.py`, `configurable_moe.py`,
  `interface.py`, backend `run_moe`/`quantize_input` paths, and communication code.
- MegaMoE/fused communication: `moe_scheduler.py`, `mega_moe/`,
  `configurable_moe.py`, `quantization.py`, and communication code.
- Communication: `tensorrt_llm/_torch/modules/fused_moe/communication/base.py`
  and `communication_factory.py`.
- Quantization and weights: `tensorrt_llm/_torch/modules/fused_moe/quantization.py`.
- EPLB/load balancing: `interface.py`, `moe_load_balancer.py`, `quantization.py`,
  `moe_scheduler.py`, current forward-execution/chunking code, and
  `test_moe_module.py`.
- Test matrix/helpers: `tests/unittest/_torch/modules/moe/moe_test_utils.py` and
  `quantize_utils.py` when adding backend, quantization, skip, or parameter
  coverage.


For module-specific work, read `references/moe-canonical-code-examples.md`
after the guide and load only the relevant section. Each design gate or review
should cite at least one concrete code example with file:line evidence.
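
One way to collect file:line evidence is a small search helper. The sketch below is illustrative only: `find_evidence` is a hypothetical name, not part of TensorRT-LLM, and it does a plain substring scan rather than any syntax-aware lookup.

```python
from pathlib import Path


def find_evidence(path, needle):
    """Return 'path:line' citations for every line of `path` containing `needle`.

    Hypothetical helper for illustration; not part of TensorRT-LLM.
    """
    citations = []
    text = Path(path).read_text(encoding="utf-8")
    # Line numbers are 1-based to match how editors and reviews cite code.
    for lineno, line in enumerate(text.splitlines(), start=1):
        if needle in line:
            citations.append(f"{path}:{lineno}")
    return citations
```

Called from the repository root with a path such as `tensorrt_llm/_torch/modules/fused_moe/quantization.py` and a symbol name of interest, it returns citations that can be pasted into a design gate or review.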

## Working With MOE_DEVELOPER_GUIDE.md

Treat `MOE_DEVELOPER_GUIDE.md` as the in-repo source of truth for MoE
architecture. Treat this skill as the agent workflow layer that tells Codex how
to apply that source of truth while designing, editing, or reviewing code.

Use the guide this way:

- Start from the guide sections that match the requested change: Architecture,
  File Map, Backend Capability Matrix, execution-flow/EPLB constraints,
  Canonical Examples, and Anti-Patterns.
- Use guide content to fill the design gate: owner boundary, main API, reference
  pattern, and test plan.
- Do not duplicate fast-changing matrices or backend support tables in this
  skill; prefer the guide as the current reference.
- If a code change adds a backend, quantization method, communication strategy,
  fused-communication behavior, EPLB behavior, or test convention, check whether
  the guide also needs an update.
- If guide and code disagree, inspect code and tests, mention the mismatch, and
  either update the guide as part of the change or report it as follow-up.
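
The design gate mentioned in the list above can be treated as a small record with four fields. This is a minimal sketch under that assumption: the `DesignGate` class and its `missing` method are hypothetical and exist only to show the shape of a completed gate.

```python
from dataclasses import dataclass, fields


@dataclass
class DesignGate:
    """The four design-gate fields named in this skill; the class is hypothetical."""

    owner_boundary: str = ""
    main_api: str = ""
    reference_pattern: str = ""
    test_plan: str = ""

    def missing(self):
        """Return the names of fields that are still blank, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]
```

A gate is ready for review once `missing()` returns an empty list. Until then, the returned names show which guide sections still need to be consulted.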

Guide-update checklist:

- File map changed: update `File Map`.
- Backend or quant support changed: update `Backend Capability Matrix`.
- New backend/communication/forward-execution pattern: update `Canonical Examples`.
- New forbidden pattern or ownership rule: update `Anti-Patterns`.
- Test convention changed: update `Tests`.
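
The checklist above is effectively a lookup from the kind of change to the guide section that needs revisiting. A minimal sketch of that lookup follows. The dictionary keys and the `sections_to_update` function are hypothetical labels chosen for illustration, while the section names come from the checklist.

```python
# Hypothetical change-kind keys mapped to guide sections from the checklist.
GUIDE_SECTIONS = {
    "file_map": "File Map",
    "backend_or_quant_support": "Backend Capability Matrix",
    "new_pattern": "Canonical Examples",
    "new_rule": "Anti-Patterns",
    "test_convention": "Tests",
}


def sections_to_update(change_kinds):
    """Return guide sections to revisit, in checklist order, without duplicates."""
    kinds = set(change_kinds)
    unknown = kinds - set(GUIDE_SECTIONS)
    if unknown:
        raise ValueError(f"unknown change kinds: {sorted(unknown)}")
    return [section for kind, section in GUIDE_SECTIONS.items() if kind in kinds]
```

For example, a change that adds a file and alters a test convention would pass `["file_map", "test_convention"]` and get back the `File Map` and `Tests` sections.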

## Core Principle

Preserve these owner boundaries:

- `ConfigurableMoE` is the assembler/orchestrator. It wires backend,
  communication, EPLB, weight lifecycle delegation, and shared wrapper
  bookkeeping.
- Backends declare capabilities, run MoE computation, and own the MoE module's
  weigh