Per-Step Energy
Using matcha wrap
Non-step lines like config, warmup, and validation pass through unchanged.
At the end, the same summary line as matcha run is printed:
How step detection works
Matcha scans each line of stdout for patterns that indicate a training step. The following patterns are recognized:
Lines containing warmup are always skipped.
When Matcha sees step N, it ends the measurement for the previous step and starts a new one. Energy is computed from the NVML readings collected between the two step markers.
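The detection and measurement loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Matcha's actual implementation: the regular expression `STEP_RE` is a hypothetical stand-in for the real pattern list, the case-insensitive warmup check is assumed, and trapezoidal integration of `(time, power)` samples is just one plausible way to turn NVML power readings into joules.

```python
import re

# Hypothetical pattern for illustration only; Matcha's real pattern list differs.
STEP_RE = re.compile(r"\bstep[\s:=]+(\d+)", re.IGNORECASE)


def detect_step(line):
    """Return the step number found in a log line, or None for non-step lines."""
    # Lines containing "warmup" are always skipped
    # (the case-insensitive comparison is an assumption).
    if "warmup" in line.lower():
        return None
    match = STEP_RE.search(line)
    return int(match.group(1)) if match else None


def energy_joules(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule.

    `samples` stands in for the power readings collected between two
    step markers; the integration method is an assumption.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total
```

In this sketch, a line such as "step 10 loss=2.31" yields 10, while "warmup step 3" and "loading config" both yield None and would pass through untouched.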
Step gaps
If your training script does not log every step, for example logging step 10 and then step 200, Matcha measures the total energy across that window and divides by the number of steps:
The energy field always shows a per-step average, whether the gap is 1 step or 500.
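The averaging amounts to a single division, sketched below. The function name `per_step_energy` is hypothetical, and the sketch assumes that "number of steps" means the difference between the two logged step numbers.

```python
def per_step_energy(window_energy_j, prev_step, curr_step):
    """Average energy per step over the gap between two logged steps.

    Assumes the step count is the difference between the step numbers.
    """
    steps = curr_step - prev_step
    if steps <= 0:
        raise ValueError("curr_step must be greater than prev_step")
    return window_energy_j / steps
```

Under that assumption, a window that consumed 950 J between step 10 and step 200 covers 190 steps, so the energy field would show 5 J per step.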
Reading per-step output
Each step line gets three fields appended:
| Field | Meaning |
|---|---|
| energy | Energy per step in joules, averaged over the step gap |
| avg_power | Mean GPU power during this window |
| peak_power | Peak GPU power during this window |
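To make the two power fields concrete, the sketch below derives them from a list of power readings taken within one window. The helper name `window_power_stats` is hypothetical, and the plain mean assumes evenly spaced samples, which may not match how Matcha samples NVML.

```python
from statistics import mean


def window_power_stats(power_readings):
    """Summarize the power readings collected during one step window.

    Assumes evenly spaced samples, so a plain mean is a fair average.
    """
    return {
        "avg_power": mean(power_readings),
        "peak_power": max(power_readings),
    }
```

A large difference between peak_power and avg_power within a window suggests a short spike rather than sustained load, which is the kind of behavior the per-step view is meant to surface.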
When to use wrap vs run
Use matcha run for benchmark and production runs. It has zero overhead.
Use matcha wrap for diagnosis: finding energy spikes, identifying inefficient phases, and comparing step-level behavior between configurations. It intercepts stdout, so there is minor I/O overhead, though in our benchmarks this was within run-to-run variance.