
Quick Start

Run your training with Matcha

Prefix your training command with matcha run:

```
$ matcha run torchrun --standalone --nproc_per_node=1 train_gpt.py
warmup_step:20/20
step:1/20000 train_loss:6.9357 train_time:409ms
step:1312/20000 val_loss:2.2944 train_time:600036ms
stopping_early: wallclock_cap step:1312/20000
matcha_energy gpus:NVIDIA H100 80GB HBM3 total:370342J (102.87Wh) duration:687.8s avg_power:538W peak_power:701W
```

Reading the output

| Field | Meaning |
| --- | --- |
| `gpus` | GPU model and count (auto-detected) |
| `total` | Total energy consumed, in joules and watt-hours |
| `duration` | Wall-clock time from start to finish |
| `avg_power` | Mean GPU power draw across the run |
| `peak_power` | Highest instantaneous power reading |
| `samples` | Number of NVML readings (taken at 100ms intervals) |
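The fields are related by simple unit conversions: 1 Wh is 3600 J, and average power is total energy divided by duration. A quick sanity check using the numbers from the sample run above:

```python
# Values copied from the sample matcha_energy line above.
total_j = 370342      # total energy, joules
duration_s = 687.8    # wall-clock duration, seconds

wh = total_j / 3600           # 1 Wh = 3600 J
avg_w = total_j / duration_s  # joules per second = watts

print(f"{wh:.2f} Wh")   # 102.87 Wh, matching the reported total
print(f"{avg_w:.0f} W") # 538 W, matching the reported avg_power
```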

What counts as duration

Duration covers everything from when Matcha starts to when your training process exits. This includes model compilation, data loading, warmup steps, training, validation, checkpointing, and serialization. It is wall-clock time, not just training time.

Zero overhead

Matcha does not pipe or intercept your training output. Your process writes directly to the terminal. The only work Matcha does is polling NVML in a background thread, which has no measurable impact on training performance.
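The polling approach described above can be sketched as a background thread that reads instantaneous power on a fixed interval and integrates it into total energy. This is a minimal illustration, not Matcha's actual implementation: the power reader is injected as a callable so the sketch runs without a GPU (with NVML you would pass something like `lambda: pynvml.nvmlDeviceGetPowerUsage(handle) / 1000`, since NVML reports milliwatts).

```python
import threading
import time

class PowerSampler:
    """Hypothetical sketch of a background power sampler.

    Polls `read_power_w` (a callable returning watts) every
    `interval_s` seconds and integrates readings into joules.
    """

    def __init__(self, read_power_w, interval_s=0.1):
        self.read_power_w = read_power_w
        self.interval_s = interval_s
        self.samples = []      # instantaneous readings, watts
        self.energy_j = 0.0    # integrated energy, joules
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._poll, daemon=True)

    def _poll(self):
        last = time.monotonic()
        while not self._stop.is_set():
            self._stop.wait(self.interval_s)
            now = time.monotonic()
            watts = self.read_power_w()
            self.samples.append(watts)
            # Rectangle-rule integration: power * time since last reading.
            self.energy_j += watts * (now - last)
            last = now

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    @property
    def avg_power_w(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

# Usage with a fake reader that always reports 538 W:
sampler = PowerSampler(lambda: 538.0, interval_s=0.05)
sampler.start()
time.sleep(0.3)  # training would run here, untouched by the sampler
sampler.stop()
print(f"avg={sampler.avg_power_w:.0f}W energy={sampler.energy_j:.0f}J")
```

Because the sampler only wakes briefly every interval and never touches the training process's stdout or stderr, its cost is a handful of NVML calls per second.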