Energy observability for AI workloads
Matcha helps you understand how AI workloads actually use compute.
It measures energy, runtime, and efficiency directly from your training runs, exposing signals that are usually hidden behind infrastructure and system logs. Instead of relying only on loss or throughput, Matcha lets you observe how compute is consumed as your workload executes.
You run your training script as usual. Matcha runs alongside it and records energy usage, runtime, and step-level efficiency, turning each run into a measurable experiment.
Matcha workflow: run (start your training) → measure (capture GPU energy) → analyze (break down per step) → compare (find what to optimize).
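To make the "measure" step concrete, here is a minimal sketch of one common way to turn sampled power readings into an energy figure: integrate power over time with the trapezoidal rule. It is an illustration only and not necessarily how Matcha does it. The function name `energy_joules` and the sample readings are hypothetical.

```python
def energy_joules(samples):
    """Integrate (timestamp_s, power_w) samples into joules (trapezoidal rule)."""
    total = 0.0
    # Pair each sample with the next one and add the area of the trapezoid between them.
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


# Hypothetical readings: (seconds since start, watts)
samples = [(0.0, 400.0), (1.0, 500.0), (2.0, 500.0), (3.0, 600.0)]

joules = energy_joules(samples)
duration = samples[-1][0] - samples[0][0]
print(f"energy: {joules:.0f} J, avg power: {joules / duration:.0f} W")
```

Average power falls out as total energy divided by elapsed time, which is the same relationship that ties together the totals in a run summary.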
pip install usematcha
matcha run torchrun --nproc_per_node=8 train.py
Sample output:
step:1709/20000 val_bpb:1.3095 train_time:600097ms
final_int8_zlib_roundtrip val_bpb:1.31072868
matcha_energy total:364722J (101Wh) avg_power:489W peak:700W
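The numbers in the `matcha_energy` line can be sanity-checked with a few lines of Python. This is a sketch that assumes the line always follows the layout of the sample above; that layout is inferred from this one example, and `parse_energy_line` is a hypothetical helper, not part of Matcha.

```python
import re

LINE = "matcha_energy total:364722J (101Wh) avg_power:489W peak:700W"


def parse_energy_line(line):
    """Extract total joules, average watts, and peak watts from a summary line."""
    # Assumed format, taken from the single sample line shown above.
    m = re.search(r"total:(\d+)J .*avg_power:(\d+)W peak:(\d+)W", line)
    if m is None:
        raise ValueError("not a matcha_energy line")
    total_j, avg_w, peak_w = (int(g) for g in m.groups())
    return {"total_j": total_j, "avg_w": avg_w, "peak_w": peak_w}


stats = parse_energy_line(LINE)
watt_hours = stats["total_j"] / 3600                   # 1 Wh = 3600 J
measured_seconds = stats["total_j"] / stats["avg_w"]   # energy / average power
print(f"{watt_hours:.1f} Wh over about {measured_seconds:.0f} s")
```

The joule total converts to the 101 Wh shown in parentheses. Dividing energy by average power implies a measured window of roughly 746 s, somewhat longer than the 600 s `train_time`, presumably because the window also covers work outside the timed training steps.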