Energy observability for AI workloads

Matcha helps you understand how AI workloads actually use compute.

It measures energy, runtime, and efficiency directly from your training runs, exposing signals that are usually hidden behind infrastructure and system logs. Instead of relying only on loss or throughput, Matcha lets you observe how compute is consumed as your workload executes.

You run your training script as usual. Matcha runs alongside it and records energy usage, runtime, and step-level efficiency, turning each run into a measurable experiment.

Matcha workflow: run (start your training) → measure (capture GPU energy) → analyze (break down per step) → compare (find what to optimize).
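To make the "measure" and "analyze" steps concrete, here is a minimal sketch of how timestamped power readings can be turned into energy, average power, and peak power. It is not Matcha's implementation: the function names and sample values are hypothetical, and it assumes power has already been sampled in watts at known times.

```python
from typing import Sequence, Tuple

# (timestamp in seconds, power in watts); sample format is an assumption
Sample = Tuple[float, float]


def energy_joules(samples: Sequence[Sample]) -> float:
    """Integrate power over time with the trapezoidal rule (W * s = J)."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def summarize(samples: Sequence[Sample]) -> dict:
    """Return total energy, average power, and peak power for one run."""
    duration = samples[-1][0] - samples[0][0]
    total = energy_joules(samples)
    return {
        "total_j": total,
        "avg_w": total / duration,
        "peak_w": max(p for _, p in samples),
    }


# Made-up readings: three samples taken one second apart.
samples = [(0.0, 400.0), (1.0, 500.0), (2.0, 700.0)]
summary = summarize(samples)
print(summary)  # {'total_j': 1050.0, 'avg_w': 525.0, 'peak_w': 700.0}
```

Dividing the total by the number of completed steps gives a rough joules-per-step figure, which is one way to compare two runs of the same workload.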


pip install usematcha
matcha run torchrun --nproc_per_node=8 train.py

Sample output:


step:1709/20000 val_bpb:1.3095 train_time:600097ms
final_int8_zlib_roundtrip val_bpb:1.31072868
matcha_energy total:364722J (101Wh) avg_power:489W peak:700W
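
The figures on the matcha_energy line can be cross-checked with a little arithmetic. The sketch below parses that line and converts joules to watt-hours. The regular expression is inferred from this one sample line, not from a documented output format, and the implied duration assumes avg_power is total energy divided by the length of the measurement window.

```python
import re

LINE = "matcha_energy total:364722J (101Wh) avg_power:489W peak:700W"

# Pattern inferred from the single sample line above (an assumption).
PATTERN = re.compile(r"total:(\d+)J \((\d+)Wh\) avg_power:(\d+)W peak:(\d+)W")


def parse_energy_line(line: str) -> dict:
    """Extract the integer fields from a matcha_energy summary line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a matcha_energy line")
    total_j, total_wh, avg_w, peak_w = (int(g) for g in match.groups())
    return {"total_j": total_j, "total_wh": total_wh, "avg_w": avg_w, "peak_w": peak_w}


stats = parse_energy_line(LINE)
wh_from_joules = stats["total_j"] / 3600             # 1 Wh = 3600 J
implied_seconds = stats["total_j"] / stats["avg_w"]  # assumes avg = total / duration

print(round(wh_from_joules), round(implied_seconds))  # prints: 101 746
```

The 101 Wh figure matches the joule total. The implied window of about 746 s is longer than the 600 s train_time in the log; the sample output does not say what the energy window covers.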

FAQs

What is Matcha and who is it for?

Does Matcha change my training?

What does Matcha measure today?

Does Matcha run locally?

Is Matcha only for training?

Why not just use runtime or GPU utilization?

How is Matcha different from tools like nvidia-smi or NVML?