
Overview

Matcha is a CLI tool that measures GPU energy consumption during AI training runs.

You prefix your training command with matcha run. Your training executes at full speed. When it finishes, Matcha reports how much energy your GPUs consumed.

Who it’s for

Engineers and researchers running training or fine-tuning jobs on NVIDIA GPUs, whether on cloud instances like RunPod, Lambda, and AWS, on on-premises clusters, or on local workstations.

What it measures

  • Total energy consumed (joules, watt-hours)
  • Average and peak GPU power draw (watts)
  • Duration of the full run
  • Per-step energy breakdown (with matcha wrap)

How it works

Matcha spawns your training command as a child process. In a separate background thread, it polls GPU power via NVML at 100ms intervals. Your training process runs natively. Matcha never intercepts stdout, modifies your code, or injects anything into your training loop.
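The spawn-and-poll pattern described above can be sketched roughly as follows. This is a minimal stdlib-only sketch, not Matcha's actual code: `read_power_watts` is a stand-in for the real NVML power query, and the `time.sleep` stands in for the training child process.

```python
import threading
import time

def read_power_watts():
    # Stand-in for the real NVML query; returns a constant
    # so the sketch runs anywhere, even without a GPU.
    return 250.0

def poll_power(samples, stop, interval=0.1):
    # Background thread: append (timestamp, watts) every `interval`
    # seconds until the main thread signals that training has exited.
    while not stop.is_set():
        samples.append((time.monotonic(), read_power_watts()))
        stop.wait(interval)

samples = []
stop = threading.Event()
poller = threading.Thread(target=poll_power, args=(samples, stop), daemon=True)
poller.start()

time.sleep(0.35)   # stands in for the training child process running natively
stop.set()         # training exited; stop sampling
poller.join()

print(f"collected {len(samples)} power readings")
```

Because the sampler lives in its own thread and only reads driver counters, the training process never blocks on it, which is what lets stdout flow directly to the terminal.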

[Diagram: `matcha run` spawns your command as a training process that runs natively, while an NVML thread polls every 100 ms; stdout goes direct to the terminal, and the power readings (timestamp, watts) feed the final energy and power summary.]

Training and NVML polling run in parallel. Zero interaction. Zero overhead.

When the training process exits, Matcha computes total energy using trapezoidal integration of the power readings and prints a single summary line.
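Trapezoidal integration treats energy as the area under the power-versus-time curve: average each pair of adjacent power readings, multiply by the time gap between them, and sum the slices. A hypothetical helper (illustrative only, not Matcha's actual code) might look like:

```python
def energy_joules(samples):
    # samples: list of (timestamp_seconds, watts), ordered by time.
    # Trapezoidal rule: joules = sum of mean(adjacent watts) * gap seconds.
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2 * (t1 - t0)
    return total

# Three readings 0.1 s apart at a steady 250 W -> 0.2 s * 250 W = 50 J
readings = [(0.0, 250.0), (0.1, 250.0), (0.2, 250.0)]
joules = energy_joules(readings)
watt_hours = joules / 3600  # 1 Wh = 3600 J
print(f"{joules:.1f} J ({watt_hours:.4f} Wh)")
```

The trapezoidal rule matters because power draw is not constant: simply multiplying average wattage by duration would miss ramps between idle and load that the 100 ms samples capture.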

What you need

  • An NVIDIA GPU with drivers installed (nvidia-smi must work)
  • Python 3.9+
  • pip or uv