
Overview

Matcha is a CLI tool that measures GPU energy consumption during AI training runs.

You prefix your training command with matcha run. Your training executes at full speed. When it finishes, Matcha reports how much energy your GPUs consumed.

Who it’s for

Engineers and researchers running training or fine-tuning jobs on NVIDIA GPUs, whether on cloud instances like RunPod, Lambda, and AWS, on on-premises clusters, or on local workstations.

What it measures

  • Total energy consumed (joules, watt-hours)
  • Average and peak GPU power draw (watts)
  • Duration of the full run
  • Per-step energy breakdown (with matcha wrap)

How it works

Matcha spawns your training command as a child process. In a separate background thread, it polls GPU power via NVML at 100ms intervals. Your training process runs natively. Matcha never intercepts stdout, modifies your code, or injects anything into your training loop.
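The spawn-and-poll pattern described above can be sketched roughly as follows. This is a minimal stdlib-only sketch, not Matcha's actual code: `read_power_watts` is a stand-in for the real NVML power query, and the `time.sleep` stands in for the training child process.

```python
import threading
import time

def read_power_watts():
    # Stand-in for the real NVML query; returns a constant
    # so the sketch runs anywhere, even without a GPU.
    return 250.0

def poll_power(samples, stop, interval=0.1):
    # Background thread: append (timestamp, watts) every `interval`
    # seconds until the main thread signals that training has exited.
    while not stop.is_set():
        samples.append((time.monotonic(), read_power_watts()))
        stop.wait(interval)

samples = []
stop = threading.Event()
poller = threading.Thread(target=poll_power, args=(samples, stop), daemon=True)
poller.start()

time.sleep(0.35)   # stands in for the training child process running natively
stop.set()         # training exited; stop sampling
poller.join()

print(f"collected {len(samples)} power readings")
```

Because the sampler lives in its own thread and only reads driver counters, the training process never blocks on it, which is what lets stdout flow directly to the terminal.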

[Diagram: `matcha run` spawns your command as a training process that runs natively, while an NVML thread polls every 100 ms; stdout goes direct to the terminal, and the power readings (timestamp, watts) feed the final energy and power summary.]

Training and NVML polling run in parallel. Zero interaction. Zero overhead.

When the training process exits, Matcha computes total energy using trapezoidal integration of the power readings and prints a single summary line.
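Trapezoidal integration treats energy as the area under the power-versus-time curve: average each pair of adjacent power readings, multiply by the time gap between them, and sum the slices. A hypothetical helper (illustrative only, not Matcha's actual code) might look like:

```python
def energy_joules(samples):
    # samples: list of (timestamp_seconds, watts), ordered by time.
    # Trapezoidal rule: joules = sum of mean(adjacent watts) * gap seconds.
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2 * (t1 - t0)
    return total

# Three readings 0.1 s apart at a steady 250 W -> 0.2 s * 250 W = 50 J
readings = [(0.0, 250.0), (0.1, 250.0), (0.2, 250.0)]
joules = energy_joules(readings)
watt_hours = joules / 3600  # 1 Wh = 3600 J
print(f"{joules:.1f} J ({watt_hours:.4f} Wh)")
```

The trapezoidal rule matters because power draw is not constant: simply multiplying average wattage by duration would miss ramps between idle and load that the 100 ms samples capture.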

What you need

  • An NVIDIA GPU with drivers installed (nvidia-smi must work)
  • Python 3.9+
  • pip or uv