Multi-GPU
Auto-detection
By default, Matcha detects all NVIDIA GPUs on the machine and sums power readings across all of them. No flags needed.
# all GPUs (default — auto-detects all 8 GPUs, sums power across them)
matcha run torchrun --standalone --nproc_per_node=8 train.py
# specific GPUs (only polls GPUs 0 through 3)
matcha run --gpus 0,1,2,3 torchrun --nproc_per_node=4 train.py
Example output on 8xH100:
matcha_energy gpus:8x NVIDIA H100 80GB HBM3 total:2920000J (811Wh)
How it works
Every 100ms, Matcha reads nvmlDeviceGetPowerUsage for each monitored GPU and sums the values into a single power reading. Trapezoidal integration then computes total energy from these summed readings.
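The sum-then-integrate step can be sketched in a few lines of Python. This is an illustrative sketch, not Matcha's actual source: the function names `sum_power_watts` and `trapezoidal_energy_joules` are hypothetical, and the readings are simulated rather than polled from NVML. It does rely on one known detail, which is that nvmlDeviceGetPowerUsage reports milliwatts, so the sketch converts to watts before integrating.

```python
def sum_power_watts(per_gpu_milliwatts):
    """Collapse one polling tick's per-GPU readings (mW) into one value in watts."""
    return sum(per_gpu_milliwatts) / 1000.0


def trapezoidal_energy_joules(timestamps_s, power_w):
    """Integrate power (W) over time (s) with the trapezoidal rule, giving joules."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    energy = 0.0
    for i in range(1, len(power_w)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Area of the trapezoid between two consecutive samples.
        energy += 0.5 * (power_w[i - 1] + power_w[i]) * dt
    return energy


# Simulated data (assumption): 8 GPUs at a constant 500 W each,
# sampled every 100 ms for one second.
timestamps = [i * 0.1 for i in range(11)]
ticks_mw = [[500_000] * 8 for _ in timestamps]

summed_w = [sum_power_watts(tick) for tick in ticks_mw]
energy_j = trapezoidal_energy_joules(timestamps, summed_w)
energy_wh = energy_j / 3600.0  # 1 Wh = 3600 J

print(f"total:{energy_j:.0f}J ({energy_wh:.2f}Wh)")
```

The same joule-to-watt-hour conversion accounts for the sample output above: 2920000 J / 3600 is roughly 811 Wh.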
avg_power and peak_power in the output reflect the aggregate across all monitored GPUs. For example, 8 GPUs each drawing 500W would show avg_power:4000W.
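To make the aggregate semantics concrete, here is a small sketch of how avg_power and peak_power could be derived from per-tick sums. It assumes a simple arithmetic mean and maximum over the summed samples. The text above does not specify how Matcha averages (it may weight by time, for instance), so treat the helper `aggregate_stats` and the sample values as hypothetical.

```python
from statistics import fmean


def aggregate_stats(ticks_w):
    """Return (avg, peak) over per-tick power summed across all GPUs, in watts."""
    # Sum across GPUs first, so both statistics describe the whole machine.
    summed = [sum(tick) for tick in ticks_w]
    return fmean(summed), max(summed)


# Simulated readings (assumption): three polling ticks, 8 GPUs each, in watts.
ticks = [
    [500.0] * 8,
    [520.0] * 8,
    [480.0] * 8,
]

avg_power, peak_power = aggregate_stats(ticks)
print(f"avg_power:{avg_power:.0f}W peak_power:{peak_power:.0f}W")
```

Because the sum is taken before the statistics, peak_power here is the highest combined draw at a single tick, not the sum of each GPU's individual peak.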