For example, under a heavy GPU workload on a Mac Studio (M4 Max), powermetrics reported a 65W idle-to-load delta on the GPU, but at the same time system DC power rose by 179W, leaving 114W, nearly 2/3 of the total system DC power rise, unexplained.
Using undocumented low-level Apple APIs (SMC and IOReport), we were able to reverse engineer an energy model that explains almost all of the energy flow in an Apple SoC, with less than 2% error on the workloads I studied.
The result is a simple two-term energy roofline model:
P_GPU ≈ a * (bytes/s) + b * (FLOPs/s)
with:
a ≈ 5 pJ/byte for SRAM data movement
b ≈ 2.7 pJ/FLOP for compute.
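The two-term model above can be sketched in a few lines of Python. The coefficient values are the ones quoted in the post; the function name and the example kernel's byte/FLOP rates are mine, purely for illustration.

```python
# Two-term energy roofline model from the post.
# Coefficients are the quoted ~5 pJ/byte and ~2.7 pJ/FLOP;
# the example rates below are hypothetical, not from the video.

PJ_PER_BYTE = 5.0e-12   # a: energy per byte of SRAM movement
PJ_PER_FLOP = 2.7e-12   # b: energy per FLOP of compute

def gpu_power_watts(bytes_per_s: float, flops_per_s: float) -> float:
    """Predicted GPU power: a * byte rate + b * FLOP rate."""
    return PJ_PER_BYTE * bytes_per_s + PJ_PER_FLOP * flops_per_s

# Hypothetical kernel sustaining 2 TB/s of SRAM traffic and 40 TFLOP/s:
p = gpu_power_watts(2e12, 40e12)
print(f"{p:.0f} W")  # 10 W for movement + 108 W for compute = 118 W
```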
Not only that: we were also able to attribute the energy flow to each of the principal functional blocks on the M4 Max SoC, such as the CPU, GPU compute, GPU SRAM, the chip fabric components, and DRAM.
For this one example:
179W system DC power measured via SMC, of which:

- 133W GPU (my inference)
- 18W DRAM
- 28W SoC fabric (sum of 3 fabric-related components)
- <1W CPU

Think of these values as how much of the system DC power rise was due to GPU activity, DRAM activity, and so on. They are not the exact electrical power: VRM losses are not accounted for separately, so the per-block figures slightly overestimate the actual electrical power flowing in.
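As a quick closure check, the per-block attributions should sum back to roughly the measured system-level number. Using the figures quoted above (treating the <1W CPU as a 1W upper bound):

```python
# Sanity check: per-block attributions vs. the 179W SMC measurement.
# Values are the ones quoted in the post; "cpu" is an upper bound (<1W).
blocks = {"gpu": 133, "dram": 18, "fabric": 28, "cpu": 1}
total = sum(blocks.values())
print(total)  # 180 W, within ~1W of the 179W system DC power rise
```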
Now, if you wanted to compare against a discrete GPU whose DC power is measured at the board interface, you would definitely want to include the DRAM power, and possibly the fabric power too (if the CPU power is minimal, as in this example).
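For this example, that scope adjustment would look like the following. The figures are the ones quoted above; the idea is that a discrete GPU board's measured power already includes its own VRAM and on-board interconnect, so the Apple-side figure should fold in DRAM and (with negligible CPU activity) fabric to be comparable:

```python
# Building a "board-equivalent" figure for comparison with a discrete
# GPU measured at the board interface. Values are from the post's example.
gpu, dram, fabric = 133, 18, 28
board_equivalent = gpu + dram + fabric
print(board_equivalent)  # 179 W, essentially the whole system DC rise here
```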
The video walks through the experiments and validation in detail. Happy to answer questions about the measurement setup or the kernels used.