Not sure about Rust, though I assume it works the same way. My PhD advisor gave me a C inline assembly snippet I could use for cycle-accurate benchmarking.
It used the CPU's hardware counters, something super basic like reading those registers into a variable.
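
I don't have the original snippet anymore, so this is only a rough sketch of the idea, not what my advisor wrote. It assumes GCC or Clang on x86-64 (reading the time-stamp counter with `rdtsc`) or on AArch64 Linux (reading `cntvct_el0`). The names `read_ticks`, `measure_ticks`, and `example_workload` are made up for illustration. Also note that on most modern x86 CPUs the TSC ticks at a constant reference rate, so you get reference cycles rather than actual core clock cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a hardware tick counter into a variable.
   Assumption: GCC/Clang inline asm on x86-64 or AArch64 Linux. */
static inline uint64_t read_ticks(void) {
#if defined(__x86_64__)
    uint32_t lo, hi;
    /* lfence keeps earlier instructions from drifting past the read;
       rdtsc returns the 64-bit counter split across EDX:EAX. */
    __asm__ volatile("lfence\n\trdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return ((uint64_t)hi << 32) | lo;
#elif defined(__aarch64__)
    uint64_t v;
    /* cntvct_el0 is a fixed-frequency timer, not a true cycle counter. */
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(v) : : "memory");
    return v;
#else
#error "read_ticks: no counter implemented for this target"
#endif
}

/* Hypothetical workload, only here so there is something to time. */
static void example_workload(void) {
    volatile uint64_t sum = 0;
    for (uint64_t i = 0; i < 1000000; i++) {
        sum += i;
    }
}

/* Return the number of ticks that elapsed while fn ran. */
static uint64_t measure_ticks(void (*fn)(void)) {
    assert(fn != NULL);
    uint64_t start = read_ticks();
    fn();
    uint64_t end = read_ticks();
    return end - start;
}
```

Calling `measure_ticks(example_workload)` gives one raw sample. For real benchmarking you would want to pin the thread to a core, run many iterations, and look at the minimum or median, since a single reading is noisy.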
---
You can probably take the above to a coding agent or LLM and get what you need back.