1 point by malith 2 hours ago | 1 comment
  • malith 2 hours ago
    We ran inference benchmarks for arcee-ai/trinity-mini on Nvidia H200 using our DeployPad inference stack and published the full results.

    Key results:

    Mean tokens per second: ~114.5
    Mean time to first token: 0.74 s

    Under batch load, P99 tokens per second reached ~134.8.

    The full benchmark report, raw statistics, and methodology are available here: https://github.com/geoddllc/large-llm-inference-benchmarks/b...

    Support for larger models (400B class) is planned for next week. If you want to try it yourself, you can deploy via the console: https://console.geodd.io/