1 point by malith 2 hours ago | 1 comment
  • malith 2 hours ago
    We ran inference benchmarks for arcee-ai/trinity-mini on Nvidia H200 using our DeployPad inference stack and published the full results.

    Key results:

    Mean tokens per second: ~114.5
    Mean time to first token: 0.74 s

    Under batch load, P99 tokens per second reached ~134.8.

    The full benchmark report, raw statistics, and methodology are available here: https://github.com/geoddllc/large-llm-inference-benchmarks/b...

    Support for larger models (400B class) is planned for next week. If you want to try it yourself, you can deploy via the console: https://console.geodd.io/