1 point by cjparadise 5 hours ago | 2 comments
  • cjparadise 5 hours ago
    Don't Quantize, Use CONVERA. Instead of focusing only on faster hardware or larger models, it focuses on:

    > Reusing work that has already been done.

    In its current public form, CONVERA:

    - runs LLMs locally (via Hugging Face)

    - executes prompts through a controlled runtime

    - caches repeated prompt results

    - detects reuse opportunities

    - returns measurable latency improvements on repeat runs
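
    The caching-and-reuse idea above can be sketched roughly like this. This is a minimal illustration, not CONVERA's actual implementation; `PromptCache`, `get_or_compute`, and the keying scheme are my own names and assumptions:

    ```python
    import hashlib

    class PromptCache:
        """Minimal prompt-result cache: reuse work already done on repeat runs."""

        def __init__(self):
            self._store = {}

        def _key(self, model_name, prompt):
            # Key over both model and prompt text so switching models
            # never returns a stale cached result.
            return hashlib.sha256(f"{model_name}\x00{prompt}".encode()).hexdigest()

        def get_or_compute(self, model_name, prompt, generate):
            key = self._key(model_name, prompt)
            if key in self._store:
                # Cache hit: skip the model call entirely.
                return self._store[key], True
            result = generate(prompt)  # cache miss: run the model once
            self._store[key] = result
            return result, False

    cache = PromptCache()
    slow_model = lambda p: p.upper()  # stand-in for a real local LLM call
    first, hit1 = cache.get_or_compute("demo-model", "hello", slow_model)
    second, hit2 = cache.get_or_compute("demo-model", "hello", slow_model)
    ```

    On the second call the model is never invoked, which is where the repeat-run latency improvement would come from.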
