1 point by cjparadise 5 hours ago | 2 comments
  • cjparadise 5 hours ago
    Don't Quantize, Use CONVERA. Instead of focusing only on faster hardware or larger models, it focuses on:

    > Reusing work that has already been done.

    In its current public form, CONVERA:

    - runs LLMs locally (via Hugging Face)

    - executes prompts through a controlled runtime

    - caches repeated prompt results

    - detects reuse opportunities

    - returns measurable latency improvements on repeat runs
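
    The caching-and-reuse idea above can be sketched roughly like this. This is a minimal illustration, not CONVERA's actual implementation; `PromptCache`, `get_or_compute`, and the keying scheme are my own names and assumptions:

    ```python
    import hashlib

    class PromptCache:
        """Minimal prompt-result cache: reuse work already done on repeat runs."""

        def __init__(self):
            self._store = {}

        def _key(self, model_name, prompt):
            # Key over both model and prompt text so switching models
            # never returns a stale cached result.
            return hashlib.sha256(f"{model_name}\x00{prompt}".encode()).hexdigest()

        def get_or_compute(self, model_name, prompt, generate):
            key = self._key(model_name, prompt)
            if key in self._store:
                # Cache hit: skip the model call entirely.
                return self._store[key], True
            result = generate(prompt)  # cache miss: run the model once
            self._store[key] = result
            return result, False

    cache = PromptCache()
    slow_model = lambda p: p.upper()  # stand-in for a real local LLM call
    first, hit1 = cache.get_or_compute("demo-model", "hello", slow_model)
    second, hit2 = cache.get_or_compute("demo-model", "hello", slow_model)
    ```

    On the second call the model is never invoked, which is where the repeat-run latency improvement would come from.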
