2 points by amthorn 5 hours ago | 1 comment
  • amthorn 5 hours ago
    This demo uses standard transformer weights with a very small attention/KV component; most temporal memory is handled by a stateful operator rather than a growing context window.

    Outputs are similar to a transformer's, while running very fast on CPU with much lower memory use.
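
The constant-memory idea described above can be sketched roughly as follows. This is a minimal illustrative example, not the demo's actual operator: the class name, the EMA-style decay update, and all parameters are assumptions. The point it shows is that a fixed-size recurrent state replaces a key/value cache that grows with sequence length, so memory use stays O(1) in the number of tokens.

```python
import numpy as np

class StatefulMemory:
    """Hypothetical stateful temporal-memory operator (illustrative only).

    A growing KV cache stores one key/value pair per token (O(T) memory).
    Here, each input is instead folded into a fixed-size state vector,
    so memory stays constant regardless of sequence length.
    """

    def __init__(self, dim, decay=0.9):
        self.decay = decay
        self.state = np.zeros(dim)  # fixed-size; never grows with the sequence

    def step(self, x):
        # Blend the new input into the running state (EMA-style recurrence).
        self.state = self.decay * self.state + (1.0 - self.decay) * x
        return self.state

mem = StatefulMemory(dim=4)
for _ in range(1000):          # 1000 tokens, yet state stays 4 floats
    out = mem.step(np.ones(4))

print(out.shape)  # state shape is unchanged after any number of steps
```

A transformer's KV cache would hold 1000 entries at this point; the recurrent state here is still a single length-4 vector, which is why this style of operator runs cheaply on CPU.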