3 points by g023 4 hours ago | 1 comment
  • santander_cl 4 hours ago
    Starred immediately.

    This is exactly the kind of practical quantization work that makes running longer-context models on consumer GPUs actually feasible. Looking forward to seeing it generalized beyond the one model. Great stuff, g023.