5 points by xaskasdf 2 days ago | 3 comments
  • SilentEditor 2 days ago
    Love this project. The CD streaming trick is such a smart constraint hack, and honestly the best part is you trained the model for the hardware instead of forcing a desktop recipe onto PS2.

    Curious about 2 things if you can share:

    1. What's your per-token latency on real hardware?
    2. How much quality loss came from PSNT quantization vs. the fp16 baseline?

    Either way, this is peak hacker energy; shipping on actual hardware makes it 10x cooler.

    • xaskasdf 2 days ago
      There wasn't any quality loss. PSNT's quantization is mainly there to convert the model to fit the console's constraints (you can convert any model you want, even though I trained a model for this hw). It's q8 quantization, so quality loss is negligible at these sizes. As for speed, I'll fix the tok/sec counter first, since right now it always shows 0, and then I can post measurements.
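      For anyone wondering why q8 costs so little: a minimal sketch of a generic symmetric int8 round trip. This is not the PSNT converter; the per-row single-scale layout and the names `quantize_q8` / `dequantize_q8` are assumptions made up for illustration.

```python
def quantize_q8(row):
    """Symmetric int8 quantization of one row of weights.

    Assumed scheme (not taken from PSNT): one float scale per row,
    values mapped to integers in [-127, 127].
    """
    max_abs = max(abs(x) for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return scale, q


def dequantize_q8(scale, q):
    """Recover approximate float weights from the int8 values."""
    return [scale * v for v in q]


row = [0.12, -0.5, 0.33, 0.0, 0.49]
scale, q = quantize_q8(row)
restored = dequantize_q8(scale, q)

# Worst-case error per weight is half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"scale={scale:.6f} max_err={max_err:.6f}")
```

      Under this scheme the rounding error per weight is bounded by half a step, which is the usual argument for q8 being close to lossless on small models.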

      PS: Thank you! I also forgot to mention that PSNT supports bitnet models too, though they work like crap.

      • SilentEditor 15 hours ago
        That's super helpful, thanks for the details. Makes sense now that PSNT is more of a transport/runtime format for the PS2 constraints than a quality hack.

        Very cool that it supports bitnet too, even if results are rough right now. Feels like there's a lot of room to tune there over time. When you do fix tok/sec, are you planning to post per-stage timings too (tokenizer, weight stream, matmul, sampling)? Would be awesome to see where the biggest bottleneck is on real hw.
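        To make the per-stage idea concrete, here is a rough sketch of the kind of breakdown I mean. It is written in Python purely for illustration (the real runtime would do this with the console's own timers), and the stage bodies are placeholders, not the project's code; `timed` and `generate_one_token` are hypothetical names.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage across all generated tokens.
stage_seconds = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the elapsed time of the enclosed block to the stage's total."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start


def generate_one_token():
    # Placeholder work standing in for the real stages.
    with timed("tokenizer"):
        pass
    with timed("weight stream"):
        time.sleep(0.002)  # stands in for waiting on the disc read
    with timed("matmul"):
        sum(i * i for i in range(10_000))
    with timed("sampling"):
        pass


for _ in range(3):
    generate_one_token()

total = sum(stage_seconds.values())
for stage, secs in stage_seconds.items():
    print(f"{stage:>14}: {secs * 1000:8.2f} ms ({100 * secs / total:5.1f}%)")
```

        Even a table this crude would show whether the disc or the math dominates.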

  • SachitRafa 2 days ago
    The CD-ROM streaming approach is the real insight here: keeping only activations and KV cache in RAM and streaming weights one matrix at a time sidesteps the 32MB constraint entirely. It's essentially the same trick modern edge inference does with flash storage, just on hardware from 2000. Curious about the latency profile. With CD-ROM read speeds around 1.6 MB/s on PS2, the 77MB SmolLM2 model being too slow makes sense, but how does the 10MB brandon-tiny feel in practice? Are you getting tokens per minute, or more like one token every several seconds? Also interested in the custom PSNT format decision: was the main motivation the PS2's MIPS alignment constraints, or was there something about the existing GGUF/llama.c formats that made them impractical to parse on the Emotion Engine?
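    A back-of-envelope sketch of what I mean, with loud assumptions: every weight byte is read from disc once per token, disc throughput is the only cost, and the 1.6 MB/s figure is the one quoted above. The file layout (raw little-endian float32 matrices back to back) and the function names are hypothetical, not the PSNT format.

```python
import io
import struct


def stream_matvec_layers(weights_file, shapes, x):
    """Apply each layer's matrix to x, holding one matrix in memory at a time.

    Assumed layout: row-major little-endian float32 matrices stored
    consecutively, one per (rows, cols) entry in shapes.
    """
    for rows, cols in shapes:
        raw = weights_file.read(rows * cols * 4)  # only this matrix is resident
        flat = struct.unpack(f"<{rows * cols}f", raw)
        x = [sum(flat[r * cols + c] * x[c] for c in range(cols))
             for r in range(rows)]
    return x


def seconds_per_token(model_mb, read_mb_per_s):
    """Lower bound on latency if the whole model is streamed once per token."""
    return model_mb / read_mb_per_s


# Tiny demo: a 2x2 identity followed by a 1x2 matrix [2, 3].
blob = struct.pack("<4f", 1, 0, 0, 1) + struct.pack("<2f", 2, 3)
out = stream_matvec_layers(io.BytesIO(blob), [(2, 2), (1, 2)], [1.0, 2.0])

smol = seconds_per_token(77, 1.6)  # the 77MB model
tiny = seconds_per_token(10, 1.6)  # the 10MB model
print(out, f"{smol:.1f} s/token", f"{tiny:.2f} s/token")
```

    Under those assumptions the 77MB model comes out around 48 seconds per token and the 10MB one around 6 seconds per token, so roughly ten tokens per minute at best.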
  • mememememememo 2 days ago
    How many tok/hr?