4 points by danebalia 15 hours ago | 1 comment
  • bigyabai 15 hours ago
    > The 35B Trick (Your SSD Is the New GPU Memory)

    Wave "bye bye" to your write cycles.

    • RobMurray 13 hours ago
      Why? It's mostly reads; the weights are static.
      • bigyabai 12 hours ago
        llama.cpp's own access pattern is mostly reads, but macOS itself will swap hard when 10-14 GB of memory is paged for LLM inference. Dense models especially would thrash the compressed swap.
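
The distinction the thread is circling can be sketched in a few lines: llama.cpp memory-maps the model file read-only, so under memory pressure the kernel can simply drop those pages and re-read them from the SSD, rather than writing them to swap. A minimal illustration (the file path and size here are stand-ins, not real model weights):

```python
import mmap
import os
import tempfile

# Stand-in "weights" file -- real GGUF model files are many GB.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1 MiB of fake static weights

# Map the file read-only, roughly as llama.cpp does by default.
# Because the pages are file-backed and never dirtied, evicting them
# costs no swap writes: the kernel re-reads from the file on demand.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_bytes = mm[:16]  # touching a page faults it in (a read from SSD)
    mm.close()
```

The write-cycle concern in the thread applies to the *other* memory in play: anonymous pages (activations, KV cache, other apps) that macOS compresses and, past a threshold, writes to the swap file. Those are dirty pages with no backing file, so evicting them is a write.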