8 points by ai_bot 3 hours ago | 1 comment
  • NitpickLawyer 2 hours ago
    Misses a few interesting early models: GPT-J (by EleutherAI, using the GPT-2 arch) was the first-ish model runnable on consumer hardware. I actually had a thing running in prod on this for a while, with real users. And GPT-NeoX was their attempt to scale to GPT-3 levels. It was 20B and was maybe the first glimpse that local models might someday be usable (although "local" at the time was questionable, quantisation wasn't as widely used, etc.).
    • pu_pe 2 hours ago
      GPT-J was the one that made me really interested in LLMs, as I could run it on a 3090.

      Some details on the timeline are not quite precise and would benefit from linking to a source so that everyone can verify them. For example, HyperCLOVA is listed as 204B parameters, but it seems it used 560B parameters (https://aclanthology.org/2021.emnlp-main.274/).

      • ai_bot 2 hours ago
        Great idea! Thanks
    • ai_bot 2 hours ago
      Great catches — just added GPT-Neo (2.7B, Mar 2021), GPT-J (6B, Jun 2021), and GPT-NeoX (20B, Apr 2022). Thanks!