120 points by b44 8 hours ago | 4 comments
  • krackers 3 hours ago
    There used to be this page that showed the activations/residual stream from gpt-2 visualized as a black-and-white image. I remember it being neat how you could slowly see order forming from seemingly random activations as it progressed through the layers.

    Can't find it now though (maybe the link rotted?), anyone happen to know what that was?
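For anyone who wants to recreate that kind of view, below is a minimal sketch (not the page being asked about) that pulls GPT-2's per-layer hidden states via the Hugging Face transformers library and renders each layer's residual stream as a grayscale image. The prompt, normalization, and figure layout are illustrative choices.

```python
# Sketch: render GPT-2's residual stream per layer as grayscale images.
# Uses Hugging Face transformers; prompt and plot layout are arbitrary choices.
import matplotlib.pyplot as plt
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states: tuple of (1, seq_len, 768) tensors, one per layer (plus embeddings).
fig, axes = plt.subplots(1, len(out.hidden_states), figsize=(2 * len(out.hidden_states), 3))
for ax, h in zip(axes, out.hidden_states):
    ax.imshow(h[0].T, cmap="gray", aspect="auto")  # rows = hidden dims, cols = tokens
    ax.set_xticks([])
    ax.set_yticks([])
plt.tight_layout()
plt.show()
```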

  • kfsone 6 hours ago
    Minor nit: In familiarity, you gloss over the fact that it's character- rather than token-based, which might be worth a shout-out:

    "Microgpt's larger cousins using building blocks called tokens representing one or more letters. That's hard to reason about, but essential for building sentences and conversations.

    "So we'll just deal with spelling names using the English alphabet. That gives us 26 tokens, one for each letter."

    • mips_avatar 4 hours ago
      Using ASCII characters is a simple form of tokenization with less compression.
    • b44 5 hours ago
      hm. the way i see things, characters are the natural/obvious building blocks and tokenization is just an improvement on that. i do mention chatgpt et al. use tokens in the last q&a dropdown, though
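To make the character-vs-token discussion above concrete, here is a minimal sketch of a character-level scheme along the lines the thread describes: 26 letters plus one end-of-name marker. The exact vocabulary and the end-of-name handling are assumptions for illustration, not microgpt's actual code.

```python
# A minimal character-level "tokenizer" for lowercase names: each letter maps
# to an integer id, plus one extra id for an end-of-name marker.
# (The end-of-name token and id layout are illustrative assumptions.)
import string

EON = 0  # hypothetical end-of-name token
stoi = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase)}  # 'a' -> 1 ... 'z' -> 26
itos = {i: ch for ch, i in stoi.items()}

def encode(name: str) -> list[int]:
    """Turn a lowercase name into a list of token ids, ending with EON."""
    return [stoi[ch] for ch in name.lower() if ch in stoi] + [EON]

def decode(ids: list[int]) -> str:
    """Turn token ids back into a string, stopping at EON."""
    out = []
    for i in ids:
        if i == EON:
            break
        out.append(itos[i])
    return "".join(out)

print(encode("emma"))             # [5, 13, 13, 1, 0]
print(decode([5, 13, 13, 1, 0]))  # "emma"
```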
  • msla 5 hours ago
    About how many training steps are required to get good output?
    • WatchDog 2 hours ago
      I trained 12,000 steps at 4 layers, and the output is kind of name-like, but it didn't reproduce any actual name from its training data after 20 or so generations.
    • b44 5 hours ago
      not many. diminishing returns start before 1000 and past that you should just add a second/third layer
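For a sense of where the step budget and layer count from this exchange plug in, here is a rough PyTorch sketch of a tiny character-level transformer trained on a stand-in list of names. The name list, hyperparameters, and model shape are all illustrative assumptions, not microgpt's actual implementation.

```python
# Rough sketch: a tiny character-level transformer for names, showing where
# the step budget and layer count from the thread appear. Everything here
# (name list, hyperparameters, model shape) is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

names = ["emma", "olivia", "ava", "liam", "noah", "mia"]  # stand-in dataset
chars = sorted(set("".join(names)))
stoi = {c: i + 1 for i, c in enumerate(chars)}  # id 0 reserved for <end>/padding
vocab_size = len(stoi) + 1
block_size = max(len(n) for n in names) + 1

def encode(name):
    ids = [stoi[c] for c in name] + [0]             # append <end>
    return ids + [0] * (block_size + 1 - len(ids))  # pad to a fixed length

data = torch.tensor([encode(n) for n in names])
X, Y = data[:, :-1], data[:, 1:]                    # predict the next character

class TinyCharGPT(nn.Module):
    def __init__(self, n_layer=2, n_embd=32, n_head=4):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, n_embd)
        self.pos = nn.Embedding(block_size, n_embd)
        layer = nn.TransformerEncoderLayer(n_embd, n_head, 4 * n_embd,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer)
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx):
        T = idx.size(1)
        x = self.tok(idx) + self.pos(torch.arange(T))
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        return self.head(self.blocks(x, mask=causal))

model = TinyCharGPT(n_layer=2)   # "add a second/third layer" past ~1000 steps
opt = torch.optim.AdamW(model.parameters(), lr=3e-3)

for step in range(1000):         # the step budget being discussed
    logits = model(X)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), Y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 200 == 0:
        print(step, round(loss.item(), 3))
```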
  • darepublic 3 hours ago
    thank you for this