2 points by rcdexta | 4 hours ago | 1 comment
  • rcdexta | 4 hours ago
    I wrote a post that explains prompt caching by walking through the actual transformer pipeline step by step: tokenization, embeddings, positional encoding, attention (Q/K/V), and KV caching.
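For readers who want to see the last step concretely: here is a minimal, self-contained sketch of single-head attention with a KV cache, using hypothetical 2-dimensional embeddings and made-up weight matrices (real models use large matrices and many heads, but the caching logic is the same). The point is that K and V for already-seen positions never change, so a server can store them and skip recomputing a shared prefix on the next request.

```python
import math

# Hypothetical 2x2 projection weights (illustrative, not from any real model).
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.0], [0.0, 0.5]]
W_V = [[1.0, 1.0], [0.0, 1.0]]

kv_cache = {"K": [], "V": []}  # grows as tokens arrive; reused across requests

def project(x, w):
    """Multiply a 2-vector by a 2x2 weight matrix."""
    return [sum(x[j] * w[j][i] for j in range(2)) for i in range(2)]

def attend(embeddings):
    """Process new token embeddings, reusing cached K/V for earlier positions."""
    outputs = []
    for x in embeddings:
        q = project(x, W_Q)
        kv_cache["K"].append(project(x, W_K))  # computed once, then cached
        kv_cache["V"].append(project(x, W_V))  # computed once, then cached
        # Attention scores of the new query against ALL cached keys.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(2)
                  for k in kv_cache["K"]]
        m = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted sum of cached values.
        outputs.append([sum(w * v[i] for w, v in zip(weights, kv_cache["V"]))
                        for i in range(2)])
    return outputs

# First request: three prefix tokens get their K/V computed and cached.
attend([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# Follow-up request with the same prefix: only the new token needs new K/V.
attend([[0.5, 0.5]])
print(len(kv_cache["K"]))  # 4 cached positions
```

Note that the query is never cached: each new token's q is computed fresh and scored against every stored key, which is why only the K/V tensors (not Q) are what a prompt cache holds.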

    Most explanations I found online stop at "identical prefixes get cheaper." That never satisfied me. I wanted to understand what exactly is being reused, why a single whitespace change breaks the cache, and why temperature doesn't affect cache hits. So I built it up from scratch.
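The whitespace point can be shown in a few lines. This is a toy sketch assuming a hypothetical word-level tokenizer that keeps whitespace runs as tokens (real BPE tokenizers are more intricate, but the prefix-matching logic is the same): the cache is keyed on exact token sequences, so reuse stops at the first differing token. Temperature never enters this comparison because it only rescales logits at sampling time, after the K/V tensors are computed.

```python
import re

def tokenize(text):
    # Keep whitespace runs as their own tokens so spacing differences are visible.
    return re.findall(r"\S+|\s+", text)

def cached_prefix_len(cached_tokens, new_tokens):
    """Number of leading tokens whose K/V can be reused from the cache."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

cached = tokenize("You are a helpful assistant. Answer briefly.")
same   = tokenize("You are a helpful assistant. Answer briefly.")
spaced = tokenize("You are a  helpful assistant. Answer briefly.")  # one extra space

print(cached_prefix_len(cached, same))    # full reuse: every token matches
print(cached_prefix_len(cached, spaced))  # reuse stops at the changed whitespace
```

Everything after the first mismatched token must be recomputed, even though the visible text is almost identical, which is exactly the "single whitespace change breaks the cache" behavior.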

    I tried to explain every concept with illustrations to make it easier to follow. The other thing I wanted to try: there's a "Margin Notes" button on every paragraph. Select any text you find dense, pick an action (simplify, analogy, step-by-step, quiz), and an LLM rewrites that section however you like. The idea is to let each reader customize the experience: some people want the math, some want an analogy, some want to be quizzed.

    Would love feedback; it would help me figure out whether to keep adding content in this format. Thanks!