It’s interesting to observe in oneself how repetition can result in internalizing new concepts. It is less about rote memorization and more about becoming aware of nuance and letting our minds “see” things from different angles, integrating them with existing world models through augmentation, replacement, or adjustment. The same goes for practicing activities that require some form of motor ability.
Some concepts are internalized less explicitly, like when we “learn” through role-modeling behaviors or feedback loops through interaction with people, objects, and ideas (like how to fit into a society).
Sometimes one gets stuck, and a different explanation, previously completely incomprehensible, gets one "unstuck" enough to make all the other explanations start making more sense.
Sometimes you realize midway through that there's some other concept you don't fully understand (e.g. you can't fully understand why an attention layer looks like it does without understanding matmul or tensor ops). Getting some explanations of these other concepts can also get one unstuck.
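To make the attention example concrete, here is a rough sketch of single-head scaled dot-product attention written with nothing but list-based matrix multiplication. The helper names (`matmul`, `transpose`, `softmax`, `attention`) and the toy inputs are my own, and it leaves out the learned projections, masking, and batching that a real layer has. The point is only that the core of the layer is two matmuls with a softmax in between.

```python
import math

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Single-head attention: softmax(q @ k^T / sqrt(d_k)) @ v."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # first matmul: query/key similarity
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights  # second matmul: weighted mix of values

# Toy inputs: two tokens, two dimensions each.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out, weights = attention(q, k, v)
```

Each row of `weights` says how much one token attends to every other token, and `out` is the matching blend of the value rows. If `matmul` feels opaque, that is the concept to go and read up on first.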
I imagine this as every explanation containing little "nuggets" of knowledge at various "skill levels", but you can only grasp at those that are slightly beyond yours at the moment. Too high and nothing makes sense, too low and it's obvious.
As you re-read the explanation, you understand more, and so more "nuggets" are available to you. Sometimes the explanation is too hard: there's nothing to grasp onto and you're stuck. This is where other explanations, or explanations of other concepts, can push you over the edge.
Multiple encodings are stronger since they end up fusing into representations that are not just more robust and general, but also better connected.
I forgot which neuroscience textbook I read all this in, but Dr. Barbara Oakley talks about this in her course too, so anyone can look there for more details and sources.
> desirable difficulty
Well regarded in Anki, music practice, and most problem-solving books (Polya's "How to Solve It"). I think some neuroscience texts also try to formalize it, but I don't know enough there to point to a more concrete abstraction of the phenomenon.
> As you re-read the explanation
I suggest something even better. Try to do the explanation yourself from free recall, and when you fail, re-read and note exactly where and why you failed. This is a mid-level efficiency technique (I forgot the source) compared to free recall, active testing, and spaced repetition, but definitely better than plain re-reading.
I've found I have a much easier time learning difficult new concepts if I (somewhat counterintuitively) stop trying to learn them for a week or two and then come back. Suddenly they're a lot easier to grasp, even if I haven't consciously thought about them since I left off.
I really relate to this. In fact, a lot of times, when working on a difficult problem, I’ll intentionally go take a nap when I feel stuck, and more often than not, I’ll wake up feeling like the problem is much easier than I thought before my nap. I can’t explain why it works so well—it just does!
With something like learning a language, problem-solving skills, a code base, new maths, or whatever, it is less apparent because the mind can delude itself, whereas it is much harder to delude yourself with actual physical movements. You either do it, or you don't...
It's so strange, you read the material and sometimes you either struggle with understanding or even don't see how it's relevant and not just a waste of time.
Then when you review it much later, it's like opening a new room of knowledge, it just makes sense and it makes me so happy in a strange way.
I've had this happen often with music theory. Music theory is pretty dull on its own, and small gaps in knowledge exclude you from understanding a lot more.
PS: electrical engineering has one of the most reader-hostile presentations, so all authors should be tried for crimes against humanity /s
It's really hard to spell out new concepts in a way that doesn't require the reader to make several passes to resolve all the nuance... but it's not necessarily a matter of how we learn, just how we write.
(Read the glossary first).
Lots of stuff is like this. You learn what you do. Sitting in a lecture theatre hearing about pointers, or photography, or DSP, isn’t really learning those things. Some people are definitely better at using their imagination to actually learn things during a lecture. But it’s a long shot at the best of times I think. For everyone.
Found the SO post: https://stackoverflow.com/questions/35379191/cannot-cast-var...
Sorry if this explanation is convoluted. There is a lot going on in the brain!
from fancy_module import magic_functions
I'm semi-serious here, of course. To me, for something to be called 'from scratch', requisite knowledge should be built from the ground up. To wit, I'd want to write the tokenizer myself but don't want to derive the laws of quantum physics that make the computation happen.
But at the end of the day, it depends on where you want to spend your time. "Build an LLM from scratch" is over 300 pages -- and they are very dense pages. My blog post covers fewer than 10 of them (though TBF they are the hardest pages). Adding tokenizers in depth from scratch would add 100 or so more. Adding efficient-enough matrix multiplication to do anything would add a few hundred more, and doing it in CUDA would probably be a couple of thousand. Now add automatic differentiation to work out the gradients for training -- a few thousand more? Optimizers for the training -- even more than that, perhaps.
You have to draw the line somewhere, as otherwise (as you suggest) the "from scratch" book has to start "go out and get some really clean sand" so that you can start fabbing your own chips. I think that tiktoken and PyTorch are a solid choice for that line, as it means that the book is manageable in size and gives you enough of an overview of the underlying stuff to be able to work out what you want to dig into next.
What I'm doing is building up a list of things to dig into in depth once I've finished the book. Kind of like a treat to encourage me to push forward when I'm working through a bit that's tough to understand.
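On the "write the tokenizer myself" point, here is a toy sketch of what the very first step could look like: byte-level byte-pair encoding, the general family of algorithm that tiktoken implements. This is not tiktoken's code or API. The function names (`train_bpe`, `merge`, `decode`) are made up for illustration, and it skips the regex pre-splitting, special tokens, and speed tricks a production tokenizer needs.

```python
from collections import Counter

def most_frequent_pair(ids):
    """Return the most common adjacent pair of token ids, or None if there are no pairs."""
    pairs = Counter(zip(ids, ids[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`, left to right."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Start from raw UTF-8 bytes and greedily merge the most frequent pair each step."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        new_id = 256 + step  # ids 0..255 are reserved for single bytes
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges

def decode(ids, merges):
    """Rebuild the byte string for each id, then decode back to text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():  # dicts keep insertion order, i.e. merge order
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")

text = "aaabdaaabac"
ids, merges = train_bpe(text, 3)
roundtrip = decode(ids, merges)
```

Even this much only covers learning merges and undoing them. Encoding new text efficiently with a fixed merge table is its own rabbit hole, which is roughly where the "100 or so more pages" estimate starts to feel plausible.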
1) I am kidding. 2) At what point does it become self replicating? 3) skynet. 4) kidding - not kidding.