Make a visualization of the article above and it would be the biggest aha moment in tech.
I hope they make more of these; I'd love to see a transformer presented this clearly.
If you want to understand neural networks, keep going.
> You determine the weights via brute force. Simply running a large amount of data where you have the input as well as the correct output
Brute force just means guessing all possible combinations. A dataset containing most human knowledge is about as brute force as you can get.
I'm fairly sure that AlphaZero's data is generated by AlphaZero. But it's not an LLM.
The sampling stage of Evolution Strategies at least bears a resemblance, but even that is still a stochastic gradient descent algorithm. Meanwhile, backprop is about as far from brute force as you can get.
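For the curious, here's a minimal sketch of that sampling stage (the toy objective and all hyperparameters are my own, not from the article): the random perturbations are only used to estimate a gradient, which then drives an ordinary gradient ascent step, so nothing is being enumerated.

```python
import numpy as np

def f(theta):
    # Toy objective (made up for this sketch): peak at theta = 3.
    return -np.sum((theta - 3.0) ** 2)

def es_step(theta, n_samples=100, sigma=0.1, lr=0.02):
    # Sample random perturbations. This sampling is what superficially
    # resembles brute force, but the samples only serve to *estimate*
    # a gradient; the search space is never enumerated.
    eps = np.random.randn(n_samples, theta.size)
    rewards = np.array([f(theta + sigma * e) for e in eps])
    rewards -= rewards.mean()  # baseline subtraction reduces variance
    # Reward-weighted average of the perturbations estimates
    # grad_theta E[f(theta + sigma * eps)].
    grad = (rewards[:, None] * eps).mean(axis=0) / sigma
    return theta + lr * grad  # an ordinary stochastic gradient ascent step

theta = np.zeros(5)
for _ in range(500):
    theta = es_step(theta)
print(theta)  # approaches the optimum at [3, 3, 3, 3, 3]
```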
I don't think it's a moiré effect, but yeah, it does look like one from the pattern.
That's cool; that's how shades were rendered in the old days.
Man, those graphics are so damn good.
The ‘secret sauce’ in a deep network is the hidden layer with a non-linear activation function. Without that, all the layers would collapse into a single linear model.
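Concretely, here's a minimal NumPy sketch of that collapse (weights and shapes are made up for illustration): two stacked affine layers reduce to one, and inserting a ReLU between them is exactly what breaks the equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)

# Two stacked affine layers with arbitrary weights and biases.
W1, W2 = rng.standard_normal((3, 4)), rng.standard_normal((2, 3))
b1, b2 = rng.standard_normal(3), rng.standard_normal(2)

# They collapse into a single affine map:
# W2 (W1 x + b1) + b2 == (W2 W1) x + (W2 b1 + b2)
deep = W2 @ (W1 @ x + b1) + b2
shallow = (W2 @ W1) @ x + (W2 @ b1 + b2)
assert np.allclose(deep, shallow)

# A non-linearity between the layers breaks the collapse:
relu = lambda z: np.maximum(z, 0)
nonlinear = W2 @ relu(W1 @ x + b1) + b2
print(np.allclose(nonlinear, shallow))  # False in general
```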
It completely misses the mark on what it means to 'weight' (linearly transform), bias (affine transform), and then non-linearly transform (i.e., 'collect') points into bins.
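In case the 'bins' framing is unclear, here's a toy sketch of it (points, weights, and the hard threshold are all made up; real networks use smoother activations like ReLU or sigmoid): the weights linearly transform the points, the bias shifts them, and the non-linearity then collects them into discrete bins.

```python
import numpy as np

# Four 2-D points, chosen arbitrarily for illustration.
points = np.array([[-2.0, 1.0], [0.5, 0.5], [3.0, -1.0], [1.0, 2.0]])

W = np.array([[1.0, -1.0]])  # weight: linear transform (a projection)
b = np.array([0.5])          # bias: affine shift

z = points @ W.T + b         # affine transform of each point
bins = (z > 0).astype(int)   # non-linearity: each point lands in bin 0 or 1
print(np.column_stack([z, bins]))
```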
It doesn't match the pictures in your head, but it nevertheless does present a mental representation the author (and presumably some readers) find useful.
Instead of nitpicking, perhaps pointing to a better visualization (like maybe this video: https://www.youtube.com/watch?v=ChfEO8l-fas) could help others learn. Otherwise it's just frustrating to read comments like this.