I’m wrapping up a role where I spent a significant amount of time writing Triton kernels. It’s a fantastic tool, but it has a steep learning curve and some sharp edges. I wanted to share a few practical "notes from the field" for anyone who needs to go beyond the very opaque docs.
Here is the Reddit thread:
https://www.reddit.com/r/MachineLearning/comments/otdpkx/n_i...