481 points by ingve 4 days ago | 11 comments
  • ofou 4 days ago
    I believe that most of the papers presented here focus on acquiring knowledge rather than deep understanding. If you’re completely unfamiliar with the subject, I recommend starting with textbooks rather than papers. Bishop’s latest, "Deep Learning: Foundations and Concepts" (2024) [1], is an excellent resource that covers the "basics" of deep learning and is quite up to date. Another good option is Chip Huyen’s "AI Engineering (2024)" [2]. Other excellent choices are "Dive into Deep Learning" [3] and "Understanding Deep Learning" [4], or just read anything from fast.ai and watch Karpathy's lectures on YouTube.

    [1]: https://www.bishopbook.com
    [2]: https://www.oreilly.com/library/view/ai-engineering/97810981...
    [3]: https://d2l.ai
    [4]: https://udlbook.github.io/udlbook/

    • Tsarp 4 days ago
      AI engineering is more applied than research and in that regard this list is great.

      swyx and team's podcast, newsletter, and discord have been the highest signal-to-noise resources for keeping up and learning.

      • swyx 3 days ago
        thank you! very kind. much to improve.
        • ofou 2 days ago
          By the way, I think this is a fantastic reading list for creating AI products, and especially for staying updated on the latest in the AI space. However, it feels a bit scattered and might be hard for beginners to follow, IMO.

          I read your book, The Coding Career Handbook, we need something similar for AI Engineering! I really enjoyed it. Thank you for creating and sharing such high-quality multimodal content :)

          • swyx a day ago
            wow thanks for reading it! yeah i think i should just write the new book for ai engineering i guess but Chip Huyen already took the oreilly slot lol
    • rahimnathwani 4 days ago
      Of the resources you mention, #2 is probably the best starting point for someone who wants to start building software soon. Karpathy's videos and fast.ai's courses may also fit that purpose.

      But the other books (#1, #3, #4) seem like they're intended for those who want to understand all the math. Many people don't want (or need) a full understanding of how all this works. They can provide significant value to their employers with some knowledge of how machine learning works (e.g. the basics of CNNs and RNNs), and some intuitions/vibes about SOTA LLMs, even if they don't understand transformers or other modern innovations.

    • ewuhic 4 days ago
      Is there a textbook like [1] or [4], which also incorporates PyTorch into learning?
      • ofou 4 days ago
        Dive into Deep Learning is implemented using various libraries such as PyTorch, NumPy/MXNet, JAX, and TensorFlow.

        Here’s an example: https://d2l.ai/chapter_natural-language-processing-pretraini...

        • ewuhic 4 days ago
          Unfortunately, this resource is not as deep as the other two.
          • drdude 4 days ago
            I thought the same too...

            I read Deep Learning by Goodfellow and Deep Learning with TensorFlow 2 and Keras for practical stuff. I am still thinking if I should do the D2L for additional practice in my free time, though.

  • kamikazeturtles 4 days ago
    I don't know what an "AI Engineer" is, but is reading research papers actually necessary if the half-life of many of these papers' relevance is only a few months, until the next breakthrough happens?

    I have a feeling that, unless you're dabbling at the cutting edge of AI, there's no point in reading research papers. Just get a feel for how these LLMs respond, then build a pretty and user-friendly app on top of them. Knowing the difference between "multi-head attention" and "single-head attention" isn't very useful if you're just using OpenAI's or Groq's API.
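    To make "just using the API" concrete: the whole interface is roughly a JSON payload in the OpenAI-style chat format. A minimal sketch (the model name and parameters are illustrative, and the HTTP call itself is omitted):

```python
def build_chat_request(user_msg: str, model: str = "gpt-4o-mini") -> dict:
    """Assemble an OpenAI-style chat completions payload.

    Only the request shape is shown; actually sending it requires an
    HTTP POST to the provider's endpoint with an API key.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_msg},
        ],
        "temperature": 0.7,
    }
```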

    Am I missing something here? I'd love to know where I'm wrong

    • lolinder 4 days ago
      > I don't know what an "AI Engineer" is, but, is reading research papers actually necessary

      Let's put it this way: if even half the people who call themselves "AI Engineers" would read the research in the field, we'd have a lot less hype and a lot more success in finding the actual useful applications of this technology. As is, most "AI Engineers" assume the same thing you do and consider "AI Engineering" to be "I know how to plug this black box into this other black box and return the result as JSON! Pay me!". Meanwhile most AI startups are doomed from the start because what they set out to do is known to be a bad fit.

      • wnmurphy 4 days ago
        > I know how to plug this black box into this other black box and return the result as JSON!

        To be fair, most of software engineering is this.

        • bumby 4 days ago
          Tbf- most of [any] engineering is like this.
          • torginus 4 days ago
            But most 'engineering' is not engineering.
            • bumby 4 days ago
              Care to explain your perspective? "Engineering" can be a bit of a fuzzy definition. To some it means "building something". To others, it requires the application (and understanding!) of mathematical and scientific principles to build something.

              I would disagree that most engineering is not involved in building something...whether most engineers understand the math/science behind it is debatable.

        • otteromkram 4 days ago
          Okay, now take a slightly imbalanced stance: What is most software engineering?
          • dietr1ch 4 days ago
            I don't know, but if I say it's about working with things you don't fully understand people seem to trust me.
      • sanderjd 4 days ago
        I kind of see this the opposite way...

        Or rather, I guess I feel like it's a sign of the immaturity of the space that it is still kind of unclear (at least it is to me) how to build useful things without reading all the research papers.

        To me, it seems like there is an uncanny valley between "people who are up on all the papers in this reading list" and "people who are just getting a feel for how these LLMs respond and slapping a UI on top".

        Maybe it kind of reminds me of the CGI period of the web. The "research papers" side is maybe akin to all the people working on networking protocols and servers necessary to run the web, and the "slap a UI over the llm APIs" is akin to those of us slinging html and perl scripts.

        You could make ok stuff that way, without needing to understand anything about TCP. But it still took a little while for a more professionalized layer to mature between those two extremes.

        I feel like maybe generative AI is in the early days of that middle layer developing?

      • hintymad 4 days ago
        Even before the amazing achievements of LLMs, there were millions of "ML engineers" on LinkedIn, per some stats about LinkedIn jobs. I'll bet only a single-digit percentage of them could derive the math of linear regression or have ever implemented a single ML algorithm from scratch. Not that it is wrong, mind you, but it means it's unlikely for half the "AI engineers" to read research papers.
        • HPsquared 4 days ago
          A lot of people did a MOOC for the CV points.
      • crystal_revenge 4 days ago
        > if even half the people who call themselves "AI Engineers" would read the research in the field, we'd have a lot less hype and a lot more success in finding the actual useful applications of this technology

        As someone working in the area for a few years now (both on the product and research side), I strongly disagree. A shocking number of papers in this area are just flat out wrong. Universities/Research teams are churning out garbage with catchy titles at such a tremendous rate that reading all of these papers will likely leave one understanding less than if they read none.

        The papers in this list are decent, but I wouldn't be shocked if the conclusions of a good number of them were ultimately either radically altered or outright inverted as we learn more about what's actually happening in LLMs.

        The best AI engineers I've worked with are just out there experimenting and building stuff. A good AI engineer definitely has to be working close to the model; if you're just calling an API, you're not really an "AI Engineer" in my book. While most good AI engineers have likely incidentally read most of these papers in the course of their day job, they tend to read them with skepticism.

        A great demonstration of this is the Stable Diffusion community. Hardly any of the innovation in that space is even properly documented (this, of course, is not ideal), much less used for flag planting on arXiv. But nonetheless the generative image AI scene is exploding in creativity, novel applications, and shocking improvements all with far less engineering/research resources devoted to the task than their peers in the LLM world.

        • serjester 3 days ago
          Couldn't agree more with you. At the end of the day, the people building the most successful products are too busy to formalize their experiments into research papers. While I have respect for academic researchers, I think their perspective is fundamentally very limited when it comes to AI engineering. The space is just too frothy.
      • gazchop 4 days ago
        Careful now. Don't want to upset the future generation of unemployed prompt engineers.
      • HPsquared 4 days ago
        Investors can't tell the difference between hype and (future) results.
    • swyx 4 days ago
      hi! author here.

      > Just get a feel for how these LLMs respond then build a pretty and user friendly app on top of them.

      as you know, "just" is a very loaded word in software engineering. the entire thesis of AI Eng is that this attitude of "just slap a UI on an LLM bro, what's so hard" underestimates a rapidly deepening field with its own stack and specialization (some of which, yes, is unnecessary, vc-funded, hypey complexity merchantism, but some of which is also valid). if you do not take it seriously, others will, and they will run rings around those who have decided to not even try to push this frontier, passively waiting for model progress to solve everything.

      i've seen this play out before in underappreciated subfields of engineering that became their own thing, with their own language, standard stack, influencers, debates, controversies, IPOs, whole 9 yards.... frontend eng, mobile eng, SRE, data eng, you name it. you just have to see the level and quality of work that these people are doing that is sufficiently distinct from MLE and product/fullstack webdev to appreciate that it probably deserves its own field of study, and while it will NEVER be as prestigious as AI research, there will be a ton more people employed in these roles than there can be in research and thats a perfectly fine occupation too.

      I'm even helping instruct a course about it this week, as it happens, if you want to see what a practical syllabus for it looks like: https://maven.com/noah-hein/ai-engineering-intro

    • thecupisblue 4 days ago
      There are multiple levels of AI engineers:

      1. The actual deep ML researchers that work on models.

      2. The "AI engineer" who creates products based on LLMs.

      3. The "AI researchers" who basically just stack LLMs together and call it something like Meta-Cognitive Chain-of-Thought Advanced Reasoning Intelligence or whatever it is.

      • jhanschoo 4 days ago
        > 1. The actual deep ML researchers that work on models

        > 3. The "AI researchers" who basically just stack LLMs together and call it something like Meta-Cognitive Chain-of-Thought Advanced Reasoning Intelligence or whatever it is.

        I actually think that working purely within the traditional neural nets model is starting to hit against its limits and the most fruitful directions for research are systems that incorporate and modify LLMs on-line, among other systems, despite your unserious characterization of this class of research.

      • otteromkram 4 days ago
        Why are there two research branches?

        Seems like there's only one AI engineer, which is #2. The other two are researchers, and one doesn't even focus on AI, since ML covers a broader swath of disciplines.

    • eKIK 4 days ago
      I consider myself to be an (occasional) user of AI services like the ones OpenAI and others provide. I've learned how to consume the services reasonably effectively, and make good use of them, but that's about it. I am not an AI engineer.

      Similarly, I know how to call cryptography libraries to get my passwords hashed with a suitable algorithm before storing them. I don't understand the deep math behind why a certain hash function or cipher is secure, but that's fine. I can still make good use of cryptographic functions. I'm not a cryptography engineer either :).

      My take on it is that if you call yourself any kind of "XYZ Engineer", you should be able to understand the inner workings of XYZ.

      This reading list is most likely (mostly) for those who want to get a really deep understanding and eventually work on contributing to the "foundational systems" (for lack of a better word) one day.

      Hope that helps.

      • swyx 4 days ago
        i mostly agree w/ you, but there's a wide spectrum of “understand the inner workings” given rising complexity.

        consider:

        - does a React/frontend engineer need to know everything about react internals to be good at their job?

        - does a commercial airline pilot need to know every single subsystem in order to do their job?

        - do you, a sophisticated hackernewsian, really know how your computer works?

        more knowledge is always (usually) better, but as a thing diffuses into practice and industry there's a natural stopping point that “technician”-level people reach that is still valuable to society bc of relative talent supply and demand.

        • torginus 4 days ago
          > - does a React/frontend engineer need to know everything about react internals to be good at their job?

          Yes? Well, not everything (which I define as being able to implement React from scratch). But if you want to do good work, and be able to fix those pesky bugs which result from the arcane behavior of the framework itself, then you better know your stuff.

          Besides, in practice very few people understand the most basic stuff about React. Just recently I had to explain to a veteran frontend dev what list virtualization was and why it's not a good idea to display a list of 100k items directly.
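          For readers unfamiliar with the term: list virtualization means rendering only the rows currently in the viewport rather than all 100k. The index arithmetic is the whole trick; a sketch in Python for illustration (React projects would typically reach for a library such as react-window; the function and parameter names here are made up):

```python
def visible_slice(n_items: int, row_height: int,
                  viewport_height: int, scroll_top: int,
                  overscan: int = 3) -> tuple[int, int]:
    """Return (start, end) indices of the rows worth rendering.

    Only roughly viewport_height / row_height rows (plus a small
    overscan buffer) are materialized, however long the list is.
    """
    start = max(0, scroll_top // row_height - overscan)
    end = min(n_items, (scroll_top + viewport_height) // row_height + 1 + overscan)
    return start, end

# 100k rows, 20px tall, 600px viewport, scrolled to 4000px:
# only rows 197..234 get rendered.
```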

        • mettamage 4 days ago
          I've personally found that people need to understand the layer of the stack they're working on (e.g. a frontend dev should understand React). Going a layer higher or lower (or two) seems to be handy only for troubleshooting, for debugging, or when you have an expanded role.
        • vunderba 3 days ago
          > does a commercial airline pilot need to know every single subsystem in order to do their job?

          Not a great comparison. First off, nobody is suggesting that a self-purported "AI Engineer" has to understand EVERY SINGLE SUBSYSTEM, but they should still have a strong command of the internal workings of the modern foundational material (transformers, neural networks, latent space, etc.) to style themselves as such.

          The better question is "does an aviation mechanic need to understand the internal systems of an airplane?" and the answer is a resounding yes.

        • bronco21016 4 days ago
          >- does a commercial airline pilot need to know every single subsystem in order to do their job?

          Haha, explain this one to the APDs (aircrew program designees, the people signing off training at airlines), please.

          Every airline pilot has their horror stories of being asked how many holes are in the alternate static port of some aircraft they've flown. Or through bolts on the wheel hub, or how many plys of glass on the side cockpit window, or the formula for calculating hydroplane speed, or the formula for calculating straight line distance to the horizon from altitude of X... it just goes on endlessly.

          I do agree with your post overall though.

    • FanaHOVA 4 days ago
      • randcraw 4 days ago
        A nice overview. In short, I would describe an AI Engineer as someone who integrates existing AI tools and libraries into a reliable system that is fielded in a production environment. They also must know how to tune AI components and assess them for performance, decline/drift, and failure. Most AEs have an MS or less (CS, data science, statistics, etc.), since it's not really a research role. Finally, AI Engineers don't invent AI tools or fix what's missing/broken within them. That's the role of an AI scientist.
    • jsight 4 days ago
      I think you need to have a pretty good understanding of the underlying workings of at least a couple of modern models. You should also follow along closely enough to see if major changes happen in the way they are built.

      But watching every new paper? Nah, that's mostly only useful if you have a large enough amount of compute to try them out. And most of us don't have that anyway.

    • richardw 4 days ago
      The terms are being diluted hard. Everyone from a plumber with a no-code RAG automation tool to a FAANG PhD can claim, and has claimed, the role “AI engineer”, and it can be true depending on the context.
      • jimbokun 4 days ago
        The new “Data Scientist”!
        • richardw 4 days ago
          I think AI blurs it more than ever as it becomes easier to use. A chat interface makes any rando a superhero in the right context, if others don’t know about it.

          E.g. a 90-year-old in a care home. You learn the ins and outs of any service and you’re the local expert. You don’t even need Excel, just a phone. Very many people who have never heard of deep learning have built chatbots for small retail shops. Drag in a few FAQ docs to the context store and click “go”.
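          The "drag in FAQ docs and click go" flow hides a small retrieval step: pick the most relevant docs, then paste them into the prompt. A deliberately crude sketch (token-overlap scoring stands in for the embedding search a real product would use; all names are illustrative):

```python
def score(query: str, doc: str) -> int:
    """Crude relevance: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_prompt(query: str, faq_docs: list[str], k: int = 2) -> str:
    """Stuff the k most relevant FAQ docs into the prompt as context."""
    top = sorted(faq_docs, key=lambda d: score(query, d), reverse=True)[:k]
    context = "\n".join(top)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```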

    • feznyng 4 days ago
      There’s research papers on system design involving llms that would definitely be useful in practical contexts.
    • paulddraper 4 days ago
      An AI Engineer is someone who builds AI products, just as a Software Engineer is someone who builds software products.

      Hope that helps!

    • beanjuiceII 4 days ago
      all those PMs and middle layers are going to be "AI Engineers" soon, so now you know
    • ocular-rockular 4 days ago
      You're not necessarily wrong but maybe a bit naive about what cutting edge research entails.
    • belter 4 days ago
      At least for 60% of those papers you can't reproduce the results....
    • samstave 4 days ago
      [dead]
  • swyx 4 days ago
    hi! author here! putting together a list like this is intimidating - for everything i pick there are a dozen other suitable candidates - so please view this as a curriculum with broadly prescriptive weightings, with the understanding that the CURRENTLY_RELEVANT_PAPER is always a moving pointer rather than a fixed reference.

    we went thru this specific reading list in our paper club, if you are interested in a narrative version: https://www.youtube.com/watch?v=hnIMY9pLPdg

    • Flux159 4 days ago
      This is a great list - I was wondering if there was a less research-oriented, more experimental or practical reading list that you're planning as well - some things off the top of my head:

      - Actual examples of fine-tuning LLMs or making merges - usually talked about in r/LocalLLaMA for specific use cases like role playing or other scenarios that instruction-tuned LLMs are not good at. A Jupyter notebook or blog post would be great here.

      - Specifically around Agents & Code generation - Anthropic's post about SWE-bench verified gives a very practical look at writing a coding agent https://www.anthropic.com/research/swe-bench-sonnet with prompts, tool schema and metrics.

      - The wide range of LoRAs and fine-tunes available on civitai for image models - a guide on making a custom one that you can use in ComfyUI.

      - State of the art in audio models in production - ElevenLabs still seems to be the best among closed platforms, but there are open source options for voice cloning and TTS, even with very small parameter counts (Kokoro, 82M).

      • swyx 3 days ago
        welcome to write a guest post around it!
    • adamgordonbell 4 days ago
      Great work!

      I am out of my depth when it comes to reading papers, but I second 'The Prompt Report' from your list.

      It gives a great taxonomy that helped me understand the space of prompting techniques better.

    • fancyfredbot 4 days ago
      Hey swyx, this list of papers is really interesting - I like that it's an opinionated list. Why isn't there more about model implementation, though? I'd have expected engineers in this field to want to know pytorch, cuda, JAX, maybe even XLA?
      • swyx 3 days ago
        because of the growth of the field and the proliferation of closed model APIs, that is out of scope for the AI Engineer now - save that for the MLEs and RSes
    • andrekorol 4 days ago
      I appreciate the effort put into curating and maintaining this list, good job on that!

      I’m curious, is there also some specific existing “AI Researcher Reading List” you would personally recommend? Or do you plan on making and maintaining one?

      • swyx 4 days ago
        im no researcher so no, but i would start with the llm courses at stanford, uc berkeley, and princeton.
  • jamalaramala 4 days ago
    From the article:

    > 1. GPT1, GPT2, GPT3, Codex, InstructGPT, GPT4 papers. Self explanatory. (...)

    > 2. Claude 3 and Gemini 1 papers to understand the competition. (...)

    > 3. LLaMA 1, Llama 2, Llama 3 papers to understand the leading open models. (...)

    I agree that you should have read most of these papers at the time they were released, but I wonder if it would be that useful to read them now.

    Perhaps it would be better to highlight one or two important papers from this section?

  • bbor 4 days ago
    Heh, always fascinating to see how the term “AI” has been swallowed nigh-completely by the recent exciting developments in DL. All those papers and not a single mention of Russell & Norvig, Minsky, Shannon, Lenat, etc.!

    I’m sure it’s a great list for what it is, I just wanted to be pedantic for a bit ;). If you’re interested in an introduction to AI as a broader topic, most graduate courses use the same book (Russell & Norvig), and others may publish their syllabi online.

    • swyx 4 days ago
      i’ll go even further, i dont include attention is all you need in my list. i think theres always a necessary intellectual history, but sometimes when you just want to do a specific job, you just want the current thing, and you can backfill the older stuff later (which yes can bite you in the ass if you are ignorant)
  • lxe 4 days ago
    I think most of the instruction fine-tuning methods for oss models stem from Alpaca, so it should be included: https://crfm.stanford.edu/2023/03/13/alpaca.html

    And the one referenced in there on synthetic data generation: https://arxiv.org/abs/2212.10560
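    For context, Alpaca-style instruction tuning renders each (instruction, output) pair into a fixed prompt template before fine-tuning. A sketch (template paraphrased from the Alpaca release, simplified to the no-input case):

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def format_example(instruction: str, output: str) -> str:
    """Render one (instruction, output) pair into a training string."""
    return ALPACA_TEMPLATE.format(instruction=instruction) + output
```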

  • nickpsecurity 4 days ago
    This is a great survey. Combine it with the courses below for best results:

    https://www.trybackprop.com/blog/top_ml_learning_resources

  • qrsjutsu 4 days ago
    If you are not on that hype-train yet, then don't waste time skimming over, reading, and trying to understand LLM and AI papers.

    Read about ELIZA. Build your own.

    Get Tensors, Vectors, Fields, Linguistics, Computer Architectures, Networks.

    Focus on the subjects themselves, not on them in the context of Neural Networks, "Deep Learning" et al.
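    To illustrate how little machinery ELIZA needs, here is a toy pattern-matching responder (a handful of made-up rules, not Weizenbaum's original DOCTOR script):

```python
import re

# (pattern, response template) pairs, tried in order; "(.*)" is the fallback.
RULES = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r"my (.*)", "Tell me more about your {0}."),
    (r"(.*)", "Please go on."),
]

def respond(text: str) -> str:
    """Match the input against each rule and fill the first template that fits."""
    text = text.lower().strip().rstrip(".!?")
    for pattern, template in RULES:
        m = re.match(pattern, text)
        if m:
            return template.format(*m.groups())
    return "Please go on."
```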

    • fancyfredbot 4 days ago
      The list seems deliberately focussed on learning practical topics and not the foundations. I think that's what makes it interesting - there are any number of places recommending you start by learning linear algebra, stats, probability...

      Personally I like to learn the foundations but there's genuinely room for useful knowledge of SOTA techniques even without the foundations. To be honest I feel that any amount of learning about computer architecture and vector fields is unhelpful if you are trying to understand good eval benchmarks or prompt engineering techniques.

      • qrsjutsu 4 days ago
        > unhelpful if you are trying to understand good eval benchmarks or prompt engineering techniques.

        You are absolutely correct. I jumped to conclusions when I saw the list and read "AI Engineer". The reading list isn't addressing people who want to build AIs, but those who want to maximize and optimize their results with the existing ones.

        My bad.

    • francomoon7 2 days ago
      what is ELIZA? github link?
  • kevin0091 4 days ago
    The reading list is about one year old. For instance, in 2025 one may use KTO for math, RLOO for CoT, and DPO for function calling and optimization.

    In 2025 the focus should only be on distillation & optimization.

    In 2025 CoT is not new; corrected CoT is the key and all you need.
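    For reference, the DPO objective mentioned above reduces, per preference pair, to a logistic loss on the difference between policy and reference log-probabilities of the chosen vs. rejected responses. A minimal sketch with scalar log-probs (beta value illustrative):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy prefers the chosen response more strongly than the reference does, the loss drops below log 2; a policy identical to the reference sits exactly at log 2.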

  • joshdavham 4 days ago
    Awesome list!
  • agcobiledyalom 4 days ago
    Am interested