8 points by menzoic 6 hours ago | 5 comments
  • menzoic 6 hours ago
    I work at Airbnb where I write 99% of my production code using LLMs. Spotify's CEO recently announced something similar, but I mention my employer not because my workflow is sponsored by them (many early adopters learned similar techniques), but to establish a baseline for the massive scale, reliability constraints, and code quality standards this approach has to survive.

    Many engineers abandon LLMs because they run into problems almost instantly, but these problems have solutions. If you're a skeptic, please read and let me know what you think.

    The top problems are:

    * Constant refactors (generated code is really bad or broken)

    * Lack of context (the model doesn’t know your codebase, libraries, APIs, etc.)

    * Poor instruction following (the model doesn’t implement what you asked for)

    * Doom loops (the model can’t fix a bug and tries random things over and over again)

    * Complexity limits (inability to modify large codebases or create complex logic)

    In this article, I show how to solve each of these problems by using the LLM as a force multiplier for your own engineering decisions, rather than a random number generator for syntax.

    A core part of my approach is Spec-Driven Development. I outline methods for treating the LLM like a co-worker having technical discussions about architecture and logic, and then having the model convert those decisions into a spec and working code.

  • carrot5Top 5 hours ago
    For sure, with the latest models, treating the model like a respected professional that needs context and input is essential. I usually get the best results when the context window is right around 70% full.
    • menzoic 5 hours ago
      > get the best results when the context window is right around 70%

      I used to be trigger-happy with /compact, or I'd use the hand-off technique to transfer knowledge between sessions with a doc. But lately the newer generation of models seems to handle long context pretty well, down to around 20% remaining context.

      But that's when I'm working on the same focused task. I would instantly reset if I started implementing an unrelated task, even with 90% left, since there's just no benefit to keeping the old context.
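
      The budgeting rule above can be sketched as a tiny decision function. This is only an illustration of the heuristic the thread describes; the 4-characters-per-token estimate, the 200k window, and the exact thresholds are assumptions, not anything the commenters specified.

      ```python
      # Hypothetical sketch of the context-budget heuristic:
      # keep a focused session until ~20% of the window remains,
      # but reset immediately when switching to an unrelated task.

      def estimate_tokens(text: str) -> int:
          """Very rough token estimate (~4 characters per token) -- an assumption."""
          return max(1, len(text) // 4)

      def next_action(history: list[str], window: int = 200_000,
                      new_task_unrelated: bool = False) -> str:
          """Decide whether to continue, compact, or reset the session."""
          if new_task_unrelated:
              return "reset"        # no benefit to carrying old context
          used = sum(estimate_tokens(m) for m in history)
          remaining = 1 - used / window
          if remaining < 0.20:      # the ~20%-remaining point mentioned above
              return "compact"      # summarize / hand off before quality drops
          return "continue"
      ```

      In practice the "compact" branch maps to something like /compact or writing a hand-off doc; the point is that the reset decision keys on task relatedness, not on remaining budget.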

  • chalmers 5 hours ago
    Yep! That’s almost exactly the workflow I’ve landed on too. I could not agree more
    • menzoic 5 hours ago
      It's basically the typical SDLC boosted with LLMs. Especially the part where you can explore tradeoffs and alternative approaches rapidly.
  • Soupzzz 5 hours ago
    I read you are using Codex and lost interest in the rest of the post
    • menzoic 5 hours ago
      LOL, honestly I hated Codex when it first came out. It was backed by o3 at the time.

      But literally as soon as GPT-5 landed in Codex, with the "high" reasoning option, I completely switched from Claude Code to Codex. Never imagined that would happen so fast.