This is the key one I think. At one extreme you can tell an agent "write a for loop that iterates over the variable `numbers` and computes the sum" and it'll do this successfully, but the scope is so small there's not much point in using an LLM. At the other extreme you can tell an agent "make me an app that's Facebook for dogs" and it'll make so many assumptions about the architecture, code and product that there's no chance it produces anything useful beyond a cool prototype to show mom and dad.
A lot of successful LLM adoption for code is finding this sweet spot. Overly specific instructions don't make you any more productive, and with overly broad instructions you end up redoing too much of the work.
It cognitively feels very similar to other classic programming activities, like modularization at any level from architecture to code units/functions, thoughtfully choosing how to lay out and chunk things. It's always been one of the things that make programming pleasurable for me, and some of that feeling returns when slicing up tasks for agents.
I'm starting to think of projects now as a tree structure where the overall architecture of the system is the main trunk and from there you have the sub-modules, and eventually you get to implementations of functions and classes. The goal of the human in working with the coding agent is to have full editorial control of the main trunk and main sub-modules and delegate as much of the smaller branches as possible.
Sometimes you're still working out the higher-level architecture, too, and you can use the agent to prototype the smaller bits and pieces which will inform the decisions you make about how the higher-level stuff should operate.
Amusingly, this was my experience in giving Lovable a shot. The onboarding process was literally just setting me up for failure by asking me to describe the detailed app I was attempting to build.
Taking it piece by piece in Claude Code has been significantly more successful.
But not so good at making (robust) new features out of the blue.
The more detailed I am in breaking down chunks, the easier it is for me to verify and the more likely I am to get output that isn't 30% wrong.
Maybe there’s something about not having to context switch between natural language and code that just makes it _feel_ easier sometimes.
The failure mode I kept hitting wasn’t just "it makes mistakes", it was drift: it can stay locally plausible while slowly walking away from the real constraints of the repo. The output still sounds confident, so you don’t notice until you run into reality (tests, runtime behaviour, perf, ops, UX).
What ended up working for me was treating chat as where I shape the plan (tradeoffs, invariants, failure modes) and treating the agent as something that does narrow, reviewable diffs against that plan. The human job stays very boring: run it, verify it, and decide what’s actually acceptable. That separation is what made it click for me.
Once I got that loop stable, it stopped being a toy and started being a lever. I’ve shipped real features this way across a few projects (a git-like tool for heavy media projects, a ticketing/payment flow with real users, a local-first genealogy tool, and a small CMS/publishing pipeline). The common thread is the same: small diffs, fast verification, and continuously tightening the harness so the agent can’t drift unnoticed.
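For what it's worth, the "fast verification" part is less exotic than it sounds. Here's a minimal sketch of the kind of harness I mean, assuming a Python project checked with pytest and ruff (the script and the check list are illustrative, not a prescription): a single cheap command that either the agent or I run after every diff.

```python
# Illustrative verification harness: run every check, fail fast.
# Assumes pytest and ruff are installed; swap in whatever your project uses.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],           # lint: catches stylistic/import drift early
    ["pytest", "-q", "--maxfail=1"],  # tests: stop at the first failure
]

def verify() -> int:
    """Run each check in order; return the first non-zero exit code."""
    for cmd in CHECKS:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)}", file=sys.stderr)
            return result.returncode
    print("all checks passed")
    return 0

if __name__ == "__main__":
    sys.exit(verify())
```

The specific tools don't matter; what matters is that one deterministic command gives you a drift check you can run after every small diff without thinking about it.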
With where the models are right now you still need a human in the loop to make sure you end up with code you (and your organisation) actually understand. The bottleneck has gone from writing code to reading code.
That's one very short step removed from Simon Willison's lethal trifecta.
I think this is something people ignore, and it's significant. The only way to get good at coding with LLMs is actually trying to do it. Even if it's inefficient or slower at first. It's just another skill to develop [0].
And it's not really about using all the plugins and features available. In fact, many plugins and features are counter-productive. Just learn how to prompt and steer the LLM better.
[0]: https://ricardoanderegg.com/posts/getting-better-coding-llms...
Versus other threads (here on HN, and especially on places like LinkedIn) where it's "I set up a pipeline and some agents and now I type two sentences and amazing technology comes out in 5 minutes that would have taken 3 devs 6 months to do".
You mentioned "harness engineering". How do you approach building "actual programmed tools" (like screenshot scripts) specifically for an LLM's consumption rather than a human's? Are there specific output formats or constraints you’ve found most effective?
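For concreteness, this is roughly the shape I'm imagining: a hypothetical screenshot script (assuming Playwright; the script name and JSON fields are made up) that emits a small machine-readable blob rather than anything meant for human eyes.

```python
# Hypothetical agent-facing screenshot tool: terse, structured output
# instead of human-oriented logging. Assumes Playwright is installed.
import json
import sys

from playwright.sync_api import sync_playwright

def snap(url: str, out_path: str) -> dict:
    """Capture a full-page screenshot and return a small summary dict."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        page.screenshot(path=out_path, full_page=True)
        title = page.title()
        browser.close()
    # A small, predictable JSON blob is easier for the model to consume
    # than free-form log output.
    return {"url": url, "title": title, "screenshot": out_path}

if __name__ == "__main__":
    print(json.dumps(snap(sys.argv[1], sys.argv[2])))
```

Is that close to what you do, or have you found a different output format (plain-text summaries, diffs against a previous capture, etc.) works better for the model?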
LOL, been there, done that. It is much less frustrating and demoralizing than babysitting your kind-of-stupid colleague though. (Thankfully, I don't have any of those anymore. But at previous big companies? Oh man, if only their commits were ONLY as bad as a bad AI commit.)
Tons of respect for Mitchell. I think you are doing him a disservice with these kinds of comments.
Some of us enjoy learning how systems work, and derive satisfaction from the feeling of doing something hard, and feel that AI removes that satisfaction. If I wanted to have something else write the code, I would focus on becoming a product manager, or a technical lead. But as is, this is a craft, and I very much enjoy the autonomy that comes with being able to use this skill and grow it.
I consider myself a craftsman as well. AI gives me the ability to focus on the parts I both enjoy working on and that demand the most craftsmanship. A lot of what I use AI for and show in the blog isn’t coding at all, but a way to allow me to spend more time coding.
This reads like you maybe didn’t read the blog post, so I’ll mention that there are many examples there.
Well, yes, they are: some folks don't think "here's how I use AI" and "I'm a craftsman!" are consistent. Seems like maybe OP should consider whether "AI is a tool, why can't you use it right" isn't begging the question.
Is this going to be the new rhetorical trick, to say "oh hey surely we can all agree I have reasonable goals! And to the extent they're reasonable you are unreasonable for not adopting them"?
I think one of the more frustrating aspects of this whole debate is this idea that software development pre-AI was too "slow", despite the fact that no other kind of engineering has nearly the same turnaround time as software engineering does (nor does it have the same return on investment!).
I just end up rolling my eyes when people use this argument. To me it feels like favoring productivity over everything else.
Which is why I like this article. It's realistic in describing the value proposition of LLM-based coding-assist tools (aka AI agents).
The fact that it's underwhelming compared to the hype we see every day is a very, very good sign that it's practical.