Finding out you can’t debug it anymore, because you don’t have a mental model of what’s going on, is less likely to be shared with the world.
Some people are super comfortable with the prompt-wait-test-reprompt loop; I personally am not. I want to understand every line of code, and I find it more fatiguing and less rewarding to review pages of LLM-generated code.
The sweet spot for me is using an LLM to write a single function at a time. That’s the unit of work: constrained, easier to understand, and, critically, with no unexpected changes to other areas of my code.
Here is a real conversation. “I built this app so fast it feels amazing, but then I looked at the code and it had a 6000 line class with one function that was 3000 lines of if statements”
“Oh ya that’s bad. You definitely need to refactor that”
“I thought that, but I wonder if it’s actually better to have a big class in a single file because that’s easier for the AI to understand than if it was in multiple files”
“Umm ok, but do you even understand the 3000-line function? Couldn’t that be broken into better code than that if/else soup?”
That conversation went on like that for a while.
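For what it’s worth, the kind of change I was pushing for in that conversation looks roughly like the sketch below. It’s a toy example in Python with a made-up order-status domain, not code from their actual app: each branch of the giant if/else becomes a small named function behind a dispatch table, so every piece can be read and tested on its own.

    # Toy sketch (hypothetical domain, not the app from the conversation above):
    # each branch of a long if/elif chain becomes a small named function,
    # looked up through a dispatch table.
    from typing import Callable

    def handle_created(order: dict) -> str:
        return f"Order {order['id']} received, awaiting payment"

    def handle_paid(order: dict) -> str:
        return f"Order {order['id']} paid, queued for shipping"

    def handle_shipped(order: dict) -> str:
        return f"Order {order['id']} shipped via {order.get('carrier', 'unknown carrier')}"

    # One entry per branch that would otherwise live inside the giant if/else block.
    HANDLERS: dict[str, Callable[[dict], str]] = {
        "created": handle_created,
        "paid": handle_paid,
        "shipped": handle_shipped,
    }

    def describe(order: dict) -> str:
        handler = HANDLERS.get(order["status"])
        if handler is None:
            raise ValueError(f"Unknown status: {order['status']!r}")
        return handler(order)

    print(describe({"id": 42, "status": "paid"}))  # Order 42 paid, queued for shipping

Each handler stays a few lines long, and adding a new status means one new function plus one table entry instead of another branch in a 3000-line block.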
Meanwhile, I have settled on a process: I built a framework with good architecture baked in, and my version of using AI is essentially enforcing compliance with that architecture and my coding patterns.
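To make “enforcing compliance” concrete: the rules live as executable checks rather than review comments. My actual framework isn’t shown here, so the rule and the package names below are made up, but this is the flavor of it, a pytest-style test that fails the build if the domain layer imports from the web layer.

    # Hypothetical example of an architecture rule as an executable check.
    # The package names (myapp.domain, myapp.web) are invented for illustration.
    import ast
    from pathlib import Path

    FORBIDDEN_PREFIX = "myapp.web"     # layer the domain code must not depend on
    DOMAIN_DIR = Path("myapp/domain")  # layer being checked

    def forbidden_imports(path: Path) -> list[str]:
        """Return any imports in this file that reach into the forbidden layer."""
        tree = ast.parse(path.read_text(), filename=str(path))
        bad = []
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            else:
                continue
            bad.extend(n for n in names if n.startswith(FORBIDDEN_PREFIX))
        return bad

    def test_domain_does_not_import_web():
        violations = {
            str(py): bad
            for py in DOMAIN_DIR.rglob("*.py")
            if (bad := forbidden_imports(py))
        }
        assert not violations, f"Domain layer imports web layer: {violations}"

When the AI generates code that breaks a rule like this, the check fails and I make it fix the structure instead of accepting whatever it produced.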
When Cursor moved to an Agent view to remove human review, I built my own IDE to ensure I never have to adopt stupid coding practices. I used AI to build it and had to constantly stop the AI from doing stupid stuff.
I am happy to share patterns and tools with you, because AI can be a massive accelerator and produce good code when managed effectively, but it requires a commitment to good code and a willingness to ignore where the industry hype is right now.
Sounds like you’ve already made up your mind, though I’ve seen plenty of smart people completely contradict this.
But since I'm so motivated not to believe my job is becoming obsolete, I wanted to present the other side well.
most generative tools are good at tasks and bad at architecture. effective QA of generated code is still in its infancy no matter what anyone claims, let alone automated effective QA; teams that care deeply about getting things right the first time will, on the median, have a worse time with it than teams that don't.
don't trust the code it generates, but "don't trust" doesn't mean discard it. don't trust the architecture it invents for a problem: it can't reason at that level. it's literally aping the background noise of the entire internet and building things that are dead-center mediocre.
there's not enough specifics in your wall of text to help me point out what's going on in any useful detail, though.
i will say that HN users are more likely to be ecstatic at building an MVP that they never have to support; the scale of a company where you have 6 years' experience but are still new on your team is bigger than where most of HN lives and works. the dissonance would be the same if it were 10 years ago, LLMs didn't meaningfully exist, and you were on here asking HN why some of these so-called lean teams everyone's posting about all of a sudden seem so much more productive than when your boss at BiggerCorp tried to streamline processes and then got yelled at by CS, sales enablement, and marketing.
> there's not enough specifics in your wall of text to help me point out what's going on in any useful detail, though.
Sorry, wasn't sure what was most valuable to people reading this. Some examples:
1. Quality code (which our team cares about a lot) feels hard to get out of LLMs unless it's just piping data around or copying an example I explicitly point to. It doesn't grasp the scale we're building at; often it writes weirdly generic code that's not really relevant, or a lot of boilerplate that isn't useful (e.g. spamming tests).
2. It hallucinates documentation often enough that I've gone back to hunting down the source documentation myself (e.g. AWS details).
3. It will flag false positives on junior-engineer "problems" when I request a code review (if data is being mutated in two different places under different conditions, chances are it won't understand it the first time; see the sketch after this list).
4. It'll get stuck on nonsense (the "thinking" output makes me cringe) and go off in random directions if I ask it to debug a problem. I don't think I've had it find the actual problem even once (though it has found a couple of other, unrelated problems, which is nice).
5. In Plan/Build mode, it won't follow the plan. It also seems to oddly dodge writing secure code for auth-related stuff (if I hadn't read through OAuth myself, I wouldn't have caught it).
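To make #3 concrete, here's a toy version of the pattern (invented for illustration, not our actual code): the same field gets written in two places under different conditions, on purpose, and the review tends to flag the second write as a conflict.

    # Invented example of the pattern from item 3: two deliberate writes to the
    # same field under different conditions. A shallow review tends to flag the
    # second write as an "overwritten value" false positive.
    def apply_loyalty_discount(order: dict) -> None:
        # First write: loyalty customers always get 5% off.
        if order["customer_tier"] == "loyal":
            order["discount"] = 0.05

    def apply_bulk_discount(order: dict) -> None:
        # Second write: big orders get 10% off, but only if no discount is set yet,
        # so the loyalty discount is never silently replaced.
        if order["quantity"] >= 100 and order.get("discount", 0) == 0:
            order["discount"] = 0.10

    order = {"customer_tier": "loyal", "quantity": 150, "discount": 0}
    apply_loyalty_discount(order)
    apply_bulk_discount(order)
    assert order["discount"] == 0.05  # loyalty wins; the two writes are intentional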
> tried to streamline processes and then got yelled at by CS, sales enablement, and marketing.
Yes haha! And a few other departments besides.
Anyways thanks for your time :)