This article is perennially posted here and is probably the best breakdown of this quote.
It's one thing to point out small trivialities like initialization and lifetime issues in a small piece of code. But it's quite another to prove they don't exist in a large code base.
Kernighan is a good source of quotes and thinking on programming.
Now, I have had to do some Salesforce Apex coding, and the framework requires tests. So I write up some dummy data for a user and a lead and pass it through the code, but it feels of limited value, almost like just additional ceremony. Most of the bugs I see come from different users having different misconceptions about what a flag means. I can't think of a time a test caught something.
The organization is huge and people do not go and run all the code every time some other area of the system is changed. Maybe they should? But I doubt that would ever happen given the politics of the organization.
So I am curious: what kinds of tests do people write in other areas of the industry?
Aerospace here. Roughly this would be typical:
- comprehensive requirements on the software behavior, with tests to verify those requirements. Tests are automated as much as possible (e.g., scripts rather than manual testing)
- tests are generally run first in a test suite in a completely virtual software environment
- structural coverage analysis (depending on level of criticality) to show that all code in the subsystem was executed by the testing (or adequately explain why the testing can't hit that code)
- then once that passes, run the same tests in a hardware lab environment, testing the software as it runs on the actual physical component that will be installed on the plane
- then test it on an actual plane, through a series of flight tests. (The flight testing would likely not be as comprehensive as the previous steps.)
A full round of testing is very time-consuming and expensive, and as much as possible should be caught and fixed in the virtual software tests before it even gets to the hardware lab, much less to the plane.
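To make the first bullet concrete, a requirements-traced automated test might look roughly like this (a minimal pytest-style sketch in Python; the requirement ID, the altitude_alert function, and the 200 ft threshold are all made up for illustration, not from any real system):

    # Hypothetical sketch: each test is traced back to a written requirement.
    # REQ-NAV-042: "The system shall raise an altitude alert when the measured
    # altitude deviates from the cleared altitude by more than 200 ft."

    def altitude_alert(measured_ft: float, cleared_ft: float) -> bool:
        """Toy stand-in for the real subsystem under test."""
        return abs(measured_ft - cleared_ft) > 200

    def test_req_nav_042_deviation_above_threshold_raises_alert():
        assert altitude_alert(measured_ft=10_250, cleared_ft=10_000)

    def test_req_nav_042_deviation_at_threshold_does_not_alert():
        # Boundary case: exactly 200 ft is not "more than 200 ft".
        assert not altitude_alert(measured_ft=10_200, cleared_ft=10_000)

The structural coverage analysis mentioned above then checks that tests like these actually exercise every line and branch of the real implementation.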
Per corporate policy -- for safety and legal reasons -- the most extensive public uses of AI, like vibe coding and agentic systems, aren't really options. The most common usage I have seen is more like consulting AI as a fancy StackOverflow.
Will this change? I personally don't expect to ever see pure vibe coding, with code unseen and unreviewed, but I imagine AI coding uses will expand.
This is where testing gets interesting: I took some old code I wrote 30 years ago or so and decided to put it literally to the test. A couple of hundred lines from a library that has been in production without ever showing a single bug over all that time. And yet: I'm almost ashamed at how many subtle little bugs I found. Things you'd most likely never see in practice, but still, they were there. And then I put a couple of those bugs together and suddenly realized that that particular chain must have happened in practice in some program built on top of this. And sure enough: fixing the bugs made the application built on top of this more robust.
After a couple of weeks of this I became convinced: testing is not optional, even for stuff that works. Ever since, I've done my best to stop assuming that what I'm writing actually does what I want it to. It usually does, for the happy path. But there are so many other paths that, with code of any complexity, even if you religiously avoid side effects, you can still end up with issues you overlook.
The system I'm working on has been in production for 12 years - we have added a lot of new features over those years. Many of those needed us to hook into existing code, tests help us know that we didn't break something that used to work.
Maybe that helps answer the question of why they are important to me. They might not be for your problems.
I think if:
- the code base implements many code paths depending on options and user inputs, such that a fix for code path A may break code path B
- it takes a great deal of time to run in production, such that issues may only be caught weeks or months down the line, when it becomes difficult to pinpoint their cause (not all software is real-time or web)
- any given developer does not have it all in their head, such that they can anticipate issues codebase-wide
then it becomes useful to have (automated) tests that check a change in function A didn't break functionality in function B that relies on A in some way(s): tests just thorough enough to catch edge cases, but not so heavy that they need prod levels of resources to run.
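For example (a rough sketch; normalize_amount standing in for function A and invoice_total for function B, both hypothetical names):

    # invoice_total (B) relies on normalize_amount (A). A regression test on B
    # guards against a later "fix" to A silently changing B's behaviour.

    def normalize_amount(raw: str) -> float:            # function A
        """Parse a user-entered amount like '1,200.50' into a float."""
        return float(raw.replace(",", ""))

    def invoice_total(line_items: list[str]) -> float:  # function B, relies on A
        return round(sum(normalize_amount(item) for item in line_items), 2)

    def test_invoice_total_survives_changes_to_normalize_amount():
        # Edge cases that an "improvement" to normalize_amount might break.
        assert invoice_total(["1,200.50", "0.49", "99"]) == 1299.99

Cheap to run, and it fails loudly if a change to A breaks B, weeks before production would have surfaced it.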
Now I agree some things might not need testing beyond implementation. Things that don't depend on other program behavior, or that check their inputs thoroughly, and are never touched again once merged, don't really justify keeping unit tests around. But I'm not sure these are ever guarantees (especially the never touched again).
For my projects, I mainly want to Get Shit Done. So I write tests for the major functional areas of the business logic, mainly because I want to know ASAP when I accidentally break something important. When a bug is found that a test didn't catch, that's usually an indicator that I forgot a test, or need to beef up that area of functional testing.
I do not bother with TDD, or tests that would only catch cosmetic issues, and I avoid writing tests that only actually test some major dependency (like an ORM).
If the organization you are in does not value testing, you are probably not going to change their mind. But if you have the freedom to write worthwhile tests for your contributions to the code, doing so will probably make you a better developer.
I worked for a company that had no tests.
I worked on the core software as a new employee, with the programmer who wrote it gone...
Regularly released new features and found out, some days later, that I'd broken some peripheral, but important, business logic.
Drove me mad! I was not allowed to write tests; it was "unproductive".
Most software has "bugs" simply because people couldn't communicate how it should work.
I think most programmers are on top of actual runtime errors or compilation errors. It's straightforward to fix those. They are not on top of logic issues or unintended behavior, because they aren't the product designer.
Programmers just cook the burger. If you order it well done, don't complain when it doesn't come out medium rare.
You wanted examples: https://github.com/openjdk/jdk/tree/master/test/jdk/java/uti...
In the old days, for the kinds of things I had to work on, I would test manually. Usually it was a piece of code that acted as glue, transforming multiple data sources in different formats into a database to be used by another piece of code.
Or an AWS Lambda that had to ingest some JSON and make a determination about what to do: send an email, change a flag, that sort of thing.
Not saying mock testing is bad. Just seems like overkill for the kinds of things I worked on.
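For what it's worth, that kind of glue is often testable without mocks at all if the decision logic is kept pure; something like this (a hedged sketch with a made-up event shape and decide_action name, not the actual code described above):

    import json

    def decide_action(event: dict) -> str:
        """Pure decision logic, kept separate from the email/flag side effects."""
        if event.get("flag") == "urgent":
            return "send_email"
        if event.get("status") == "stale":
            return "update_flag"
        return "ignore"

    def handler(event, context=None):
        # Lambda-style entry point: parse the JSON body, return the decision.
        payload = json.loads(event["body"]) if "body" in event else event
        return {"action": decide_action(payload)}

    def test_urgent_events_trigger_email():
        assert handler({"body": json.dumps({"flag": "urgent"})}) == {"action": "send_email"}

Because the handler only returns a decision, the test needs no mocked email service or database.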
My guess is that some languages - like Go - have a more robust testing culture than other languages like PHP.
There is genuinely a reasonable and rational argument that “testing requires more effort than fixing the issues as users find them” if the consequences are low. See video games, which are notorious for this.
So, industry is more important than language I’d say.
If my project has tests I can work so much faster on it, because I can confidently add tests and refactor and know that I didn't break existing functionality.
You gotta pay that initial cost to get the framework in place though. That takes early discipline.
That’s absolutely a quality thing. I can assure you that you could move a lot faster if you didn’t try and meet such standards, not that it’d be a good idea necessarily, but in isolation it proves the point.
One thing I found is that if testing is easy, your code structure does change a bit to aid with a “test first” approach and I don’t hate it. I thought it made me slower but it doesn’t, it ensures that when all the ground work is finished, the gnarly part of wiring everything up goes much faster.
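Concretely, "structured to aid testing" often just means passing dependencies (the clock, the store) in instead of reaching out for them; a small made-up sketch:

    from datetime import datetime, timedelta

    def expire_sessions(sessions: dict[str, datetime], now: datetime,
                        max_age: timedelta) -> list[str]:
        """Return the ids of sessions whose last activity is older than max_age."""
        # 'now' is a parameter, so tests never depend on the real wall clock.
        return [sid for sid, last_seen in sessions.items() if now - last_seen > max_age]

    def test_expire_sessions_is_deterministic():
        now = datetime(2024, 1, 1, 12, 0)
        sessions = {"old": now - timedelta(hours=3), "fresh": now - timedelta(minutes=5)}
        assert expire_sessions(sessions, now, max_age=timedelta(hours=1)) == ["old"]

The gnarly wiring (real clock, real session store) then happens once at the edge, which is the part that goes much faster at the end.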
• Types enforce safety
• Very tight dependencies
• A tight design (value, store, control flows) where most bugs are likely to be catastrophic, or at least highly visible, as opposed to silent. It works, or it doesn't.
• Use naming and code organization first, concise comments second, and a page or two of doc if all else fails, to make any non-intuitive optimization or operation intuitive again. (Clarify as needed: what it does, how it does it, and why that specific method was chosen.)
For me, a bug-repellent design is more important than tests during development. Except where there is truly messy functionality that no implementation can un-mess. Or where results are not used in any way that implicitly tests them, i.e. they are an intermediate solution to a downstream problem that is not part of the development context.
That seems so basic to writing the "most cleverest" code, that the quote makes little sense to me.
I spend a very small percentage of my time debugging, outside of the real-time development-time write, run, correct loop. If debugging ever took a significant fraction of time, it would give me some serious anxiety.
But as the article states, that may be the result of learning from my less manageable cleverness on ambitious personal projects, over and over, early on. Endless personal projects, in which any success immediately brings into view "the next exciting level" they could become, often requiring a redesign, are great for forcing growth.
Claude Code can do this in the background tirelessly while I can personally focus more on tasks that aren't so "grindy".
That seems a premature conclusion. LLMs excel at meeting the requirements of users who have little if any interest in debugging. Users who have a low tolerance for bugs likewise have a low tolerance for coding LLMs.
In other words, the "cleverness" of AI will eventually be pinned. Therefore only a certain skill level will be required to debug the code. Debug and review. Which means innovation in the industry will slow to a crawl.
AI will never be able to get better either (once it plateaus) because nothing more clever will exist to train from.
Though it's a bit worse than that. AI is trained from lots of information and that means averages/medians. It can't discern good from bad. It doesn't understand what clever is. So it not only will plateau, but it will ultimately rest at a level that is below the best. It will be average and average right now is pretty bad.
That seems a premature conclusion. LLMs are quite good at debugging and much faster than people.
Here is when he was young:
https://www.youtube.com/watch?v=tc4ROCJYbm0
Kind of explains in an epic manner why pipes were useful back then.
They are still useful, but computers are so much more powerful now; many techniques from back then arose primarily because the computers of the day were not as powerful. I still, oddly enough, prefer the hardware of the 1970s, 1980s, and almost the early 1990s too. Today's hardware is much better, but I am nowhere near as interested in it; it is now just a common tool.
foo | bar | baz
is useful whether you have a 25MHz uniprocessor or 1024 cores running at 8GHz. It is true that the unix pipe model is explicitly designed around text as the data format, and it is true that there's lots of data on computers in 2026 that is not sensibly represented as text. But there's also lots that is, and so the pipe model continues to be valuable and powerful.
They're energetic "interns" that can churn out a lot of stuff fast but seem to struggle a lot with critical thinking.
I don’t particularly like them or dislike them, they’re just tools. But saying they never work for bug fixing is just ridiculous. Feels more like you just wanted an excuse to get on your soapbox.
It is an illusion arising from anthropomorphisation. They aren't thinking at all. They are just parroting the output of thinking that is long gone.
Just focusing on the outputs we can observe, LLMs clearly seem to be able to "think" correctly on some small problems that feel generalized from examples they've been trained on (as opposed to pure regurgitation).
Objecting to this on some kind of philosophical grounds of "being able to generalize from existing patterns isn't the same as thinking" feels like a distinction without a difference. If LLMs were better at solving complex problems I would absolutely describe what they're doing as "thinking". They just aren't, in practice.
"Seem". "Feel". That's the anthropomorphisation at work again.
These chatbots are called Large Language Models for a reason. Language is mere text, not thought.
If their sellers could get away with calling them Large Thought Models, they would. They can't, because these chatbots do not think.
Those are descriptions of my thoughts. So no, not anthropomorphisation, unless you think I'm a bot.
> These chatbots are called Large Language Models for a reason. Language is mere text, not thought. If their sellers could get away with calling them Large Thought Models, they would. They can't, because these chatbots do not think.
They use the term "thinking" all the time.
----
I'm more than willing to listen to an argument that what LLMs are doing should not be considered thought, but "it doesn't have 'thought' in the name" ain't it.
That is the result of anthropomorphisation. When we treat a machine as a machine, we have less need to understand it in terms of "seems" and "feels".
> They use the term "thinking" all the time.
I find not. E.g. ChatGPT:
Short answer? Not like you do.
Longer, honest version: I don’t think in the human sense—no consciousness, no inner voice, no feelings, no awareness. I don’t wake up with ideas or sit there wondering about stuff. What I do have is the ability to recognize patterns in language and use them to generate responses that look like thinking.
Wouldn't it make intuitive sense for "writing new code to do a task" and "tracking down a problem debugging code" to be multiple different skills and not the one same skill? Wouldn't it make sense for the one you do more of to be the one you are better at, and not directly 'smart' related? Wouldn't it make sense if tooling could affect the ease of debugging?
- the author wrote it (including 'debugging') until it worked properly. Therefore they were clever enough to write it that way.
- the author can't make it work (including 'debugging') and therefore they aren't clever enough to write it that way.
And there cannot be a state where they (are clever enough to write it but it doesn't work properly) and they (are not clever enough to debug it), because the fact that it doesn't work properly and they can't make it work properly refutes the claim that they were clever enough to write it that way, and it becomes the second state above. Which puts you on the side of what I'm saying?
I'd read the quote as saying "if you write a compression algorithm with the most obfuscated 'clever' macro-using, compiler-quirk-exploiting C code you can manage and it doesn't work properly, you won't have enough brain power left over to debug such code and make it work. Instead you should have written it with idiomatic C and boring for(){} loops, and then you would have a better chance of debugging it until it works". I was questioning the quote and suggesting that if debugging is a separate skill, it must be one that can be developed and improved by practice and tooling.
The person above me suggested that coding and debugging are not separate skills, which rather throws the quote into confusion - if you can "write" it but it doesn't work, and you can't debug it, and "debugging" is the same skill, what are we talking about at all?
And even when it is, sometimes the "not working properly, must debug" point occurs later in time (sometimes much later) from the "it appears to be working" point.
As I take the quote, it would have been about 1970's, 1980's C and would not have had the benefit of an IDE with "edit and continue" or a LISP or Prolog or Smalltalk interactive REPL with live edit and retry, or ELM's "time travelling debugger" or Git and all related tooling for tracking down changes and who made them, or more modern fuzzers and Valgrinds and static analyzers.
Making a case for writing non-surprising idiomatic code is one thing, but HN parroting the "debugging is twice as hard as coding" and downvoting someone who asks for evidence for this claim is cargo-culting. Why would it be twice as hard and not 1.2x as hard or the same hardness or 10x or 100x as hard? And why would the relationship be fixed, even if tooling and languages and the industry change? And what does it mean to say you can write a, say, compression algorithm "as cleverly as you can" but it's twice as hard as that to spot that you typoed a variable name or something?
IME it pays dividends but it can be really painful. I've run into a situation multiple times where I'm using Claude Code to write something, then a week later while working on it, it'll come up with something like "Oh wait! Half the binaries are in .NET and not Delphi, I can just decompile them with ilspy", effectively showing the way to a rewrite that works better, with fewer bugs, and gets done in a few hours because I've got more experience from the v1. Either way it's tens of thousands of lines of code that I could never have completed myself in that amount of time (which, given problems of motivation, means "at all").
You want them writing tests, especially in critical sections; I'll push to 100% coverage. (Not all code goes there, but for things that MUST work or everything crumbles? Yeah, I do it.)
There was one time I was doing the classic: pull a bug, find 2 more. So I just told the LLM: "100% test coverage on the thing giving me problems." It found 4 bugs, fixed them, and that functionality has been rock solid since.
100% coverage is not a normal tool. But when you need it, man does it help.
But how do you know if you got it?
I've seen no LLM that can even verify execution pathway coverage.
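A coverage tool can answer that mechanically, without trusting the model's claim. A minimal sketch with Python's coverage.py (the module name critical_module and the tests/ directory are placeholders):

    # Measure line and branch coverage of the code the tests exercise.
    import coverage
    import pytest

    cov = coverage.Coverage(branch=True, source=["critical_module"])
    cov.start()
    pytest.main(["tests/"])        # run the suite under measurement
    cov.stop()
    cov.save()
    cov.report(show_missing=True)  # prints any lines/branches never executed

In practice the pytest-cov plugin does the same from the command line (pytest --cov=critical_module --cov-branch --cov-fail-under=100) and fails the run if the target isn't met.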