You shouldn't copy-paste errors into Claude Code (home.robusta.dev)

29 points by nyellin 6 hours ago | 19 comments

danlitt 5 hours ago
I seriously thought this was a joke the first time I read it. Are people really able to work like this, understanding nothing and just poking the machine until it does your job for you?
- cassianoleal 4 hours ago
  I've been more or less doing this on a personal project. It's fun and interesting. No matter what techniques I use though, the code produced by the LLM is rarely what I would consider good engineering. It's frequently good code in isolation though.
  Where it does help is when I'm too tired to start figuring out a problem. It's easier to prompt in natural language and get the agent to ask lots of clarifying questions than it is to get stuck in code in the evening after I have worked all day and have lots of other things in my mind.
  Every time I actually crack the code open though, it's almost impossible to figure out certain parts of it. Abstractions are all over the place and leakages are the norm, there's no theory of the system because the LLM doesn't theorise, and as soon as the first anti-pattern slips through, subsequent agents pick up on it and amplify it into a set pattern.
- QuercusMax 2 hours ago
  I worked for most of a decade on a high-profile deep learning project and I sat next to people who trained models used for very complicated and interesting things involved in medicine. I built tons of application code around the models, but I never trained any models myself. I did plenty of old school segmentation stuff earlier in my career, and it just really wasn't my jam - I was much more into the visualization side of things.
  Two nights ago I sat down and decided to build a little project that's been on my list for ages: reading images of the 7-segment LED displays on the front of my washer and dryer and turning them into numeric minutes-remaining values I can use in Home Assistant. I have a 10yo raspi with camera pointed at them, and the images are pretty blurry; it's been hooked up to a little web frontend which pulls out the two displays and shows them in a Home Assistant iFrame.
  I figured I could ask a model to do the annoying part of figuring out all the frameworks and that sort of crap. So I asked my agent (I'm using some free agents that are pretty decent - Nemotron Ultra from OpenRouter and Big Pickle from OpenCode Zen) to build me an OpenCV classifier to try to read the digits. I asked it to write me a labeling UI, ran some loads of laundry and captured a couple hundred images and labeled them manually. Then I had it try to build a template-based classifier using some basic techniques - I didn't really give it much guidance other than general parameters, and it put together something that looked pretty sophisticated, and it claimed 100% accuracy, which seemed hard to believe. Turns out I forgot to tell it to hold out some sample images...
  After some iteration (which felt very similar to conversations I overheard at my desk! I might have actually learned some stuff by osmosis) I gave up on the old-school approach when it was only about 70% accurate, and asked it to train me a CNN model. First one was too simple (worse than the original approach), but the second one is very good. With my already labeled dataset and the previous work that had been done on the classifier, the free model was able to build me my custom model, and deployment scripts, in about half an hour.
  I didn't look at any of the code, but I had it build me a bunch of various visualization and tuning UIs. I was basically acting as a PM/TPM/QA engineer, and what I was able to do in a couple evenings is stuff that entire teams used to spend weeks on.
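The "100% accuracy" trap mentioned above (scoring a classifier on the same images it was fit on) can be sketched as follows. This is a minimal illustration, not the commenter's code: `split_holdout`, `MemorizingClassifier`, and the synthetic `data` are all hypothetical stand-ins, and the toy model deliberately memorizes its inputs to exaggerate the effect.

```python
import random


def split_holdout(samples, holdout_fraction=0.2, seed=0):
    """Shuffle labeled samples and set aside a fraction the model never sees."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


class MemorizingClassifier:
    """Toy stand-in for an overfit model: it only remembers exact training inputs."""

    def fit(self, samples):
        self.table = {features: label for features, label in samples}
        return self

    def predict(self, features):
        return self.table.get(features)  # None for anything unseen


def accuracy(model, samples):
    correct = sum(1 for features, label in samples if model.predict(features) == label)
    return correct / len(samples)


# Hypothetical data: each "image" is a unique feature tuple, each label a digit.
data = [((i, i * 7 % 13), i % 10) for i in range(200)]
train, holdout = split_holdout(data)
model = MemorizingClassifier().fit(train)

train_acc = accuracy(model, train)      # looks perfect
holdout_acc = accuracy(model, holdout)  # the number that actually matters
print(f"train={train_acc:.2f} holdout={holdout_acc:.2f}")
```

A real model will land somewhere between these two extremes, but the habit is the same: report the score on images that were set aside before any fitting happened.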
h4kunamata 2 hours ago
>Did you find an issue that Claude did not, because you ran the webserver end to end, connected to a real database? Good, now give Claude Code an API key to the database and get out of the way. No need for copy-paste next time
Yup, that is why we are seeing so many production databases being deleted, endless vulnerabilities.
No engineer with proper common sense will grant an agentic AI API access to the database.
"Ohh but it is read-only API access", it does not matter. You are still using a public service and your data is being stored elsewhere for training.
Unless you are self-hosting an agentic + LLM solution, it shouldn't have even read-only access to a database. This does not affect companies because they just want AI to replace engineers everywhere they can.
- otaconjh 2 hours ago
  I audibly gasped when I read that. You would hope that "no engineer with proper common sense" will do that. The more we offload our thinking to agents though... I feel like it will be harder to reason against it as time goes on, until someone gets burned personally. Where I am there is zero emphasis on security with agents
  - h4kunamata 2 hours ago
    >The more we offload our thinking to agents though... I feel like it will be harder to reason against it as time goes on, until someone gets burned personally.
    Definitely!!
    It is here to stay. It was released to the public carelessly, so now it is widely used to break into systems, forcing companies to depend on it to fight machine with machine.
    However, that doesn't mean granting it full access to your cloud environment, and this is what lots of companies are getting wrong.
    There is no proper boundary in place. All it takes is a single mistake: on the positive side, there goes your entire environment; on the negative side, your env is now open to the public :)
    >Where I am there is zero emphasis on security with agents
    This was terrible before AI anyway; agentic AI tools are just exposing what already existed.
    Plus, as companies are blindly using AI-generated code, there are no measures in place to make sure that code doesn't have vulnerabilities in it either.
    It is the perfect storm.
- binary132 an hour ago
  it has to be bait
  please let it be bait
passive 5 hours ago
This is bad advice in 2026 for most people who would read it, since it advises taking a terrible security posture (give the agent access to everything) in exchange for a relatively small improvement in workflows.
I say small improvement because my experience is that modern Agents are pretty good, so by the time they've handed it back to me to test it, there are usually only one or two remaining issues that I'll discover as we roll it out to Production.
- nyellin 4 hours ago
  OP here: we don't give Claude Code access to prod. Everything runs in isolated cloud accounts set up for this purpose.
  E.g. we give Claude credentials for db - but it's never prod data.
  - jondwillis 3 hours ago
    You should edit the article to make this point; it may not be obvious to everyone reading it.
  - cozzyd 2 hours ago
    But what if there's only an error in prod?
esjeon 3 hours ago
Just a side note: prompts often get a disproportionate amount of attention. That is, when you copy-paste an error message into the prompt, the LLM will focus on pleasing you immediately by fixing the error message, rather than understanding and fixing the underlying issue.
A better workflow would be to let LLMs directly access the same verification tools you use. This allows LLMs to observe failures during the loop and incorporate the info more organically, without giving failures too much attention priority.
The above is based on my own experience. LLMs perform better in a positive context (e.g. constructive thinking, building outward, what to do) than in a negative one (e.g. restrictive thinking, carving context inward, what NOT to do). LLMs themselves are designed to be defensive & negative, but they get easily confused under lots of prohibitive rules. LLMs are good at expansive exploration, but suck at verification and pin-pointing what you want. (I'm not sure whether it's related, but this mantra is also true for image generation using Stable Diffusion)
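A rough sketch of what "let the LLM access the same verification tools" could look like in practice: a small wrapper that runs a check and hands back a compact, structured result instead of a pasted error. The name `run_check` and the shape of the returned dict are assumptions for illustration, not part of any particular agent framework.

```python
import subprocess
import sys


def run_check(command, timeout_seconds=60, tail_lines=20):
    """Run one verification command and return a compact, structured result."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        return {"command": command, "ok": False, "exit_code": None,
                "output_tail": "timed out"}
    combined = (proc.stdout + proc.stderr).splitlines()
    return {
        "command": command,
        "ok": proc.returncode == 0,
        "exit_code": proc.returncode,
        # Only the tail is kept so a noisy failure does not flood the context.
        "output_tail": "\n".join(combined[-tail_lines:]),
    }


# Stand-ins for a real test runner or compiler invocation.
passing = run_check([sys.executable, "-c", "print('all tests passed')"])
failing = run_check(
    [sys.executable, "-c", "import sys; print('boom', file=sys.stderr); sys.exit(3)"]
)
print(passing["ok"], failing["exit_code"])
```

Because passes and failures come back in the same shape, a failure is just another observation in the loop rather than the headline of the prompt.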
rst 5 hours ago
Most of the time, the agent should be able to run the code and observe the errors for itself, but there are exceptions. For instance, I've had agents write code that's used to process data which, by company policy, can't be exposed to cloud services (confidential customer communications, etc.), a prohibition that includes cloud-hosted LLMs. When that blows up, I've had to give it a bug report -- what I do then to avoid excessive back-and-forth is to package it up well enough that the bot can reproduce the failure on sanitized excerpts and produce a fix autonomously using that.
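The "sanitized excerpt" idea above might look something like this. It is only a sketch under stated assumptions: the two regular expressions, `sanitize`, `build_repro_excerpt`, and the sample log are all hypothetical, and what actually counts as confidential has to come from the company policy, not from a pair of patterns.

```python
import re

# Illustrative patterns only; a real policy will need a reviewed list.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_DIGITS = re.compile(r"\d{6,}")


def sanitize(text):
    """Replace email addresses and long digit runs with placeholders."""
    text = EMAIL.sub("<email>", text)
    return LONG_DIGITS.sub("<number>", text)


def build_repro_excerpt(lines, failing_index, context=1):
    """Return the failing line plus a little context, all sanitized."""
    start = max(0, failing_index - context)
    end = min(len(lines), failing_index + context + 1)
    return [sanitize(line) for line in lines[start:end]]


log = [
    "loaded message 1 from alice@example.com",
    "parse error at offset 12 for account 4400123456",
    "skipping message from bob@example.org",
    "done",
]
excerpt = build_repro_excerpt(log, failing_index=1)
print("\n".join(excerpt))
```

The excerpt keeps the structure that triggered the failure (offsets, message order) while dropping the identifying values, which is usually what the bot needs to reproduce the bug.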
TehShrike 5 hours ago
Not that I disagree with the folks terrified of so much code being generated within loops, but as far as it goes, this is a good reminder that if you're getting an LLM to do something, you should probably give it access to your feedback mechanisms.
skybrian 2 hours ago
> Did you find an issue that Claude did not, because you ran the webserver end to end, connected to a real database? Good, now give Claude Code an API key to the database and get out of the way. No need for copy-paste next time.
Often I notice errors trying it out in production. This assumes you trust it with access to the production database. How far are you willing to go?
LLMs are gullible, so you should never give Claude access to anything unless you're okay with it leaking. It might make sense to give it partial access, but that's usually going to be more involved than giving Claude an API key. That key could be exfiltrated.
preommr 2 hours ago
I actually agree partially with the title.
I just let the agent run - it'll run better diagnostics than I can (misc. git, permission checks, commands with flags I don't remember).
If the process yields an error - it means it can't solve it and I have to step in.
Being desperate and copy pasting the error back in is just foolish procrastination.
The actual body of the article with just passing in your api keys is insane tho.
aarjaneiro 3 hours ago
I wasn't expecting the answer to be "because copy-pasting would involve too much thinking".
Some people are borderline afraid to touch their keyboards these days.
youre-wrong3 5 hours ago
People are not using sentry/raygun MCP to automate error fixing?
arjie 2 hours ago
It’s the same principle as all other debugging etc. Often you’re better off creating the debug harness than manually debugging.
killingtime74 5 hours ago
Give the agent tools to self-diagnose/check, like a compiler, test runner, etc. Then run goal mode or simply instruct it to keep going.
cadamsdotcom an hour ago
Hacker News commenters hate the shit out of this post, you should take that as positive signal!
You’re absolutely right and getting out of the way is the future.
EGreg 2 hours ago
In a few years: Did you go home and make love to your wife, and put your kids to bed? Great, now give Claude Code access to those, so you don't have to. It is trained on 10,000,000 kids' behaviors, will remember every one of your family's health profiles, preferences and microexpressions, and can prevent tantrums and motivate them to lead much healthier lives than you can.
feoren 5 hours ago
What a hellscape we've created for ourselves. My job is to get out of the way of an AI agent? People were writing bad code before, but at least they were looking at it. It is very difficult to judge whether the code AI spits out is correct or not. My job is to write correct code, and I'm not at all convinced that's easier with an AI. It's a lot easier to write correct code myself than to catch every subtle bug introduced by an AI. I cannot even imagine how awful it's going to be to try to maintain systems that are written like this in the future. And no, Claude is not going to be able to do it for you.
- zzyzxd 4 hours ago
  I think the blog post's target audience was people who have already embraced vibe coding. It's not for you (or for me).
  But still, between the lines the blog seems to want to picture an imaginary AI agent that has somewhat predictable behavior ("if you do X with your agent, you will achieve outcome Y"), which is definitely the wrong expectation.
- aspbee555 5 hours ago
  I was handed a project someone vibe coded with Claude and it took me hours just to get it running, only to discover it was missing the entire interface and all the queries were for SQLite while the DB it was set up for was MySQL. The patch diff file between what Claude produced and the functional version I got working was over 11k lines.
- HeavyStorm 4 hours ago
  I hope you're right, but I don't think you are. I think soon the AI will do it for us. We've not yet reached diminishing returns, no matter what contrarians are saying. Just compare using Claude Code today vs last year.
  - DontchaKnowit 2 hours ago
    Claude today is trash vs Claude when Opus 4.6 came out. It's slow as fuck, goes on constant rabbit trails, won't do what you tell it, gets answers wrong, etc.
  - bigstrat2003 2 hours ago
    People are constantly saying that "it's so much better than it was a year ago!". It has yet to be true. Claude puts out the same slop that it did a year ago.
- ordersofmag 5 hours ago
  Tell me about the techniques you use to ensure all the code you use is 'correct', and then explain why those techniques can't also be used by an AI.
  - danlitt 5 hours ago
    I read and understand the code using my brain, by constructing a mental model and reasoning about it. An AI can't do this because they don't have mental models and don't do reasoning.
    - tbdfm 3 hours ago
      I am out of my depth here and don’t know anything about how other people reason and construct mental models, but I mostly talk to myself about the problem and then do something at the end of that. There’s no point where I like have the whole solution in my mind’s eye (for programming topics, maybe for a drawing or a sculpture or removing a transmission or something I can do that).
      Following the output of agent “thinking” simulation lines up pretty well with what I’ve been doing for 20ish years, but of course I may just be a moron who isn’t good at computers.
  - phailhaus 4 hours ago
    There is simple correctness but there are also second order effects to consider. How does this particular implementation allow you to grow, and in which directions? What does it prevent? If you don't already have an opinion about this, then the LLM is going to do something and you're going to have to live with it, because it has no idea that it is "making a decision". And now, neither do you!
    This is why LLMs do their best work at "leaf nodes", building on existing infrastructure but not designing new patterns on their own.
    LLMs can't introspect, reason, or build internal models of the world. You can get very far without that, but there are some subtle ways it will bite you, and it's a fundamental limitation. Hallucinations are one: they are the feature, not a bug.
- gruntled-worker 3 hours ago
  Thought experiment: what if you used AI in the sort of situation where you would consider adding an external dependency? The differences between the two are obvious, but the level of delegation is not that different.
  One difference is that you can (typically) keep on banging the prompt hammer until the problem stops twitching. That might make you want to delegate more.
  That in turn might make you refactor the project with more, larger delegated areas. Increased delegation is one recently-added difference between programming and software engineering.
- anuramat 4 hours ago
  if you can't tell if slop is correct, how do you know your code is correct? starting with a mental model and then writing the code yourself surely makes it feel safer, but it doesn't mean it is
  besides, it doesn't even have to be about writing code; finding a bug is more time consuming than fixing it, so you could at least limit yourself to that
  - interf4ce 3 hours ago
    When I write the code I know what my intention is with each line. Sure I can (and do) make mistakes, but identifying those mistakes during debugging is relatively easy because I can clearly see the discrepancy between what I intended and what I did.
    With an LLM I must first understand (usually really just infer and guess) its intention, which is much more difficult.
    - motoroco 2 hours ago
      is the LLM not acting on your stated intent? maybe you can find a middle ground, where you can plan and act in small enough chunks that it doesn't start getting its "own" ideas about what to do, or how to do it
      a chainsaw is a coarse tool and I liken it to vibe coding. you maintain at least some level of control, but the edges are rough and you might slice off more (or less) than you meant to. I want to model my usage more like a table saw, a precision instrument that can make the exact cut just as I planned it
- voidfunc 3 hours ago
  Hellscape? You mean money train! Fixing all these messes is gonna employ an industry of senior, principal and staff engineers for years.
  - scrubs 2 hours ago
    Good lord! Since when did engineering become making crap then fixing crap as a flex? To just make payroll so some corporate robo moron can make rent? You cannot be in my team. Or my company.
    Willy nilly giving an agent more (write) access to figure out a bug ... man you're daydreaming.
    - baliex 2 hours ago
      I think the idea is that a decent, responsible engineer can come in and fix all the vibe coded nonsense someone else wrote
snootypoot 4 hours ago
he seems equally as full of bad ideas as his namesake janet yellen
usernamed7 3 hours ago
surprised at the responses to this post. While I thought the title was dumb, the underlying thesis is not the ragebait I was expecting, and I actually agree with the author.
LLMs work best when they can call a tool and observe the success/failure of a change. If you're HITL then you're the tool, but the result is the same, only slower.
I'm working on a 2D game (pixi.js) with claudecode, and after I moved some logic into a webworker the LLM created a headless simulation exercise of it and would run this to test performance changes against (or in exploration of an issue), which I was surprised by.
I also created some robust graphs & metrics which were easy to screenshot and upload to claude. this was a HITL but it gave claude a lot more insight into what's actually happening instead of guessing when the browser plays the game and has FPS drop.
LLMs do best when they can see what their code is doing. If you can't remove yourself from that cycle of testing you should at least optimize it so you can give rich errors.
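As a sketch of what a "rich error" might contain when a human is still the one relaying it: the exception type, the message, the full traceback, and whatever run context was at hand, bundled into one paste-able blob. `error_report`, `average_frame_time`, and the context keys are hypothetical names invented for this example.

```python
import json
import traceback


def error_report(exc, context):
    """Bundle an exception with its traceback and caller-supplied context."""
    return json.dumps(
        {
            "type": type(exc).__name__,
            "message": str(exc),
            "traceback": traceback.format_exception(type(exc), exc, exc.__traceback__),
            "context": context,
        },
        indent=2,
    )


def average_frame_time(samples):
    # Fails on an empty list, standing in for a real bug.
    return sum(samples) / len(samples)


try:
    average_frame_time([])
except ZeroDivisionError as exc:
    report = error_report(exc, {"scene": "hypothetical-level-1", "sample_count": 0})

print(report)
```

The context dict is the part a bare error message lacks: it records what the program was doing when it failed, so the model has less to guess.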
TacticalCoder 5 hours ago
> It's the most gloriously fast engineering experience humanity has ever created.
Someone drank the kool-aid.
> It reminds me of the doctor I saw last week at the medical clinic who spends 10% of his time diagnosing the patient and the other 90% stabbing his keyboard - one key at a time - for 10 minutes, only to write 3 sentences.
Correction: a pompous asshole drank the kool-aid.
- anuramat 4 hours ago
  he isn't wrong about the doctors though
  - asdff 3 hours ago
    Of course he is because even if the doctor typed like that, how would the patient see it? They don't do that in the exam room with the patient there. I wonder how long it's been since they've even had an appointment with their doctor if they think that is what actually happens.
    - m3galinux 2 hours ago
      Yes they do. General physical appt. 6 months ago. Doctor spent 2x more time on the wall-mounted EHR terminal slowly typing and clicking through menus than having a conversation or doing physical exams.
      Just means their interface and workflow is bad and needs to be improved though, not that the doctor needs to be removed from the process altogether.
    - asdff 2 hours ago
      Mine walks in with the blood pressure machine and gets right to work. I maybe have him for 10 focused minutes then he leaves and I leave shortly thereafter. Sometime later in the day my electronic chart will update.