202 points by vinhnx 3 hours ago | 36 comments
  • Frannky 2 minutes ago
    I tried Opus 4.6 recently and it’s really good. I had ditched Claude a long time ago for Grok + Gemini + OpenCode with Chinese models. I used Grok/Gemini for planning and core files, and OpenCode for setup, running, deploying, and editing.

    However, Opus made me rethink my entire workflow. Now, I do it like this:

    * PRD (Product Requirements Document)

    * main.py + requirements.txt + readme.md (I ask for minimal, functional, modular code that fits the main.py)

    * Ask for a step-by-step ordered plan

    * Ask to focus on one step at a time
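
    Concretely, the prompts are short. Roughly (paraphrased, not my exact wording):

        1. Write a PRD for <app idea>.
        2. Generate main.py, requirements.txt and readme.md. Minimal,
           functional, modular code that fits in main.py.
        3. Give me a step-by-step ordered plan from zero to deployed,
           including any accounts and API keys I need to set up.
        4. Do step 1 only, then stop and wait for me.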

    The super powerful thing is that I don’t get stuck on missing accounts, keys, etc. Everything is ordered and runs smoothly. I go rapidly from idea to working product, and it’s incredibly easy to iterate if I figure out new features are required while testing. I also have GLM via OpenCode, but I mainly use it for "dumb" tasks.

    Interestingly, for reasoning capabilities regarding standard logic inside the code, I found Gemini 3 Flash to be very good and relatively cheap. I don't use Claude Code for the actual coding because forcing everything via chat into a main.py encourages minimal code that's easy to skim; it gives me a clearer representation of the feature space.

  • achenatx a minute ago
    I use amazon kiro.

    The AI first works with you to write requirements, then it produces a design, then a task list.

    This helps the AI break the work into smaller chunks; it works on one task at a time.

    I can let it run for an hour or more in this mode. Then there is lots of stuff to fix, but it is mostly correct.

    Kiro also supports steering files: files that try to lock the AI into common design decisions.

    The price is that a lot of context is used up by these files, and Kiro constantly pauses to reset the context.

  • cadamsdotcom 4 minutes ago
    The author is quite far on their journey but would benefit from writing simple scripts to enforce invariants in their codebase. Invariant broken? Script exits with a non-zero exit code and some output that tells the agent how to address the problem. Scripts are deterministic, run in milliseconds, and use zero tokens. Put them in husky or pre-commit, install the git hooks, and your agent won’t be able to commit without all your scripts succeeding.

    And “Don’t change this function signature” should be enforced not by anticipating that your coding agent might change it and warning it not to, but via an end-to-end test that fails if the signature changes (because the other code that needs it not to change now has an error). That takes the author out of the loop: instead of watching for the change in order to issue a correction, they can sip coffee while the agent observes that it caused a test failure and corrects it without intervention.
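
    As a sketch, an invariant script can be as small as this (the invariant here is made up; the point is the deterministic check, the non-zero exit code, and a message that tells the agent what to do):

        #!/usr/bin/env python3
        """Invariant: only src/db/ may import sqlalchemy (hypothetical rule)."""
        import pathlib
        import re
        import sys

        violations = []
        for path in pathlib.Path("src").rglob("*.py"):
            if "db" in path.parts:
                continue  # the db layer is the one place allowed to import sqlalchemy
            for lineno, line in enumerate(path.read_text().splitlines(), start=1):
                if re.match(r"\s*(import|from)\s+sqlalchemy\b", line):
                    violations.append(
                        f"{path}:{lineno}: import sqlalchemy only via src/db helpers"
                    )

        if violations:
            print("\n".join(violations))
            sys.exit(1)  # non-zero exit: the agent reads this output and fixes it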

  • haolez 2 hours ago
    > Notice the language: “deeply”, “in great details”, “intricacies”, “go through everything”. This isn’t fluff. Without these words, Claude will skim. It’ll read a file, see what a function does at the signature level, and move on. You need to signal that surface-level reading is not acceptable.

    This makes no sense to my intuition of how an LLM works. It's not that I don't believe this works, but my mental model doesn't capture why asking the model to read the content "more deeply" will have any impact on whatever output the LLM generates.

    • nostrademons an hour ago
      It's the attention mechanism at work, along with a fair bit of Internet one-upmanship. The LLM has ingested all of the text on the Internet, as well as GitHub code repositories, pull requests, StackOverflow posts, code reviews, mailing lists, etc. In a number of those content sources, there will be people saying "Actually, if you go into the details of..." or "If you look at the intricacies of the problem" or "If you understood the problem deeply" followed by a very deep, expert-level explication of exactly what you should've done differently. You want the model to use the code in the correction, not the one in the original StackOverflow question.

      Same reason that "Pretend you are an MIT professor" or "You are a leading Python expert" or similar works in prompts. It tells the model to pay attention to the part of the corpus that has those terms, weighting them more highly than all the other programming samples that it's run across.

    • jcdavis 2 hours ago
      It's a wild time to be in software development. Nobody(1) actually knows what causes LLMs to do certain things; we just pray the prompt moves the probabilities the right way enough that it mostly does what we want. This used to be a field that prided itself on deterministic behavior and reproducibility.

      Now? We have AGENTS.md files that look like a parent talking to a child with all the bold all-caps, double emphasis, just praying that's enough to be sure they run the commands you want them to be running

      (1) Outside of some core ML developers at the big model companies.

      • harrall 14 minutes ago
        It’s like playing a fretless instrument to me.

        I practice playing songs by ear, and after 2 weeks my brain has developed an inference model of where my fingers should go to hit any given pitch.

        Do I have any idea how my brain’s model works? No! But it tickles a different part of my brain and I like it.

      • chickensong 2 hours ago
        For Claude at least, the more recent guidance from Anthropic is to not yell at it. Just clear, calm, and concise instructions.
        • joshmn an hour ago
          Sometimes I daydream about people screaming at their LLM as if it was a TV they were playing video games on.
        • trueno an hour ago
          wait seriously? lmfao

          that's hilarious. i definitely treat claude like shit and i've noticed the falloff in results.

          if there's a source for that i'd love to read about it.

          • xmcp123 23 minutes ago
            For a while (maybe a year ago?) it seemed like verbal abuse was the best way to make Claude pay attention. In my head, it was impacting how important it deemed the instruction. And it definitely did seem that way.
          • defrost an hour ago
            Consciousness is off the table but they absolutely respond to environmental stimulus and vibes.

            See, uhhh, https://pmc.ncbi.nlm.nih.gov/articles/PMC8052213/ and maybe have a shot at running claude while playing Enya albums on loop.

            /s (??)

            • trueno 3 minutes ago
              i have like the faintest vague thread of "maybe this actually checks out" in a way that has shit all to do with consciousness

              sometimes internet arguments get messy; people die on their hills and double / triple down on internet message boards. since historic internet data composes a bit of what goes into an llm, would it make sense that bad-juju prompting sends it to some dark corners of its training data if implementations don't properly sanitize certain negative words/phrases?

    • Betelbuddy an hour ago
      It's very logical and pretty obvious when you do code generation. If you ask the same model to generate code starting with:

      - You are a Python Developer... or

      - You are a Professional Python Developer... or

      - You are one of the world's most renowned Python Experts, with several books written on the subject, and 15 years of experience in creating highly reliable production quality code...

      You will notice a clear improvement in the quality of the generated artifacts.

    • winwang 20 minutes ago
      Apparently LLM quality is sensitive to emotional stimuli?

      "Large Language Models Understand and Can be Enhanced by Emotional Stimuli": https://arxiv.org/abs/2307.11760

    • hashmap 2 hours ago
      these sort-of-lies might help:

      think of the latent space inside the model like a topographic map: when you give it a prompt, you're dropping a ball at a certain point above the ground, and gravity pulls it along the surface until it settles.

      caveat though: that's nice per-token, but the signal gets distorted by picking each token from a distribution, so with each token you're regenerating and re-distorting the signal. leaning on language that places the ball deep in a region you want to be in makes it less likely that those distortions will kick it out of the basin or valley you want to end up in.

      if the response you get is 1000 tokens long, the initial trajectory needed to survive 1000 probabilistic filters to get there.

      or maybe none of that is right lol but thinking that it is has worked for me, which has been good enough

    • giancarlostoro 26 minutes ago
      The LLM will do what you ask it to, provided you get nuanced about it. Others and I have noticed that LLMs work better when your codebase is not full of code smells like massive god-class files; if your codebase is discrete and broken up in a way that makes sense and fits in your head, it will fit in the model's head.
    • scuff3d 31 minutes ago
      How anybody can read stuff like this and still take all this seriously is beyond me. This is becoming the engineering equivalent of astrology.
      • fragmede 23 minutes ago
        Feel free to run your own tests and see if the magic phrases do or do not influence the output. Have it make a Todo webapp with and without those phrases and see what happens!
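
        A minimal sketch of such a test using the Anthropic Python SDK (the model id and task here are placeholders, and one run proves little, so repeat it a few times):

            import anthropic

            client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

            task = (
                "Review this function and list any bugs:\n"
                "def avg(xs): return sum(xs) / len(xs)"
            )
            prompts = {
                "plain": task,
                "magic": "Deeply study the code in great detail, then: " + task,
            }

            for label, prompt in prompts.items():
                reply = client.messages.create(
                    model="claude-sonnet-4-5",  # placeholder model id
                    max_tokens=1024,
                    messages=[{"role": "user", "content": prompt}],
                )
                text = reply.content[0].text
                print(f"--- {label} ({len(text)} chars) ---")
                print(text)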
    • ambicapter 35 minutes ago
      Maybe the training data that included words like "skim" also provided shallower analysis than training data near the words "in great detail", so the LLM is just reproducing those respective distributions when prompted with directions to do either.
    • ChadNauseam 2 hours ago
      The disconnect might be that there is a separation between "generating the final answer for the user" and "researching/thinking to get information needed for that answer". Saying "deeply" prompts it to read more of the file (as in, actually use the `read` tool to grab more parts of the file into context), and generate more "thinking" tokens (as in, tokens that are not shown to the user but that the model writes to refine its thoughts and improve the quality of its answer).
    • stingraycharles 2 hours ago
      It’s actually really common. If you look at Claude Code’s own system prompts written by Anthropic, they’re littered with “CRITICAL (RULE 0):” type of statements, and other similar prompting styles.
    • wilkystyle 2 hours ago
      The author is referring to how the framing of your prompt informs the attention mechanism. You are essentially hinting to the attention mechanism that the function's implementation details have important context as well.
    • fragmede 2 hours ago
      Yeah, it's definitely a strange new world we're in, where I have to "trick" the computer into cooperating. The other day I told Claude "Yes you can", and it went off and did something it just said it couldn't do!
      • itypecode 2 hours ago
        Solid dad move. XD
        • wilkystyle 2 hours ago
          Is parenting making us better at prompt engineering, or is it the other way around?
      • bpodgursky 2 hours ago
        You bumped the token predictor into the latent space where it knew what it was doing : )
    • MattGaiser 2 hours ago
      One of the well-defined failure modes for AI agents/models is "laziness." Yes, models can be "lazy", and that is an actual term used when reviewing them.

      I am not sure if we know why really, but they are that way and you need to explicitly prompt around it.

      • kannanvijayan an hour ago
        I've encountered this failure mode, and the opposite of it: thinking too much. A behaviour I've come to see as some sort of pseudo-neuroticism.

        Lazy thinking makes LLMs do surface analysis and then produce things that are wrong. Neurotic thinking will see them over-analyze, and then repeatedly second-guess themselves, repeatedly re-derive conclusions.

        Something very similar to an anxiety loop in humans, where problems without solutions are obsessed about in circles.

        • denimnerd42 44 minutes ago
          yeah, i experienced this the other day when asking claude code to build an http proxy using afsk modem software to communicate over the computer's sound card. it had an absolute fit tuning the system and would loop for hours, trying and doubling back. eventually, after some change in prompt direction to think more deeply and test more comprehensively, it figured it out. i certainly had no idea how to build an afsk modem.
    • popalchemist an hour ago
      Strings of tokens are vectors. Vectors are directions. When you use a phrase like that you are orienting the vector of the overall prompt toward the direction of depth, in its map of conceptual space.
  • dennisjoseph 3 minutes ago
    The annotation cycle is the key insight for me. Treating the plan as a living doc you iterate on before touching any code makes a huge difference in output quality.

    Experimentally, I've been using mfbt.ai [https://mfbt.ai] for roughly the same thing in a team context. It lets you collaboratively nail down the spec with AI before handing off to a coding agent via MCP.

    Avoids the "everyone has a slightly different plan.md on their machine" problem. Still early days but it's been a nice fit for this kind of workflow.

  • red_hare an hour ago
    I use Claude Code for lecture prep.

    I craft a detailed and ordered set of lecture notes in a Quarto file and then have a dedicated claude code skill for translating those notes into Slidev slides, in the style that I like.

    Once that's done, much like the author, I go through the slides and make commented annotations like "this should be broken into two slides" or "this should be a side-by-side" or "use your generate clipart skill to throw an image here alongside these bullets" and "pull in the code example from ../examples/foo." It works brilliantly.

    And then I do one final pass of tweaking after that's done.

    But yeah, annotations are super powerful. Token distance in-context and all that jazz.

    • saxelsen an hour ago
      Can I ask how you annotate the feedback for it? Just with inline comments like `# This should be changed to X`?

      The author mentions annotations but doesn't go into detail about how to feed the annotations to Claude.

      • red_hare 24 minutes ago
        Slidev is markdown, so I do it in HTML comments. Usually something like:

            <!-- TODOCLAUDE: Split this into a two-cols-title, divide the examples between -->
        
        or

            <!-- TODOCLAUDE: Use clipart skill to make an image for this slide -->
        
        And then, when I finish annotating I just say: "Address all the TODOCLAUDEs"
    • ramoz an hour ago
      Is your skill open source?
      • red_hare 14 minutes ago
        Not yet... but also I'm not sure it makes a lot of sense to be open source. It's super specific to how I like to build slide decks and to my personal lecture style.

        But it's not hard to build one. The key for me was describing, in great detail:

        1. How I want it to read the source material (e.g., H1 means new section, H2 means at least one slide, a link to an example means I want code in the slide)

        2. How to connect material to layouts (e.g., "comparison between two ideas should be a two-cols-title," "walkthrough of code should be two-cols with code on right," "learning objectives should be side-title align:left," "recall should be side-title align:right")

        Then the workflow is:

        1. Give all those details and have it do a first pass.

        2. Give tons of feedback.

        3. At the end of the session, ask it to "make a skill."

        4. Manually edit the skill so that you're happy with the examples.
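
        If it helps: a skill is more or less just a markdown file. A stripped-down, hypothetical version of mine would look something like:

            ---
            name: notes-to-slidev
            description: Turn Quarto lecture notes into Slidev slides in my style
            ---

            ## Reading the source material
            - H1 = new section; H2 = at least one slide
            - A link to an example means code belongs on the slide

            ## Mapping material to layouts
            - Comparison of two ideas -> two-cols-title
            - Code walkthrough -> two-cols with code on the right
            - Learning objectives -> side-title align:left; recall -> side-title align:right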

  • brandall10 2 hours ago
    I go a bit further than this and have had great success with 3 doc types and 2 skills:

    - Specs: these are generally static, but updatable as the project evolves. They're broken out into an index file that gives a project overview, a high-level arch file, and files for all the main modules. Roughly ~1k lines of spec per 10k lines of code, and I try to limit any particular spec file to 300 lines. I'm intimately familiar with every single line in these.

    - Plans: these are the output of a planning session with an LLM. They point to the associated specs. These tend to be 100-300 lines and 3 to 5 phases.

    - Working memory files: I use both a status.md (3-5 items per phase, roughly 30 lines overall), which points to the latest plan, and a project_status file (100-200 lines), which tracks the current state of the project and is instructed to compact past efforts to keep it lean (see the sketch below).

    - A planner skill I use w/ Gemini Pro to generate new plans. It essentially explains the specs/plans dichotomy and the role of the status files, has the model review everything in the pertinent areas of code, and has it give me a handful of high-level candidate features to address based on shortfalls in the specs or things noted in the project_status file. Based on what it presents, I select a feature or improvement to generate. Then it proceeds to generate a plan, updates a clean status.md that points to the plan, and adjusts project_status based on the state of the prior completed plan.

    - An implementer skill in Codex that goes to town on a plan file. It's fairly simple, it just looks at status.md, which points to the plan, and of course the plan points to the relevant specs so it loads up context pretty efficiently.
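
    To make the working-memory piece concrete, here's a hypothetical status.md in this scheme (names invented):

        # status.md
        Plan: plans/2025-01-rate-limiting.md (phase 2 of 4)

        ## Phase 2: middleware wiring
        - [x] Add limiter config to settings
        - [ ] Wire middleware into the request pipeline
        - [ ] Emit rate-limit headers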

    I've tried the two main spec generation libraries, which were way overblown, and then I gave superpowers a shot... which was fine, but still too much. The above is all homegrown, and I've had much better success because it keeps the context lean and focused.

    And I'm only on the $20 plans for Codex/Gemini vs. spending $100/month on CC for a half year prior, and I move quicker w/ no stall-outs due to token consumption, which was regularly happening w/ CC by the 5th day. Codex rarely dips below 70% available context when it puts up a PR after an execution run. Roughly 4/5 PRs are without issue, which is flipped from what I experienced with CC, even while only using planning mode.

    • jcurbo 39 minutes ago
      This is pretty much my approach. I started with some spec files for a project I'm working on right now, based on some academic papers I've written. I ended up going back and forth with Claude, building plans, pushing info back into the specs, expanding that out and I ended up with multiple spec/architecture/module documents. I got to the point where I ended up building my own system (using claude) to capture and generate artifacts, in more of a systems engineering style (e.g. following IEEE standards for conops, requirement documents, software definitions, test plans...). I don't use that for session-level planning; Claude's tools work fine for that. (I like superpowers, so far. It hasn't seemed too much)

      I have found it to work very well with Claude by giving it context and guardrails. Basically I just tell it "follow the guidance docs" and it does. Couple that with intense testing and self-feedback mechanisms and you can easily keep Claude on track.

      I have had the same experience with Codex and Claude as you in terms of token usage. But I haven't been happy with my Codex usage; Claude just feels like it's doing more of what I want in the way I want.

    • r1290 2 hours ago
      Looks good. Question: is it always better to use a monorepo in this new AI world, vs. breaking your app into separate repos? At my company we have like 6 repos, all separate Next.js apps, for the same user base. We're trying to consolidate to one, as it should make life easier overall.
      • throwup238 2 hours ago
        It really depends but there’s nothing stopping you from just creating a separate folder with the cloned repositories (or worktrees) that you need and having a root CLAUDE.md file that explains the directory structure and referencing the individual repo CLAUDE.md files.
      • oa335 2 hours ago
        Just put all the repos in one directory yourself. In my experience that works pretty well.
      • chickensong an hour ago
        AI is happy to work with any directory you tell it to. Agent files can be applied anywhere.
  • jamesmcq 2 hours ago
    This all looks fine for someone who can't code, but for anyone with even a moderate amount of experience as a developer all this planning and checking and prompting and orchestrating is far more work than just writing the code yourself.

    There's no prize for "least amount of code written regardless of productivity outcomes", except for maybe Anthropic's bank account.

    • shepherdjerred 2 hours ago
      I really don't understand why there are so many comments like this.

      Yesterday I had Claude write an audit logging feature to track all changes made to entities in my app. Yeah you get this for free with many frameworks, but my company's custom setup doesn't have it.

      It took maybe 5-10 minutes of wall time to come up with a good plan, and then ~20-30 min for Claude to implement, test, etc.

      That would've taken me at least a day, maybe two. I had 4-5 other tasks going on in other tabs while I waited the 20-30 min for Claude to generate the feature.

      After Claude finished, I needed to manually test that it worked, and it did. I then needed to review the code before making a PR. In all, maybe 30-45 minutes of my actual time to add a small feature.

      All I can really say is... are you sure you're using it right? Have you _really_ invested time into learning how to use AI tools?

      • tyleo 2 hours ago
        Same here. I did bounce off these tools a year ago. They just didn't work for me 60% of the time. I learned a bit in that initial experience though and walked away with some tasks ChatGPT could replace in my workflow. Mainly replacing scripts and reviewing single files or functions.

        Fast forward to today and I tried the tools again--specifically Claude Code--about a week ago. I'm blown away. I've reproduced some tools that took me weeks at full-time roles in a single day. This is while reviewing every line of code. The output is more or less what I'd be writing as a principal engineer.

      • jamesmcq 2 hours ago
        Trust me I'm very impressed at the progress AI has made, and maybe we'll get to the point where everything is 100% correct all the time and better than any human could write. I'm skeptical we can get there with the LLM approach though.

        The problem is that LLMs are great at simple implementation, even large amounts of simple implementation, but I've never seen one develop something more than trivial correctly. The larger problem is that they're very often subtly but hugely wrong. They make bad architecture decisions; they break things in pursuit of fixing or implementing other things. You can tell they have no concept of the "right" way to implement something. They very obviously lack "senior developer insight".

        Maybe you can resolve some of these with large amounts of planning or specs, but that's the point of my original comment - at what point is it easier/faster/better to just write the code yourself? You don't get a prize for writing the least amount of code when you're just writing specs instead.

        • fourthark 2 hours ago
          This is exactly what the article is about. The tradeoff is that you have to thoroughly review the plans and iterate on them, which is tiring. But the LLM will write good code faster than you, if you tell it what good code is.
          • reg_dunlop an hour ago
            Exactly; the original commenter seems determined to write off AI as "just not as good as me".

            The original article is, to me, not that novel. Not because it's a trite example, but because I've begun to experience massive gains from following the same basic premise as the article. And I can't believe there are others who aren't using it like this.

            I iterate the plan until it's seemingly deterministic, then I strip the plan of implementation, and re-write it following a TDD approach. Then I read all specs, and generate all the code to red->green the tests.

            If this commenter is too good for that, then it's that attitude that'll keep him stuck. I already feel like my projects backlog is achievable, this year.

            • fourthark 22 minutes ago
              Strongly agree about the deterministic part. Even more important than a good design, the plan must not show any doubt, whether in the form of open questions or weasel words. 95% of the time those vague words mean I didn't think something through, and it will do something hideous in order to make the plan work.
        • nojito 2 hours ago
          >I've never seen it develop something more than trivial correctly.

          This is 100% incorrect, but the real issue is that the people who are using these llms for non-trivial work tend to be extremely secretive about it.

          For example, I view my use of LLMs to be a competitive advantage and I will hold on to this for as long as possible.

          • jamesmcq 2 hours ago
            The key part of my comment is "correctly".

            Does it write maintainable code? Does it write extensible code? Does it write secure code? Does it write performant code?

            My experience has been it failing most of these. The code might "work", but it's not good for anything more than trivial, well-defined functions (that probably appeared in its training data, written by humans). LLMs have a fundamental lack of understanding of what they're doing, and it's obvious when you look at the finer points of the outcomes.

            That said, I'm sure you could write detailed enough specs and provide enough examples to resolve these issues, but that's the point of my original comment - if you're just writing specs instead of code you're not gaining anything.

            • cowlby 2 hours ago
              I find “maintainable code” the hardest bias to let go of. 15+ years of coding and design patterns are hard to let go of.

              But the aha moment for me was realizing that what's maintainable by AI vs. by me by hand are in different realms. So "maintainable" has to evolve from good human design patterns to good AI patterns.

              Specs are worth it IMO. Not because being able to spec means I could've coded it anyway, but because I gain all the insight and capabilities of AI while minimizing the gotchas and edge failures.

            • reg_dunlop an hour ago
              To answer all of your questions: yes, if I steer it properly.

              It's very good at spotting design patterns, and implementing them. It doesn't always know where or how to implement them, but that's my job.

              The specs and syntactic sugar are just nice quality of life benefits.

            • jmathai 2 hours ago
              You’d be building blocks which compound over time. That’s been my experience anyway.

              The compounding is much greater than my brain can do on its own.

      • skydhash an hour ago
        > Yesterday I had Claude write an audit logging feature to track all changes made to entities in my app. Yeah you get this for free with many frameworks, but my company's custom setup doesn't have it.

        But did you truly think through such a feature? Like the guarantees it should uphold (e.g., how it should cope with entity migrations, like adding a new field) or the cost of maintaining it further down the line? This looks suspiciously like a drive-by PR made on an open-source project.

        > That would've taken me at least a day, maybe two.

        I think those two days would have been filled with research, comparing alternatives, questions like "can we extract this feature from framework X?", discussing ownership, sharing knowledge... Jumping straight to coding happened before LLMs too, but it usually hurts the long-term viability of the project.

        Adding code to a project can be done quite fast (hackathons, ...); ensuring quality is what slows things down in any well-functioning team.

      • streetfighter64 2 hours ago
        I mean, all I can really say is... if writing some logging takes you one or two days, are you sure you _really_ know how to code?
        • boxedemp 2 hours ago
          Ever worked on a distributed system with hundreds of millions of customers and seemingly endless business requirements?

          Some things are complex.

        • shepherdjerred 2 hours ago
          You're right, you're better than me!

          You could've been curious and asked why it would take 1-2 days, and I would've happily told you.

          • jamesmcq 2 hours ago
            I'll bite, because it does seem like something that should be quick in a well-architected codebase. What was the situation? Was there something in this codebase that was especially suited to AI-development? Large amounts of duplication perhaps?
            • shepherdjerred 2 hours ago
              It's not particularly interesting.

              I wanted to add audit logging for all endpoints we call, all places we call the DB, etc. across areas I haven't touched before. It would have taken me a while to track down all of the touchpoints.

              Granted, I am not 100% certain that Claude didn't miss anything. I feel fairly confident that it is correct given that I had it research upfront, had multiple agents review, and it made the correct changes in the areas that I knew.

              Also, I'm realizing I didn't mention that it included an API + UI for viewing events w/ pretty deltas.

        • fragmede 2 hours ago
          We're not as good at coding as you, naturally.
    • roncesvalles 11 minutes ago
      Well, it's less mental load. It's like Tesla's FSD. Am I a better driver than the FSD? For sure.

      But is it nice to just sit back and let it drive for a bit even if it's suboptimal and gets me there 10% slower and maybe slightly pisses off the guy behind me? Yes, nice enough to shell out $99/mo. Code implementation takes a toll on you in the same way that driving does.

      Personally I don't care if it's slower or faster because I don't sign my own paychecks.

    • skeledrew 2 hours ago
      Researching and planning a project is a generally useful thing. This is something I've been doing for years, and I have always had great results compared to just jumping in and coding. It makes perfect sense that this transfers to LLM use.
    • kburman 2 hours ago
      Since Opus 4.5, things have changed quite a lot. I find LLMs very useful for discussing new features or ideas, and Sonnet is great for executing your plan while you grab a coffee.
    • phantomathkg an hour ago
      Surely Addy Osmani can code. Even he suggests plan first.

      https://news.ycombinator.com/item?id=46489061

    • dmix 2 hours ago
      Most of these AI coding articles seem to be about greenfield development.

      That said, if you're on a serious team writing professional software there is still tons of value in always telling AI to plan first, unless it's a small quick task. This post just takes it a few steps further and formalizes it.

      I find Cursor works much more reliably using plan mode, reviewing/revising output in markdown, then pressing build. Which isn't a ton of overhead but often leads to lots of context switching as it definitely adds more time.

    • keyle 2 hours ago
      I partly agree with you. But once you have a large enough codebase, the changes take longer to even type in, once figured out.

      I find the best way to use agents (and I don't use claude) is to hash it out like I'm about to write these changes and I make my own mental notes, and get the agent to execute on it.

      Agents don't get tired, they don't start fat fingering stuff at 4pm, the quality doesn't suffer. And they can be parallelised.

      Finally, this allows me to stay at a higher level and not get bogged down in "right, did we do this simple thing again?", which wipes some of the context in my mind and gets tiring through the day.

      Always, 100% review every line of code written by an agent though. I do not condone committing code you don't 'own'.

      I'll never agree with a job that forces developers to use 'AI', I sometimes like to write everything by hand. But having this tool available is also very powerful.

      • jamesmcq 2 hours ago
        I want to be clear, I'm not against any use of AI. It's hugely useful to save a couple of minutes of "write this specific function to do this specific thing that I could write and know exactly what it would look like". That's a great use, and I use it all the time! It's better autocomplete. Anything beyond that is pushing it - at the moment! We'll see, but spending all day writing specs and double-checking AI output is not more productive than just writing correct code yourself the first time, even if you're AI-autocompleting some of it.
        • skeledrew an hour ago
          For the last few days I've been working on a personal project that's been on ice for at least 6 years. Back when I first thought of the project and started implementing it, it took maybe a couple weeks to eke out some minimally working code.

          This new version that I'm doing (from scratch with ChatGPT web) has a far more ambitious scope and is already at the "usable" point. Now I'm primarily solidifying things and increasing test coverage. And I've tested the key parts with IRL scenarios to validate that it's not just passing tests; the thing actually fulfills its intended function so far. Given the increased scope, I'm guessing it'd take me a few months to get to this point on my own, instead of under a week, and the quality wouldn't be where it is. Not saying I haven't had to wrangle with ChatGPT on a few bugs, but after a decent initial planning phase, my prompts now are primarily "Do it"s and "Continue"s. Would've likely already finished it if I wasn't copying things back and forth between browser and editor, and being forced to pause when I hit the message limit.

          • keyle an hour ago
            This is a great comeback story. I have had a similar experience with a photoshop demake of mine.

            I recommend trying out Opencode with this approach; you might find it less tiring than ChatGPT web (yes, it works with your ChatGPT Plus sub).

    • skydhash an hour ago
      > planning and checking and prompting and orchestrating is far more work than just writing the code yourself.

      This! Once I'm familiar with the codebase (which I strive to do very quickly), for most tickets I usually have a plan by the time I've read the description. I may have a couple of implementation questions, but I know where the info is located in the codebase. For things I only have a vague idea about, the whiteboard is where I go.

      The nice thing with such a mental plan is that you can start with a rougher version (like a drawing sketch). If I'm starting a new UI screen, I can put in placeholder text like "Hello, world", then work on navigation. Once that's done, I can start to pull data, then add mapping functions to get a view model, ...

      Each step is a verifiable milestone. Describing them is more mentally taxing than just writing the code (which is a flow state for me). Why? Because English is not fit to describe how a computer works (try describing a finite state machine, like a navigation flow, in natural language). My mental model is already aligned to code; writing the solution in natural language is asking me to be ambiguous and unclear on purpose.

  • zitrusfrucht 3 hours ago
    I do something very similar, also with Claude and Codex, because the workflow is controlled by me, not by the tool. But instead of plan.md I use a ticket system, basically ticket_<number>_<slug>.md, where I let the agent create the ticket from a chat, then correct and annotate it afterwards and send it back, sometimes to a new agent instance. This workflow helps me keep track of what has been done over time in the projects I work on. Also, this approach does not need any „real“ ticket system tooling/mcp/skill/whatever, since it works purely on text files.
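
    For illustration, a hypothetical ticket_0042_rate-limiting.md in that scheme (contents invented):

        # Ticket 42: Add rate limiting to the public API

        ## Description (agent-drafted, then corrected by me)
        Limit unauthenticated requests to 60/min per client.

        ## Tasks
        - [ ] Add limiter middleware
        - [ ] Return 429 with a Retry-After header
        - [ ] Tests for the 429 path

        ## Annotations
        NOTE (me): keep the limiter in-process for now, no Redis.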
    • ramoz an hour ago
      Try /annotate with https://github.com/backnotprop/plannotator

      While it hooks directly into plan mode, the annotate command works for any custom markdown approach - annotate in a nice visual, and automatically send the feedback to the agent

    • gbnwl 2 hours ago
      +1 to creating tickets by simply asking the agent to. It's worked great, and larger tasks can be broken down into smaller subtasks that could reasonably be completed in a single context window, so you rarely ever have to deal with compaction. Especially in the last few months, since Claude's gotten good at dispatching agents to handle tasks if you ask it to, I can plan large changes that span multiple tickets and tell Claude to dispatch agents as needed to handle them (which it will do in parallel if they mostly touch different files), keeping the main chat relatively clean for orchestration and validation work.
  • RHSeeger 2 hours ago
    > Most developers type a prompt, sometimes use plan mode, fix the errors, repeat.

    > ...

    > never let Claude write code until you’ve reviewed and approved a written plan

    I certainly always work towards an approved plan before I let it loose on changing the code. I just assumed most people did, honestly. Admittedly, sometimes there are "phases" to the implementation (because some parts can be figured out later and it's more important to get the key parts up and running first), but each phase gets a full, reviewed plan before I tell it to go.

    In fact, I just finished writing a command and instruction to tell Claude that, when it presents a plan for implementation, it should offer me another option: to write out the (important parts of the) current context and the full plan to individual, ticket-specific md files. That way, if something goes wrong with the implementation, I can tell it to read those files and "start from where they left off" in the planning.

    • ramoz an hour ago
      The author seems to think they've invented a special workflow...

      We all tend to regress to average (same thoughts/workflows)...

      Have had many users already doing the exact same workflow with: https://github.com/backnotprop/plannotator

      • CGamesPlay an hour ago
        4 times in one thread, please stop spamming this link.
  • cowlby 2 hours ago
    I recently discovered GitHub's speckit, which separates planning and execution into stages: specify, plan, tasks, implement. I find it aligns with the OP in the level of "focus" and "attention" it gets out of Claude Code.

    Speckit is worth trying as it automates what is being described here, and with Opus 4.6 it's been a kind of BC/AD moment for me.

  • deevus 2 hours ago
    This is what I do with the obra/superpowers[0] set of skills.

    1. Use brainstorming to come up with the plan using the Socratic method

    2. Write a high level design plan to file

    3. I review the design plan

    4. Write an implementation plan to file. We've already discussed this in detail, so usually it just needs skimming.

    5. Use the worktree skill with subagent driven development skill

    6. Agent does the work using subagents that for each task:

      a. Implements the task
    
      b. Spec reviews the completed task
    
      c. Code reviews the completed task
    
    7. When all tasks complete: create a PR for me to review

    8. Go back to the agent with any comments

    9. If finished, delete the plan files and merge the PR

    [0]: https://github.com/obra/superpowers

    • ramoz 2 hours ago
      If you’ve ever wanted the ability to annotate the plan more visually, try fitting Plannotator into this workflow. There is a slash command for when you use custom workflows outside of normal plan mode.

      https://github.com/backnotprop/plannotator

      • deevus 2 hours ago
        I'll give this a try. Thanks for the suggestion.
  • Ozzie_osman an hour ago
    There are a few prompt frameworks that essentially codify these types of workflows by adding skills and prompts:

    https://github.com/obra/superpowers

    https://github.com/jlevy/tbd

  • beratbozkurt0 17 minutes ago
    That's great, actually. Doesn't the same logic apply to other services as well?
  • srid 3 hours ago
    Regarding inline notes, I use a specific format in the `/plan` command, by using the `ME:` prefix.

    https://github.com/srid/AI/blob/master/commands/plan.md#2-pl...

    It works very similarly to Antigravity's plan-document comment-refine cycle.

    https://antigravity.google/docs/implementation-plan
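
    As a hypothetical example (not taken from the linked file), an annotated plan section might read:

        ## 2. Add caching layer
        - Cache lookup results in Redis with a 5 minute TTL
          ME: use the existing in-memory cache instead; we don't run Redis in dev
        - Invalidate on write
          ME: also invalidate when the schema version changes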

  • dworks 27 minutes ago
    My rlm-workflow skill has this encoded as a repeatable workflow.

    Give it a try: https://skills.sh/doubleuuser/rlm-workflow/rlm-workflow

  • recroad 2 hours ago
    Try OpenSpec and it'll do all this for you. SpecKit works too. I don't think there's a need to reinvent the wheel on this one, as this is spec-driven development.
  • jrs235 an hour ago
    Claude appeared to just crash in my session: https://news.ycombinator.com/item?id=47107630
  • zhubert an hour ago
    AI only improves and changes. Embrace the scientific method and make sure your "here's how to" guides are based on data.
  • h14h an hour ago
    Is this not just Ralph with extra steps and the risk of context rot?
  • skybrian 2 hours ago
    I do something broadly similar. I ask for a design doc that contains an embedded todo list, broken down into phases. Looping on the design doc asking for suggestions seems to help. I'm up to about 40 design docs so far on my current project.
  • imron 2 hours ago
    I have tried using this and other workflows for a long time and had never been able to get them to work (see chat history for details).

    This has changed in the last week, for 3 reasons:

    1. Claude Opus. It’s the first model where I haven’t had to spend more time correcting things than it would’ve taken me to just do it myself. The problem is that Opus chews through tokens, which led to...

    2. I upgraded my Claude plan. Previously, on the regular plan, I’d get about 20 mins of time before running out of tokens for the session and then needing to wait a few hours to use it again. It was fine for little scripts or toy apps but not feasible for the regular dev work I do. So I upgraded to 5x. This got me 1-2 hours per session before tokens expired, which was better but still a frustration. Wincing at the price, I upgraded again to the 20x plan, and this was the next game changer. I had plenty of spare tokens per session, and at that price it felt like they were being wasted - so I ramped up my usage.

    Following a similar process as OP, but with a plans directory with subdirectories for backlog, active, and completed plans, and skills with strict rules for planning, implementing, and completing plans, I now have 5-6 projects on the go. While I’m planning a feature on one, the others are implementing. The strict plans and controls keep them on track, and I have follow-up skills for auditing quality and performance. I still haven’t hit token limits for a session, but I’ve almost hit my token limit for the week, so I feel like I’m getting my money’s worth. In that sense, spending more has forced me to figure out how to use more.

    3. The final piece of the puzzle is using opencode over Claude Code. I’m not sure why, but I just don’t gel with Claude Code. Maybe it’s all the sautéing and flibertygibbering, maybe it’s all the permission asking, maybe it’s that it doesn’t show what it’s doing as much as opencode. Whatever it is, it just doesn’t work well for me. Opencode, on the other hand, is great. It shows what it’s doing and how it’s thinking, which makes it easy for me to spot when it’s going off track and correct early.

    Having a detailed plan, and correcting and iterating on the plan, is essential. Making Claude follow the plan is also essential - but there’s a line. Too fine-grained and it’s not as creative at solving problems. Too loose/high-level and it makes bad choices and goes in the wrong direction.

    Is it actually making me more productive? I think it is but I’m only a week in. I’ve decided to give myself a month to see how it all works out.

    I don’t intend to keep paying for the 20x plan unless I can see a path to using it to earn me at least as much back.

    • raw_anon_1111 2 hours ago
      Just don’t use Claude Code. I can use the Codex CLI with just my $20 subscription and never come close to any usage limits
      • throwawaytea an hour ago
        What if it's just slower so that your daily work fits within the paid tier they want?
        • raw_anon_1111 an hour ago
          It isn’t slower. I use my personal ChatGPT subscription with Codex for almost everything at work, and use my $800/month company Claude allowance only for the tricky stuff that Codex can’t figure out. It’s never application code. It’s usually some combination of app code + Docker + AWS issues with my underlying infrastructure - created with whatever IaC I’m using for a client - Terraform, CloudFormation, or the CDK.

          I burned through $10 on Claude in less than an hour. I only have $36 a day at $800 a month ($800 / 22 working days).

          • imron an hour ago
            > and use my $800/month company Claude allowance only for the tricky stuff that Codex can’t figure out.

            It doesn’t seem controversial that the model that can solve more complex problems (which you admit the cheaper model can’t solve) costs more.

            For the things I use it for, I’ve not found any other model to be worth it.

            • raw_anon_1111 30 minutes ago
              You’re assuming rational behavior from a company that doesn’t care about losing billions of dollars.

              Have you tried Codex with OpenAI’s latest models?

  • alexmorgan26 2 hours ago
    This separation of planning and execution resonates deeply with how I approach task management in general, not just coding.

    The key insight here - that planning and execution should be distinct phases - applies to productivity tools too. I've been using www.dozy.site which takes a similar philosophy: it has smart calendar scheduling that automatically fills your empty time slots with planned tasks. The planning happens first (you define your tasks and projects), then the execution is automated (tasks get scheduled into your calendar gaps).

    The parallel is interesting: just like you don't want Claude writing code before the plan is solid, you don't want to manually schedule tasks before you've properly planned what needs to be done. The separation prevents wasted effort and context switching.

    The annotation cycle you describe (plan -> review -> annotate -> refine) is exactly how I work with my task lists too. Define the work, review it, adjust priorities and dependencies, then let the system handle the scheduling.

    • dimgl 2 hours ago
      Pretty sure this entire comment is AI generated.
      • rob 2 hours ago
        I almost think we're at the point on HN where we need a special [flag bot] link for accounts that meet a certain threshold, which alerts @dang or something to investigate them in more detail. The amount of bots on here has been increasing at an alarming rate.
  • bodeadly 2 hours ago
    Tip: LLMs are very good at following conventions (this is actually what is happening when they write code). If you create a .md file with a list of entries of the following structure:

        # <identifier>
        <description block>
        <blank space>
        # <identifier>
        ...

    where an <identifier> is a stable and concise sequence of tokens that identifies some "thing", and seed it with 5 entries describing abstract stuff, the LLM will latch on and reference this. I call this a PCL (Project Concept List). I just tell it:

        > consume tmp/pcl-init.md pcl.md

    The pcl-init.md describes what a PCL is, and pcl.md is the actual list. I have a pcl.md file for each independent component in the code (logging, http, auth, etc). This works very, very well. The LLM seems to "know" what you're talking about. You can ask questions and give instructions like "add a PCL entry about this", and it will ask if it should add a PCL entry about xyz. If the description blocks keep a high information-to-token ratio, it will follow that convention too (which is a very good convention BTW).
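
    For concreteness, two hypothetical pcl.md entries in that shape (names and details invented):

        # http-retry-policy
        All outbound HTTP goes through client.request(). Retries: exponential
        backoff, max 3 attempts, on 5xx and timeouts only.

        # auth-token-refresh
        Tokens are refreshed by the middleware, never by handlers. A downstream
        401 means the refresh itself failed.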

    However, there is a caveat: LLMs resist ambiguity about authority. So the PCL, or whatever you want to call it, needs to be the ONE authoritative place for everything. If you have the same stuff in 3 different files, it won't work nearly as well.

    Bonus tip: I find long prompt input with example code fragments and thoughtful descriptions works best at getting an LLM to produce good output. But there will always be holes (resource leaks, vulnerabilities, concurrency flaws, etc). So then I update my original prompt input (kept in a separate file, PROMPT.txt, as a scratch pad) to add context about those things, maybe asking questions along the way to figure out how to fix the holes. Then I /rewind back to the prompt and re-enter the updated prompt. This feedback loop advances the conversation without expending tokens.

  • recroad 2 hours ago
    Use OpenSpec and simplify everything.
  • politician 43 minutes ago
    Wow, I never bother with using phrases like “deeply study this codebase deeply.” I consistently get pretty fantastic results.
  • bandrami an hour ago
    How much time are you actually saving at this point?
  • renewiltord 3 hours ago
    The plan document and todo are an artifact of context size limits. I use them too because it allows using /reset and then continuing.
  • fnord77 2 hours ago
    I have a different approach: I have Claude write coding prompts for stages, then I give each prompt to another agent. I wonder if I should write it up as a blog post.
  • hilliardfarmer3 hours ago
    [flagged]
    • crazygringo 3 hours ago
      Please don't be knee-jerk dismissive of posts. Absolutely nothing about this article looks "LLM-generated style" to me.
  • ramoz 2 hours ago
    One thing for me has been the ability to iterate over plans - with a better visual of them, as well as the ability to annotate feedback about the plan.

    https://github.com/backnotprop/plannotator

    Plannotator does this really effectively and natively, through hooks.

    • prodtorok 2 hours ago
      Wow, I've been needing this! The one issue I've had with terminals is reviewing plans and wanting the ability to provide feedback on specific plan sections in a more organized way.

      Really nice UI, based on the demo.

  • bluegatty 12 minutes ago
    I don't see how this is "radically different", given that Claude Code literally has a planning mode.

    This is my workflow as well, with the big caveat that 80% of "work" doesn't require substantive planning; we're making relatively straightforward changes.

    • dack 9 minutes ago
      last i checked, you can't annotate inline with planning mode. you have to type a lot to explain precisely what needs to change, and then it re-presents you with a plan (which may or may not have changed something else).

      i like the idea of having an actual document, because you could actually compare the before and after versions if you wanted to confirm things changed as intended when you gave feedback.