1 point by samahlstrom 8 hours ago | 1 comment
  • samahlstrom 8 hours ago
    Some background: I kept running into the same issue over and over, where my AI coding agents sucked at testing edge cases, performing long-horizon tasks, and testing the functionality of their own code. My agents, especially Claude, would frequently hit context anxiety, claim they were "done" when they had only reached about 50% completion on a feature implementation, and then consistently lie to me and say, "Nuh uh, I did implement and test it".

    After digging into other people's approaches to these problems, I realized that an AI harness was necessary to wrangle the clanker bastard so it could perform my tasks, big or small, with increasing efficiency. I implemented a harness at the company where I work and the results were good. Really good.

    Never before had so many of my PRs been merged so quickly without my being told "hey, go check this out" or "this needs to change". It was incredible. It got to the point where I just gave Claude unlimited access to the Linear tasks from my project manager and had it run each request through the /forge skill set that is the core of the pipeline. Soon I had no need to check in on how my little sweatshop coding agent was performing, and I finally had time to work on other stuff.

    With all the new time on my hands, I realized I wanted this not just in my work repo but in my personal ones as well, so I created forge-cli: a CLI tool that lets anyone with access to a repo initialize an agent harness that matches an existing repo, or helps you plan long-horizon tasks for a new project you are starting. It also sets up the core skills and agent files needed to start any good harness and reel in your defiant robot slave.

    Every project is different, so the implementation should respect how your codebase and skills have grown and what you already have. The forge pipeline therefore respects your own additions to SKILLS, CLAUDE.md, and more, and formats the files it creates to match your repo.
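
    As a rough illustration of what "respecting what you already have" could look like, here is a minimal sketch of non-destructive scaffolding: template files are written only where nothing exists yet. This is an assumption about the general idea, not forge-cli's actual code; the `scaffold` function, the template names, and the file layout are all hypothetical.

```python
import tempfile
from pathlib import Path


def scaffold(repo_root: Path, templates: dict[str, str]) -> list[str]:
    """Write each template into the repo unless a file is already there.

    Hypothetical helper: returns the relative paths that were created,
    so the caller can report what was added and what was left alone.
    """
    created = []
    for rel_path, content in templates.items():
        target = repo_root / rel_path
        if target.exists():
            # The user's own file wins; never overwrite it.
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        created.append(rel_path)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Pretend the repo already has a hand-written CLAUDE.md.
        (root / "CLAUDE.md").write_text("my own notes", encoding="utf-8")
        print(scaffold(root, {
            "CLAUDE.md": "template notes",
            "skills/review.md": "template skill",
        }))
```

    A real tool would presumably also merge into existing files and mirror the repo's formatting; this sketch only covers the "do not clobber" part.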

    One of the standout additions in forge-cli is an implementation of ideas from karpathy/autoresearch. Basically, it is a loop in the CLI called "forge refine" that helps you write out what you wanted a task to do, the implementation approach for that task, and then a review of whether it was completed or not. Only completions get merged into principle changes in the code to refine the process. You can apply this idea to skill files, workflows, and more.
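
    To make the loop concrete, here is a minimal sketch of the gating idea described above: each run records the intent, the approach, and whether the task was judged complete, and only completed runs contribute a lesson to the shared principles. The `RefineRecord` and `Principles` types and the `refine` function are hypothetical names for illustration and are not taken from forge-cli.

```python
from dataclasses import dataclass, field


@dataclass
class RefineRecord:
    intent: str       # what the task was supposed to do
    approach: str     # how the agent went about implementing it
    completed: bool   # whether the review judged the task complete
    lesson: str       # the principle to keep if the run succeeded


@dataclass
class Principles:
    rules: list[str] = field(default_factory=list)

    def merge(self, record: RefineRecord) -> bool:
        """Add the record's lesson only if the run completed and is new."""
        if not record.completed or record.lesson in self.rules:
            return False
        self.rules.append(record.lesson)
        return True


def refine(principles: Principles, records: list[RefineRecord]) -> int:
    """Run one refinement pass and return how many lessons were merged."""
    return sum(principles.merge(record) for record in records)


if __name__ == "__main__":
    principles = Principles()
    runs = [
        RefineRecord("add login form", "wrote tests first", True,
                     "write tests before the implementation"),
        RefineRecord("add export", "skipped edge cases", False,
                     "skip edge cases to save time"),
    ]
    refine(principles, runs)
    print(principles.rules)
```

    The point of the gate is that a failed run never rewrites the rules, so the principles only accumulate lessons from runs that actually worked.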

    This means the more projects you tackle and the more iterations you run, the better your system gets over time. I experienced this firsthand when running the forge CLI for the first time. It SUCKED, to say the least, but with this approach it now runs really cleanly. It has helped me refine my ideas, and they will only keep getting better. The main breakthrough is that this tool lets me keep asking "what am I missing and what could be better?" in a tight-ish loop, without the massive mental research it used to take to answer those questions.

    Please feel free to check out the repo, try it out for yourself, tell me whether it hurt or helped your process, and jump in and collaborate with me to make it better! This is my first time making something of this nature, so if it is poorly made, I would love feedback from the great devs out there. Please also let me know how you solved similar problems!