A Brief History of Ralph (www.humanlayer.dev)

73 points by dhorthy 19 days ago | 11 comments

shanewwarren 19 days ago
I've really jumped into this since I watched Geoffrey's videos last week. I ended up creating my own version of this, and have been throwing small projects at it so far.
I created a small Claude skill that helps create the "specs" for a new or existing project. It adds a /specs folder with a README that acts as a lookup for topics/features of the app, the technical approach and the feature set. Once we've chatted, it spawns subagents to do research and present those findings in the relevant spec. In terms of improvements there, I'd almost like a more opinionated back-and-forth between "PM-type" agents, to help test ideas and implementation approaches.
I've got the planning and build loop set up in the Claude devcontainer, which is somewhat fragile at the moment, but it works for now.
In terms of chewing up context, I've noticed that, depending on the size of the project, the "IMPLEMENTATION_PLAN.md" can get pretty massive. If each agent run needs to parse that whole plan to figure out what to do next, that feels like a lot of wasted parsing. I'm working on making the implementation plan more granular, so there is less to parse when figuring out what to do next.
Overall, it's been fun and has kept me really engaged the past week.
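One way to act on the IMPLEMENTATION_PLAN.md point is to let the harness pick the next item instead of the agent. That way only that item lands in the prompt. Below is a minimal sketch. It assumes the plan is a Markdown checklist using `- [ ]` / `- [x]` boxes. `next_task` and `build_prompt` are hypothetical helpers, not part of any Ralph tooling.

```python
import re
from typing import Optional

# Hypothetical plan format: Markdown checklist items such as "- [ ] Add login form".
UNCHECKED = re.compile(r"^[ \t]*[-*] \[ \] (.+)$", re.MULTILINE)


def next_task(plan_text: str) -> Optional[str]:
    """Return the first unchecked item in the plan, or None if everything is done."""
    match = UNCHECKED.search(plan_text)
    return match.group(1).strip() if match else None


def build_prompt(base_prompt: str, plan_text: str) -> Optional[str]:
    """Give the agent only the next task instead of the whole plan."""
    task = next_task(plan_text)
    if task is None:
        return None  # nothing left to do; the loop can stop
    return f"{base_prompt}\n\nYour task for this iteration:\n- {task}\n"


if __name__ == "__main__":
    plan = """# IMPLEMENTATION_PLAN
- [x] Scaffold project
- [ ] Add login form
- [ ] Wire up session storage
"""
    print(next_task(plan))  # Add login form
    print(build_prompt("Follow specs/README.md.", plan))
```

The plan file can still grow as large as it likes. Only the harness reads all of it, and each agent run sees one line of it.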
vemv 19 days ago
Ralph is, very literally, vibe coding with extra steps.
If you're coding a demo or MVP, a one-off idea, etc., alright, go ahead: have your fun and waste your tokens.
However, I'll be happy when the (forced) hype fades as people realise that there's nothing novel, insightful or even well-defined behind Ralph.
Loops have always been possible. So have coordination frameworks (for tracking TODOs, spawning agents, supervising completion, etc.), and those would be better embodied as a program than as an ad-hoc prompt.
- realityfactchex 19 days ago
  Yeah, Ralph smells like a fresh rebranding of YOLO.
  With YOLO on full-auto, you can give a wrapping rule/prompt that says more or less: "Given what I asked you to do as indicated in the TODO.md file, keep going until you are done, expanding and checking off the items, no matter what that means -- fix bugs, check work, expand the TODO. You are to complete the entire project correctly and fully yourself by looping and filling in what is missing or could be improved, until you find it is all completely done. Do not ask me anything, just do it with good judgement and iterating."
  Which is simultaneously:
  1. an effective way to spend tokens prodigiously;
  2. an excellent way to get something working 90% of the way there with minimal effort, if you already set it up for success and the anticipatable outcomes are within acceptable parameters;
  3. a most excellent way to test how far fully autonomous development can go: in particular, to test how well the "rest of" one's configuration/scaffolding/setup holds up for such "auto builds".
  Setting aside origin stories, honestly it's very hard to tell whether Ralph and full-auto-YOLO before it are tightly coupled to some kind of "guerrilla marketing" effort (or whatever that's called these days) or really are organic phenomena. It almost doesn't matter.
  The whole idea with auto-YOLO and Ralph seems to be you loop a lot and see what you can get. Very low effort, surprisingly good results. Just minor variations on branding and implementation.
  Either way, in my experience, auto-YOLO can actually work pretty well. 2025 proved to be cool in that regard.
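The wrapping-rule approach boils down to "re-run until the TODO is checked off". An iteration cap guards against point 1, the prodigious token spend. Below is a rough sketch under stated assumptions:

- TODO.md uses `- [ ]` checkboxes.
- `run_agent` is a hypothetical callable wrapping whatever CLI you loop, and it is expected to edit TODO.md as it works.
- `remaining`, `run_until_done` and `max_iterations` are illustrative names.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable

# An unchecked Markdown checklist item, e.g. "- [ ] write tests" (assumed format).
UNCHECKED = re.compile(r"^[ \t]*[-*] \[ \] ", re.MULTILINE)


def remaining(todo_path: Path) -> int:
    """Count the unchecked items left in the TODO file."""
    return len(UNCHECKED.findall(todo_path.read_text()))


def run_until_done(
    run_agent: Callable[[str], None],
    wrapper_prompt: str,
    todo_path: Path,
    max_iterations: int = 20,
) -> int:
    """Re-run the agent with the same wrapping prompt until every TODO item is
    checked off; the iteration cap bounds the token spend. Returns the number
    of agent runs used."""
    for i in range(max_iterations):
        if remaining(todo_path) == 0:
            return i
        run_agent(wrapper_prompt)
    left = remaining(todo_path)
    if left:
        raise RuntimeError(f"budget exhausted with {left} item(s) still open")
    return max_iterations


if __name__ == "__main__":
    todo = Path(tempfile.mkdtemp()) / "TODO.md"
    todo.write_text("- [ ] a\n- [ ] b\n- [ ] c\n")

    def fake_agent(prompt: str) -> None:
        # Stand-in for a real CLI call: checks off the first open item.
        todo.write_text(todo.read_text().replace("- [ ]", "- [x]", 1))

    print(run_until_done(fake_agent, "Keep going until TODO.md is done.", todo))  # 3
```

Note that the agent can still check boxes without doing the work. The loop only trusts the file, not the code.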
jes5199 19 days ago
I forked the Anthropic Ralph Wiggum plugin: https://github.com/jes5199/chief-wiggum
There’s some debate about whether this is in the spirit of the _original_ Ralph, because it keeps too much context history around. But in practice, Claude Code compactions are so low-quality that it’s basically the same as clearing the history every few turns.
I’ve had good luck giving it goals like “keep working until the integration test passes on GitHub CI”. That was my longest run, actually: it ran unattended for 24 hours before solving the bug.
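A goal like "keep working until the integration test passes" can be made mechanical. An external command decides when to stop, not the agent's own claim of success. Below is a sketch under stated assumptions:

- `run_agent` is a hypothetical callable for one agent session.
- `check_cmd` is any command that exits 0 on success, such as a local test run or a script of your own that queries CI.
- The 24-hour default just mirrors the run described above.

```python
import subprocess
import time
from typing import Callable, Sequence


def loop_until_green(
    run_agent: Callable[[str], None],
    prompt: str,
    check_cmd: Sequence[str],
    max_hours: float = 24.0,
) -> int:
    """Alternate "run the check, then one agent session" until check_cmd exits 0.

    The agent's own claim of success is ignored; only the check's exit code
    counts. Returns the number of agent sessions it took.
    """
    deadline = time.monotonic() + max_hours * 3600
    sessions = 0
    while True:
        if subprocess.run(list(check_cmd)).returncode == 0:
            return sessions
        if time.monotonic() >= deadline:
            raise TimeoutError(f"check still failing after {sessions} session(s)")
        run_agent(prompt)
        sessions += 1
```

The check runs one final time before the deadline is tested. A fix that lands in the last session is therefore still counted.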
- senjin 19 days ago
  The creator of Claude Code said you can just get Ralph to run /clear. I think it's hilarious that nobody (myself included!) thought of that or tried it, and just assumed it couldn't run slash commands like that.
  https://x.com/bcherny/status/2012666979224629353
  - jes5199 19 days ago
    I asked Claude a few days ago, and it said it didn’t have access to the /clear command. Maybe it was wrong, or maybe that has changed.
Juvination 19 days ago
I've been working with the Ralphosophy? for iterative behavior in my workflow and it seems pretty promising for cutting out a few manual steps.
I still have a manual part which is breaking the design document down into multiple small gh issues after a review but I think that is fine for now.
Using codex exec, we start working on a GitHub issue with a supplied design document, creating a PR on completion. Then we perform a review using a review skill I made up, which is effectively just a "cite your sources" skill applied to the review, along with Open Questions.
Then we iterate through the open questions, doing a minimum of 3 reviews (somewhat arbitrary, but sometimes multiple reviews catch things). Finally, I have a step for checking SonarCloud issues, fixing them and pushing the changes. Realistically, this step should be broken out into multiple iterations to avoid large context rot.
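The review stage can be sketched as a small loop: always at least 3 reviews, and it keeps going while open questions remain. This is only an illustration. `review` and `address` are hypothetical callables standing in for the `codex exec` review and fix steps. `max_reviews` is an added safety cap that the original workflow does not mention.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Review:
    """Result of one review pass (hypothetical shape)."""
    open_questions: List[str] = field(default_factory=list)


def review_loop(
    review: Callable[[], Review],
    address: Callable[[List[str]], None],
    min_reviews: int = 3,
    max_reviews: int = 8,
) -> int:
    """Run at least `min_reviews` passes; after that, stop at the first pass
    that raises no open questions. Returns the number of passes run."""
    for n in range(1, max_reviews + 1):
        result = review()
        if result.open_questions:
            address(result.open_questions)  # fix, then re-review on the next pass
        elif n >= min_reviews:
            return n
    raise RuntimeError("open questions remain after max_reviews passes")
```

Because a pass that raises questions always triggers another pass, the loop only ends on a clean review.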
What I miss the most is output: seeing what's going on in either Codex or Claude in real time. I can output the last response, but it just gets messy until I make something a bit more formal.
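For the missing real-time output, one option is to run the agent CLI as a subprocess. The harness echoes each line as it arrives and also appends it to a log. Below is a minimal sketch. The command and log path are placeholders, not options of Codex or Claude Code.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional, Sequence, TextIO


def run_streaming(
    cmd: Sequence[str], log_path: Path, out: Optional[TextIO] = None
) -> int:
    """Run cmd, echoing each output line as soon as it is read and appending
    it to log_path. Returns the process exit code."""
    if out is None:
        out = sys.stdout
    with open(log_path, "a", encoding="utf-8") as log, subprocess.Popen(
        list(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
        bufsize=1,  # line-buffered on the reading side of the pipe
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            out.write(line)
            out.flush()
            log.write(line)
    return proc.returncode  # set once the `with` block has waited for exit
```

One caveat: some CLIs buffer or reformat their output when stdout is not a terminal. Lines may then still arrive in bursts rather than one at a time.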
skybrian 19 days ago
There's a lot of irrelevant detail, but the article never actually explains what "Ralph" does or how it works.
- wild_egg 19 days ago
  It's explained under the July 2025 heading, with a link to the blog post where it was first shared.
  The key bit is right under that though. Ralph is literally just this:
  while :; do cat PROMPT.md | npx --yes @sourcegraph/amp ; done
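For reference, here is the same loop in Python with a stop condition added. It is a sketch, not the original tooling. `ralph` and `max_runs` are hypothetical names, and the cap is an assumption, since the bash version runs forever. Each iteration starts a new process with PROMPT.md on stdin. Nothing carries over between runs except what the agent wrote to disk.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def ralph(agent_cmd: Sequence[str], prompt_file: Path, max_runs: int) -> List[int]:
    """Bounded take on `while :; do cat PROMPT.md | <agent> ; done`.

    Every iteration pipes the prompt file into a brand-new agent process, so
    no conversation history survives between runs. Returns each run's exit code.
    """
    exit_codes: List[int] = []
    for _ in range(max_runs):
        prompt = prompt_file.read_text()  # re-read each run, as `cat` does in the bash loop
        result = subprocess.run(list(agent_cmd), input=prompt, text=True)
        exit_codes.append(result.returncode)
    return exit_codes
```

For example, `ralph(["npx", "--yes", "@sourcegraph/amp"], Path("PROMPT.md"), max_runs=10)` would run the same pipeline as the one-liner above, ten times.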
  - mkl 19 days ago
    This is meaningless without knowing the contents of PROMPT.md. The blog post hides the contents of PROMPT.md in a second, subscriber-only post (I don't know if the post's 36-minute video explains anything; 33% of the way in, it still hasn't).
    https://github.com/repomirrorhq/repomirror/blob/main/repomir... (discussed in https://news.ycombinator.com/item?id=45005434) provides a bit more detail, and prompts, but only seems to use the method for porting existing software.
  - skybrian 19 days ago
    Thanks!
  - msla 19 days ago
    Surely that would be better written as
    cat PROMPT.md | cat | npx --yes @sourcegraph/amp
    wild_egg 19 days ago
    Aside from losing the loop (the whole point of the command), why the double `cat`?
    mkl 19 days ago
    I think it was an attempt at a "useless use of cat" joke (cat isn't needed at all here, but IMHO it helps readability).
    GibbonBreath 19 days ago
    You've removed the loop. This pipeline executes once and then halts.
- dhorthy 19 days ago
  there are hundreds of useful resources, including many linked in the article itself
SafeDusk 19 days ago
Sad that a lot of these are for Claude Code and not Codex, which I use more, so I started https://github.com/aperoc/codex-plus, which has telemetry built in. I'm now moving on to building a Ralph loop on top of it.
fallinditch 19 days ago
Has anyone used this technique with other LLMs that are good at coding but not so expensive, for example Qwen 3 Coder?
- odie5533 19 days ago
  I did not find success with the Claude Code plugin. If the AI thinks things work, it will say COMPLETE even if you wouldn't consider it complete. It does not seem to work any harder than it did without the Ralph loop. The structure the plugin recommended was too simplistic, and I did not understand the true purpose of Ralph loops.
  I think the key is having lots of smaller tasks with fresh context each loop. A Ralph loop run starts, picks the most important task, completes it, and ends its loop. Then the next Ralph run starts with new context, grabs the most important task, and the loops continue. I have not tried this method yet.
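The "one task per fresh context" idea might look like the sketch below, combined with not trusting the agent's own COMPLETE. It rests on these assumptions:

- Tasks carry a numeric priority, where lower is more important.
- `run_fresh_agent` is a hypothetical callable that starts a new agent session for a single prompt.
- `verify` is a hypothetical independent check, such as a test run.

```python
import heapq
from typing import Callable, List, Tuple

Task = Tuple[int, str]  # (priority, description); lower number = more important


def run_tasks(
    tasks: List[Task],
    run_fresh_agent: Callable[[str], object],
    verify: Callable[[str], bool],
    max_runs: int = 50,
) -> List[str]:
    """One task per run: pop the most important task, hand it to a fresh
    agent session, and only count it as done if `verify` passes (the
    agent's own "COMPLETE" is ignored). Returns tasks in completion order."""
    queue = list(tasks)
    heapq.heapify(queue)
    done: List[str] = []
    for _ in range(max_runs):
        if not queue:
            break
        priority, task = heapq.heappop(queue)
        run_fresh_agent(f"Complete exactly this task, then stop:\n{task}")
        if verify(task):
            done.append(task)
        else:
            heapq.heappush(queue, (priority, task))  # retry in a later run
    return done
```

A task that keeps failing `verify` stays at the front of the queue and is retried until `max_runs` is used up. Demoting its priority instead would let other tasks proceed.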
ossa-ma 19 days ago
So it took the author 6 months and several 1-to-1s with the creator to get value from this. As in he literally spent more time promoting it than he did using it.
And it all ends with the grift of all grifts: promoting a crypto token in a nonchalant 'hey whats this??!!??' way...
- dhorthy 19 days ago
  the note about the crypto token was intended to say “okay, this is now hype slop and it’s time to move on”
f311a 19 days ago
Just look at the code quality produced by these loops. That's all you need to know about it.
It's complete garbage, and since it runs in a loop, the amount of garbage multiplies over time.
- dhorthy 19 days ago
  I don’t think anyone serious would recommend it for serious production systems. I respect the Ralph technique as a fascinating learning exercise in understanding LLM context windows and how to squeeze more performance (read: quality) out of today’s models.
  Even if the ceiling remains low in absolute terms, it’s interesting how much good context engineering raises it.
  - ossa-ma 19 days ago
    How is it a “fascinating learning exercise” when the intention is to run the model in a closed loop with zero transparency? Running a black box in a black box to learn? What signals are you even listening to, to determine whether your context engineering is good or whether the quality has improved, aside from a brief glimpse at the final product? So essentially, every time I want to test a prompt, I waste $100 on Claude and have it build an entire project for me?
    I’m all for AI, and it’s evident that the future of AI is more transparency (MLOps, tracing, mech interp, AI safety), not less.
    alansaber 19 days ago
    Current transparency is rubbish, but people will continue to put up with it if they're getting decent output quality.
    dhorthy 19 days ago
    there is the theoretical "how the world should be" and there is the practical "what's working today" - decry the latter and wait around for the former at your peril
- Veen 19 days ago
  You probably wouldn't use it for anything serious, but I've Ralphed a couple of personal tools: Mac menu bar apps, mostly. It works reasonably well as long as you do the prep up front and prepare a decent spec and plan. No idea of the code quality, because I wouldn't know good Swift code from a hole in the head, but the apps work and scratch the itch that motivated them.
- skerit 19 days ago
  I do not understand where this Ralph hype is coming from. Back when Claude 4.0 came out and it began to become actually useful, I already tried something like this. Every time it was a complete and utter failure.
  And this dream of "having Claude implement an entire project from start to finish without intervention" came crashing down with this realization: Coding assistants 100% need human guidance.
articulatepang 19 days ago
This is so poorly written. What is "Ralph"? What is its purpose? How does it work? A single sentence at the top would help. The writer imagines that the reader cares enough to have followed their entire journey, or to decode this enormously distended pile of words.
More generally, I've noticed that people who spend a lot of time interacting with LLMs sometimes develop a distinct brain-fried tone when they write or talk.
- dang 19 days ago
  Please don't post shallow dismissals of other people's work (this is in the site guidelines: https://news.ycombinator.com/newsguidelines.html) and especially please don't cross into personal attack.
  - articulatepang 19 days ago
    Thanks dang, you’re right, I apologize and will keep this in mind in the future.
  - cactusplant7374 19 days ago
    How is it shallow? The commenter asked three questions. That shows that they read the article and are reacting to it. Shallow would be something like, "More AI slop."
    GibbonBreath 19 days ago
    They allege these 3 questions aren't answered in the article and then use that as a jumping-off point to further allege that using LLMs has damaged the writer's mind, but the article does address each one of their questions, and they would've noticed that if their engagement hadn't been skin deep.
    So their comment is really a vehicle for them to deliver an insult and doesn't represent significant engagement with the material or a thoughtful digression that could foster curious conversation.
    Note that that doesn't mean it's a good article or that Ralph is a good idea.
    cactusplant7374 16 days ago
    The questions are the engagement. Then they added a conclusion. You may not like the conclusion but it's still valid discussion.
    GibbonBreath 13 days ago
    It's engagement, but it's shallow engagement. Similarly, you may not like that conclusion, but you asked, so I explained it.
- alansaber 19 days ago
  "develop a distinct brain-fried tone when they write or talk" - I find that using an LLM as a writing copilot seriously degrades the flow of short form content
- linkregister 19 days ago
  The answer to "what is Ralph?" is hyperlinked within the first sentence.
  - articulatepang 18 days ago
    I actually visited that link, and the answer seems to be
    "If you've seen my socials lately, you might have seen me talking about Ralph and wondering what Ralph is. Ralph is a technique. In its purest form, Ralph is a Bash loop. while :; do cat PROMPT.md | claude-code ; done Ralph can replace the majority of outsourcing at most companies for greenfield projects. It has defects, but these are identifiable and resolvable through various styles of prompts."
    but the contents of PROMPT.md are behind a paywall. In spirit that is not so different from
    gcc program.c; ./a.out
    while program.c is behind a paywall. It's nearly impossible to reason about what the system will do and how it works without knowing more about PROMPT.md. For example, PROMPT.md could say "Build the software" or it could say "Write formal proofs in lean for each function" or ...
    In the spirit of curiosity, I'd appreciate a summary of a couple sentences describing the approach, aimed at a technically sophisticated audience that understands LLMs and software engineering, but not the specifics of this particular system.
    linkregister 18 days ago
    That's reasonable.
ahurmazda 19 days ago
Does anyone else get the feeling that we are “inventing” a bunch of things that will be subsumed by the model in the near(ish) future? Like the whole context refresh thing? Codex is already better (IME) than CC.
And then all these planning steps… if CC/Codex interviews ~a million senior devs, perhaps the next iteration of models will know how to plan way better than today's?