The sheep movement is excellent. You could make it even more realistic by having them favor lusher areas and by having one occasionally bolt spastically (hard mode?)
A handler mode where you play as a human and shout commands at the dog could be cool too!
The one-shot produced a game with no sheep. I then had to tell it to fix two bugs.
Overall, the graphics and game seem good enough, and better than most of the closed models that were shown. Not surprisingly, though, it falls short of Fable.
I've put the index.html and open code session here:
https://github.com/da-x/when-ai-fails/tree/qwen3.6-27b/shepa...
But once you start maintaining it, improving it and fixing bugs, you’ll eventually need to rip it apart and put it back together again while understanding how it all works.
This is why I think the better approach isn’t to one-shot but to have the architecture in your head and build it up piece by piece, with the AI accelerating the code writing.
Define big, I guess. They're non-trivial: a mix of internal enterprise tools, a multiplatform app (Android/iOS/Mac/Windows/web, currently headbutting its way through review), and a billing system for my small telecommunications business.
> I dont think this is good for your mental health or physicaly your brains health
I find the experience of doing it without writing the code to be intellectually pretty similar. I still solve a lot of problems. The LLM couldn't, for example, one-shot the event sourcing model I built for syncing data between devices. It took quite a few iterations and I had to define a lot of the architecture, but I did it at a level that wasn't "here is a class, here is a module, this module does XYZ". It was more at the "whitepaper" level, or describing how specific bits of the app needed to work in order to solve some problem.
It's also very similar to managing other developers.
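For anyone unfamiliar with the term, here is a minimal, generic sketch of event sourcing for device sync. It is not the model described above, just an illustration under loud assumptions: the `Event` fields, `merge_logs` and `replay` are all made-up names, and timestamps are assumed to be logical clock values that are unique per device.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One immutable change. All field names are illustrative assumptions."""
    timestamp: int          # logical clock value, assumed unique per device
    device_id: str          # tie-breaker so every device sorts the same way
    key: str
    value: Optional[str]    # None means "delete this key"


def merge_logs(*logs):
    """Union the per-device logs, drop duplicates, sort into one agreed order."""
    unique = set()
    for log in logs:
        unique.update(log)
    return sorted(unique, key=lambda e: (e.timestamp, e.device_id))


def replay(events):
    """Derive current state by applying events in order (last write wins)."""
    state = {}
    for event in events:
        if event.value is None:
            state.pop(event.key, None)
        else:
            state[event.key] = event.value
    return state


phone = [Event(1, "phone", "name", "Ann"), Event(3, "phone", "name", "Anna")]
laptop = [Event(2, "laptop", "city", "Oslo"), Event(4, "laptop", "city", None)]
print(replay(merge_logs(phone, laptop)))
```

The point of the pattern is that devices exchange events rather than state, so any device that has seen the same events ends up with the same result regardless of the order in which the logs arrived. A real system would also need conflict rules, clock handling and log compaction, which this sketch ignores.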
> Its like driving your car 3 blocks instead of walking, your physical health will suffer
It's more similar to having staff rather than doing everything yourself. The problem solving just shifts to a different area, and you get more done.
You absolutely can have the LLM write maintainable code. A few tricks I use are to ask it to plan out features in phases, and then do a branch and a PR for each focused piece of work. It makes it a lot easier to review and understand what's happening.
I also ended up making a tool which lets the LLM get a high level perspective of the codebase, and then see parts that are structurally gnarly. I've been using it to do refactors and clean things up periodically. It helped a lot with keeping the architecture clean.
https://github.com/yogthos/wavescope-mcp
Adding features and evolving the codebase has not been a problem even at this scale.
Coding is not the sole problem-solving skill. In fact, coding may be one of the easier skills much of the time. Deciding what to build, where to focus efforts, and understanding a customer's needs can all be just as challenging as the coding part, if not more so.
And be sure to only walk barefoot. Relying on artificial shoes weakens the muscles and the skin of your feet.
I suppose critical thinking skills are also as bad, making you question the state of the world. Problem solving is another one, deluding one into believing there are solutions to suffering.
This is what spec .md files are for, skill issue
Like you said, working and maintainable are very different things. One-shot hits a wall the moment you need to do anything non-trivial after the initial generation. Bug fixing is extremely hard, even with AI assistance. Same with feature additions. It's pretty much a black box from this point on. The AI that wrote it now goes in loops, wasting tokens without being able to reliably fix it, because it has no memory of the architectural decisions it made (or didn't make, for that matter) the first time round.
What I realized is that the failure here is the absence of a shared mental model between you and the code.
I'm a product designer with average front-end know-how and a solid understanding of HTML/CSS and how the web works, coming from the era of hand-coding HTML/CSS files. After vibe-coding a few products early this year, purely to learn how AI works, how to design AI interaction patterns, etc., I built something called Intent Model (largely inspired by SDD/BDD).
Intent Model is a structured, typed artifact (basically a JSON contract) that captures actors, entities, journeys, rules, and constraints before I write (or make the AI write) any code. It sits upstream of everything. Think of it like a condensed, strict distillation of your PRD / BRD / requirement doc.
When you hand the AI a well-defined intent file instead of a vague brief, the one-shot becomes structured and bound by rules. Now you're giving it an architecture to conform to. You define (or make the AI define) the precise variable names, their types, lifecycle, user roles, responsibilities, business rules and constraints in the file. Every generated artifact can trace back to a decision you made deliberately, reviewed and signed off.
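To make that concrete, here is a rough sketch of what such a contract and a conformance check could look like. Everything in it is a hypothetical illustration, not the actual Intent Model format: the `actors`/`entities`/`rules` layout, the `Invoice` entity, and the `load_intent` and `check_record` helpers are assumptions.

```python
import json

# Hypothetical intent contract; section and field names are assumptions.
INTENT_JSON = """
{
  "actors": ["customer", "admin"],
  "entities": {
    "Invoice": {"id": "string", "amount_cents": "integer", "status": "string"}
  },
  "rules": [
    {"id": "R1", "entity": "Invoice", "field": "status",
     "allowed": ["draft", "sent", "paid"]}
  ]
}
"""

PYTHON_TYPES = {"string": str, "integer": int}


def load_intent(text):
    """Parse the contract and insist on the sections this sketch relies on."""
    intent = json.loads(text)
    for section in ("actors", "entities", "rules"):
        if section not in intent:
            raise ValueError(f"intent is missing required section: {section}")
    return intent


def check_record(intent, entity, record):
    """Return a list of ways one record violates the intent contract."""
    problems = []
    schema = intent["entities"][entity]
    for name in record:
        if name not in schema:
            problems.append(f"{entity}.{name}: field not declared in intent")
    for name, type_name in schema.items():
        if name not in record:
            problems.append(f"{entity}.{name}: missing")
        elif not isinstance(record[name], PYTHON_TYPES[type_name]):
            problems.append(f"{entity}.{name}: expected {type_name}")
    for rule in intent["rules"]:
        if rule["entity"] == entity and rule["field"] in record:
            if record[rule["field"]] not in rule["allowed"]:
                problems.append(
                    f"{rule['id']}: {entity}.{rule['field']} value not allowed"
                )
    return problems


intent = load_intent(INTENT_JSON)
for problem in check_record(
    intent, "Invoice", {"id": "a1", "amount_cents": "500", "status": "void"}
):
    print(problem)
```

Each reported problem names the entity, field or rule id it came from, which is the traceability idea: a violation points back at a specific line of the contract rather than at a vague brief.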
In the design world, we already do this by using design tokens. We can tell the AI that it needs to strictly use design tokens and not use stray properties like a hex color value or raw values not defined in the token contract. This is easily auditable by AI as well.
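A token audit like that can be as simple as a script. The sketch below is only an illustration under assumptions: the `TOKENS` contract and the `audit_css` helper are hypothetical, and the regex is naive (it would also flag an id selector such as `#fab`, and it ignores named colors and `rgb()` values).

```python
import re

# Hypothetical token contract: the only colors generated CSS may use.
TOKENS = {"--color-primary": "#0a66c2", "--color-surface": "#ffffff"}

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{3,8}\b")
TOKEN_USE = re.compile(r"var\((--[\w-]+)\)")


def audit_css(css):
    """Report raw hex colors and references to tokens outside the contract."""
    findings = []
    for number, line in enumerate(css.splitlines(), start=1):
        # Token definitions themselves are allowed to contain raw values.
        if line.strip().startswith("--"):
            continue
        for match in HEX_COLOR.finditer(line):
            findings.append(f"line {number}: raw color {match.group(0)}")
        for match in TOKEN_USE.finditer(line):
            if match.group(1) not in TOKENS:
                findings.append(f"line {number}: unknown token {match.group(1)}")
    return findings


sample = "\n".join([
    ".a { color: var(--color-primary); }",
    ".b { color: #ff0000; }",
    ".c { background: var(--color-accent); }",
])
for finding in audit_css(sample):
    print(finding)
```

Run against generated stylesheets in CI, a check like this turns "strictly use design tokens" from an instruction in a prompt into something that fails loudly when it is ignored.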
The result is you can still move absurdly fast and still maintain the understanding, which the one-shot approach throws away. This way, you know why every piece exists because you defined the intent before the AI implemented it.
AI is the accelerant, and you're the architect. The intent is the blueprint you generate to guide/harness the AI.
The best part is, once you have an intent contract at the heart of your project, it becomes impossible to break things too, logically or experience-wise.
Applause to Anthropic: mission accomplished!
Do people do no research or introspection when they’ve had an “idea for years”? There are countless examples of this exact game. I played this on the Gameboy Advance! There’s like 50 of them on the App Store right now.
The standard “this almost certainly exists wholesale in the training data” applies, but I’m also interested in how you carry an idea for years and don’t notice this, or whether the “idea” here was actually “using this thing that’s been remade thousands of times as an AI benchmark”.
There’s nothing wrong with remaking an old classic formula, especially in game dev. It’s the describing it as “an idea I’ve had for years” that rings weird.
Doesn’t look like the author toyed with the idea at all, though, apart from having it in their head. Considering how they describe themselves (Check the About/Home page), if they had toyed with it at all they would have already built it.
I also don’t see why finding out it exists would be “painful”. The game is free and the author didn’t experiment or learn anything from building it, they just prompted it in one go.
I have pointed out on here before that truly unique human ideas, not grounded in nature or in previous ideas from others, are almost nonexistent; there are not many examples that anyone can give me. All human ideas and work are derivative.
Elves? Humans with pointy ears. Werewolves? Humans mixed with wolves. Car tyre? Cart wheel...stone wheel/roller. Etc.
I'd wager it's because ideas are simpler to explore orthogonally, giving an overview of what's possible.
So it’s interesting to me that the creator here didn’t encounter the tens of physically published versions, or the hundreds of them shipped to digital app stores, or all the codebases on GitHub, in the course of making this. I’m sure they would have done naturally prior to GenAI. Is that good or bad? I don’t know! But it’s interesting to me.
The simplest counterargument: since there are already tens of similar games out there, why didn't the previous authors, supposedly grass-fed, genuine, checkmark, blood-through-their-veins humans, notice the other 9-8-7-6-5... games before releasing their own version? Maybe because they still wanted their game out there? Maybe because originality really isn't that common? Maybe because each individual had their own idea and spin on it? Maybe because they wanted the game out as they made it?
Same for this author. How they made the game is irrelevant, and nitpicking the "originality" or anything else is silly. Something like this wasn't possible 3 years ago. Now it's possible. Deal with it, and stop trying to find ways to diminish it. It's a huge accomplishment any way you cut it.
I gave a simple counterargument to this. Since there are "countless" prior games, many of them released before genAI, your argument is pointless.
To spell it out in case it is still non-obvious: knowing this allows iteration. It allows remixing. It allows you to inspect what has come before and what it did well and where it succeeded and where it fell short and thus what you could _add_. It is an enabler of creativity! Thus I think it is interesting that GenAI may make it harder to have this experience.
a) To make it better
b) To learn, in service of a) or another project
he used to say “the best artists have the biggest record collections”.
they’ve done their research. they developed taste. they’ve been in that battle with the unoriginality demon. they’re still in that battle with the unoriginality demon. they’re always searching for new. for unexplored. for different.
they’ve also figured out what “good artists copy, great artists steal” means.
we take small bits. small ideas. small riffs. we turn them into our own. then we repeat that N times to create “a song”. we borrow. we revere. we obsess. turning lots of little differences into a completely new work. yes it’s all derivative. but derivative originality takes a lot of fuckin’ effort to get right. to be tasteful.
this thing isn’t artistic stealing, it’s the most low-effort stealing possible. creativity, originality and more importantly taste appear nowhere here.
so, is it bad? depends on your perspective on creative endeavours being worthwhile and whether you have taste or not i guess.
edit - personally i don’t think you can polish a turd. even if you rewrite it, the memory lingers.
Snark aside (and apologies), there's absolutely nothing wrong with the "no new ideas" take and nobody should think there is. Humans tend to work collectively, try as we might to do or appear otherwise, and often come to the same conclusions through reasoning and logic. No one person truly invented the light bulb, etc., when really all inventive thought is branches of derivative thought as we build our collective knowledge base. A better question would be how many novel ideas are the logical conclusion of branches of derivative thought, and how many are tangential, brought about by the injection of our irrationality.
A child is born every 4.4 seconds. But it took me and my girlfriend over 9 months to birth one!
Even if an original idea did show up every minute globally, that does not mean it takes only a minute to come up with the idea.
By my math you should have at least 2 in that time, unless one of you wasn't pulling their weight.
Ah well, it’s still fun and it does appear to measure how good AI is at creating these kinds of games.
I did the same recently just for fun - I really enjoyed "Gravity Force" on the Amiga - itself a lunar lander variant.
Could a model build a Gravity Force like game I could run in-browser? Yep! (I never made it as good as Gravity Force - just got the basics down)
But also, how original can a game idea ever (now) really be – there's always going to be things you can describe it as 'like' or a mix of, even if not identical. And for such simple things, very little room for being non-identical to whatever they're like.
In case it all just comes from training data, "one shotting" a game would be more comparable to "git pull" and changing some assets than "generating code".
I'm not saying this is how it works, I'm trivializing LLMs with this statement, but when I see someone on linkedin excited about generating checkers and chess my first thought is "you could have done that with git pull for the past 20 years".
IMO the ability to describe a game and let the AI implement a PoC is pretty wild. It's a signal as to whether such an idea is worth pursuing further to me rather than a finished product. And I am enjoying all the experimentation with existing genres as well as the occasional truly original experience due to the dramatically lower cost of entry. What these efforts lack currently is the playtesting and polish that is hard without a human in the loop. So much like agentic engineering, the productive work is in being a centaur. It surprises me how much pushback this is getting from the demographic that embraced the relatively inscrutable git over simpler alternatives for small teams along with the tower of Babel of equally inscrutable frameworks and APIs.
It's not unlike Martin Scorsese admitting upfront he's using GenAI as a creative tool to visualize scenes for his scripts. The predictable backlash that he dare use AI in any way for any aspect of his craft despite his irrefutable oeuvre is a sign of the times more than a legitimate objection to me. Ask the users of deviantArt to stop working with Photoshop and see how that goes.
Having worked in the game industry in the past and adjacent to Hollywood over my career, they were already top heavy exploitative cultures before AI. And any auteur that thinks they can replace humans with agents is as tuned in to GenAI as the tech CEOs and VCs that happily announced layoffs and instituted tokenmaxxing benchmarks to measure the "incredible" boost in productivity AI enables.
So my question, ahead of the mandatory downvotes for not chanting along with the torch-bearing mob against AI in every way is: beyond the CapEx and the buildout issues (both legit IMO), how is AI impacting you negatively and personally?
Even with the perfect AI to write, one would need to iterate through many different ideas, play testing constantly, getting people to play test and analyze what they found fun and where they got stuck. And to get the best ideas you'll need to be playing lots of different kinds of games.
Like I remember in college I had something akin to the idea of “50 people 1 question.” I was starting to become interested in shooting my own documentaries and was particularly interested in man on the street style interviews. I pitched it to a friend who then told me about 50p1q, which baffled him because it was like the hot thing already a year or two prior haha.
Anyway that’s just something I think happens a lot. And now with genAI people don’t even throw the idea around, they quickly do a crappy version of the thing, present it, then find out it exists. Which isn’t terrible I guess, but it’s one less filter, for better or for worse.
Also, this game has very simple mechanics. I am sure you can generate it just as easily with Cursor or some other tool.
If the end is a combination of education and product discovery, then yeah maybe, although those are also dimensions of personal productivity that can be amplified by leveraging AI tools.
If the end goal of programming is leveraging computer automation, then nobody actually cares how the automation infrastructure gets established, and the fewer distractions with low-value implementation complexity, the better.
Or is there some other AI usage described in this article that is not supported by cursor?
Some random examples:
https://x.com/fe_yukichi/status/2064635098411180374
https://x.com/akiraxtwo/status/2064780732082651402
https://x.com/kieradev/status/2064482704763085202
https://x.com/VincentLogic/status/2064699740936356065
https://x.com/XiaohuiAI666/status/2064994538591223911
However as others have pointed out the idea is a common one, probably because many people are exposed to sheep and sheep dogs and farming. Which further reinforces a previous point I made that all human work is derivative and barely anything actually original.
But that's why it doesn't matter! Make that game/app/website that someone else has made before, make your own interpretation! The beauty and uniqueness is in the skin not the flesh!
The title could have been just “Shepherd’s Dog: A game by Fable 5”.
https://venturebeat.com/technology/anthropic-says-it-hit-a-3...
In close lockstep with @ai_fry_your_brain, who at least makes it clear right on the tin that they're not here to engage in any earnest capacity whatsoever. Always a mixed feeling between being appreciative of that, and finding it blatant.
Good thing it's AI ruining communities, a thought I have no doubt you also share in. If only people properly recognized the hard work of people like you in this.
https://vnglst.github.io/when-ai-fails/shepards-dog/claude-f...
It instructs me to rotate my phone. The pasture doesn't get any bigger, but now the top bar blocks half the screen. The tooltip about rotating stays in the middle of the screen. Unplayable. There's a music note indicating sound, but I never heard the dog bark.
It's exactly the kind of unpolished slop I expected it to be.
fROnTEnD DeV Is DeAd
DeSiGN Is DeAD
Cool idea tho, could be a fun game if the UX wasn't so hostile.
For interest, some shepherds run two dogs, each on a different whistle or voice command pitch.
If this is what you imagined, you need to imagine better.
* Pathfinding is terrible (if I end up inside the fenced area clicking outside doesn’t lead me out).
* Forcing me to go landscape while not even filling the entire screen is terrible (where did you even test this).
* Controls are disastrous (I’m either barking all the time or a bark makes my sprite ignore my movements).
You one-shotted this, and I will admit it’s incredible that these agents can create something like this in minutes.
But your statements along with the “most dangerous AI model” in the title are disingenuous. Please do better.
OP is just pushing slop, the 80% part anyone gets for free. (well 20 bucks)
Bruv, there are already countless games with this exact mechanic...