Show HN: Time travel debugging AI for more reliable vibe coding (nut.new)

129 points by bhackett 4 months ago | 13 comments

webdever 4 months ago
> it's amazing to use Claude to prompt an app into existence, and pretty frustrating when that app doesn't work right and Claude is all thumbs fixing the problem.
Such an interesting sentence. An app that doesn't work doesn't seem like it's yet come into existence.
This has been my (limited) experience so far. I haven't been able to get an AI/LLM to help me build an app. Even React apps it fails at. I have been able to get an LLM to help with coding questions similar to Stack Overflow questions though (though not always)
- aprilthird2021 4 months ago
  You can replace "Claude" with any WYSIWYG no-code solution from back in the day like Dreamweaver or whatever and it's basically the same.
  I know it's much more powerful than that tool was, but the experience described is similar between both
- yapyap 4 months ago
  lol agreed, also I’m of the opinion that if you want a working app it’s much more frustrating to debug a heap of code generated by an AI than it is to build it yourself (maybe with the help of AI, if you really need it). At least with the latter, if you really built it yourself, you understand all the components (to a certain abstraction point at least).
- galuggus 4 months ago
  I've made a lot of apps with Claude, e.g. I made a pretty complex SwiftUI app recently even though I don't know Swift. Usually you have to help Claude debug them and sometimes point it in the right direction.
  - Terretta 4 months ago
    Since the vast majority of Swift and SwiftUI documentation online is outdated, I've found that concatenating the best of "what's new in Swift 5.x / 6.x" blogs then asking it to organize that into a prompt for itself, then adding that to the system prompt, helps the LLM produce idiomatic and current code.
    While these changes may require "new ways of thinking" in humans, the LLM seems to have these conceptual approaches embedded already thanks to other languages that did these things earlier. The what's new just shows it the syntax for these concepts in Swift.
- colordrops 4 months ago
  The first pass often executes but the "thumbs" come in when you fix corner cases or iterate on it.
AdieuToLogic 4 months ago
> AIs are really good at writing code but really bad at debugging -- it's amazing to use Claude to prompt an app into existence, and pretty frustrating when that app doesn't work right and Claude is all thumbs fixing the problem.
LLMs are not "really good at writing code". They generate statistically relevant text based on their training data set.
Expecting people who do not understand code to use LLMs for making solutions is like thinking “non-pilots” can successfully fly a 747.
- becquerel 4 months ago
  This is the worst kind of pedantry. There is now code in a text file that wasn't there before. It doesn't really matter to me if the code came from human fingers, an auto generator tool, an LLM or a ouija board. It looks like written code and it compiles like written code - it's written code. You can rightly criticise the use of LLMs in many ways but it is more useful to focus on the actual reasons they cause harm than to set red lines over language.
  - leptons 4 months ago
    > It looks like written code and it compiles like written code - it's written code.
    And more often than not it's just plain wrong code. LLMs aren't actually writing code, they are guessing at what might satisfy an input. Writing code is more than guessing, it's about assembling instructions with intent, for a purpose. LLMs lack that intent and purpose.
    > You can rightly criticise the use of LLMs in many ways but it is more useful to focus on the actual reasons they cause harm than to set red lines over language.
    The "actual reasons" they cause harm are inherent to how LLMs work. The real problem is people not understanding how they work, placing too much trust in them, and believing the hype. They are not some miracle, they aren't even all that helpful, and in my experience they are more of a waste of my time.
- liamkearney 4 months ago
  Exactly my annoying experience as well. LLMs are good at debugging, not writing code.
  It’s just a statistical rubber duck, “what other obvious (common) things haven’t I thought about yet Mr duck?”
- pizzafeelsright 4 months ago
  My counter:
  My 5 year old recreated Duck Hunt with almost zero assistance in an hour.
  I helped with the copy paste.
PetrBrzyBrzek 4 months ago
If anyone is wondering why it looks like Bolt, it’s because it’s using Bolt.DIY, an open-source fork of Bolt (https://github.com/stackblitz-labs/bolt.diy). The catch is that it's still using WebContainers from StackBlitz, so it's not really possible to run it commercially. You need to get rid of WebContainers and find something different.
- bhackett 4 months ago
  Thanks, yeah we're really thankful to StackBlitz for open sourcing the early version of Bolt.new and to the Bolt.diy community for continuing to develop it.
  We don't have a commercial offering yet and are planning to migrate off WebContainers for the upcoming full stack features -- WebContainers show their limits pretty quickly in a full stack context (e.g. CORS issues) and we need observability into the server side of the app for full stack debugging.
  Regardless, our interests here are only lightly commercial. We're not really developing Nut to drive revenue but to help us develop the debugging API and push forward the SOTA for AI development as effectively as we can. That API is what we want to sell.
babyshake 4 months ago
It does seem like the same type of UI as v0.dev ("What do you want to do?") is a strange UI/UX for fixing a bug, although I see there is a way to import code. The ideal UI/UX seems to be a CLI tool or VS Code extension where I give it access to the directory and describe the bug and it just figures out what to do.
- bhackett 4 months ago
  Yeah, we allow importing projects so you can fix bugs you've encountered elsewhere, but we want to streamline the app building process and the UX used in v0/Bolt etc is already really well done. We want this app building to be more reliable though; it's easy to hit bugs that prevent progression.
  We're also interested in using our API with MCP so that e.g. Cursor could be used to fix bugs you're seeing locally, and plan to explore that angle before long.
unclad5968 4 months ago
What is vibe coding?
- ryandrake 4 months ago
  The way I understood it: Cobbling a program together by simply prompting AI assistants over and over, blindly using the generated code, and repeating until it barely approaches satisfying the requirements. Not worrying about things like correctness, proper design, code cleanliness, understandability, performance, code size, security, data protection, maintainability, or even bugs unless they catastrophically stop the user from running the program.
  I really hope this doesn't actually catch on in "real" engineering, besides as a meme joke.
  - bhackett 4 months ago
    Yes, that's all true. Even so, vibe coding empowers anyone who can write clear instructions to build software, but the limits of the technology get hit pretty quickly by non-developers and they have little recourse. This blog post https://addyo.substack.com/p/the-70-problem-hard-truths-abou... is a great overview.
    The tech will get better and better (I couldn't imagine we'd be doing this a year ago) but to be truly useful it has to reliably produce reasonably well engineered code, and effective debugging is a key piece of that.
    curioussavage 4 months ago
    I’m sure it will. My pessimistic take is that the worst case is that thousands of bozos create crappy little apps that only cause minimal harm. And people just endure it instead of pushing for better guard rails.
    Best case is some high profile shit show caused by software made mostly or entirely by AI that hopefully is bad enough that legislators wake up and realize that in the modern world software is essential enough that you can’t let just anyone sell it or services based on it. Just like you can’t allow anybody to design/build bridges or hardware or whatever.
    But I’m sure that's wishful thinking. Hacks and buggy software causing consumers harm is just accepted and software industry folk all hope to be billionaires so nobody cares.
  - ge96 4 months ago
    It's the Infinite Improbability Drive in Hitchhiker's Guide
    would be funny though, who produces the result faster: a Fiverr freelancer or an AI in a loop for a day
    It spits out urls to sites and sends em to Fiverr QA people, take a shot every time the app doesn't work
    Wonder the cost effectiveness, have a randomizer start producing/hosting code auto submit it to Product Hunt
  - spiderfarmer 4 months ago
    Judging by how many people blindly posted Stack Overflow answers, there will be a significant amount of code ‘written’ this way.
- PStamatiou 4 months ago
  Here's where it all started: https://x.com/karpathy/status/1886192184808149383
- vunderba 4 months ago
  An even lazier form of LLM assisted coding where you blindly spam the tab key without even taking the bare minimum amount of time reviewing the garbage that it's busy outputting.
  Karpathy "coined" the term and I absolutely hate it. It's up there with "asshat" and "awesome sauce" for profoundly stupid terms.
  - anonzzzies 4 months ago
    I find it more akin to prompt engineering; something else that is nothing more than 'typing some shit until it does something useful to someone' and then acting like it's actually a skill.
    But we in our profession are very good at making up garbage terms to do anything but describe garbage.
- pimlottc 4 months ago
  It's truthiness for software. It doesn't matter if the code is correct, as long as it "feels" correct.
- yoavm 4 months ago
  It's when an AI writes the code for you.
bluelightning2k 4 months ago
You say it's called Nut because it's about cracking nuts. Wouldn't a truer reason be it's a play on Bolt? (Nuts and bolts)
I'd be interested to read a blog post or technical write up. I think conceptually it's an interesting idea
- bhackett 4 months ago
  Well, it can be both. This post https://blog.replay.io/the-nut-api is the best technical overview of the API and discusses several examples.
zaptrem 4 months ago
Just letting you know the about page has black text on a black background
- xeonmc 4 months ago
  Can’t expect too much reliability from the result of vibe coding.
- CyberDildonics 4 months ago
  These are the results you can expect from someone who says they are 'vibe coding'.
- bhackett 4 months ago
  Thanks for the report! The about page is fixed now when looking at it in dark mode.
  - itishappy 4 months ago
    Black text on black background is also used on the problems page, and the background only extends downwards one page length.
    bhackett 4 months ago
    Thanks! These are both fixed now. Clearly we need to do some more dark mode testing...
    itishappy 4 months ago
    I now see only a giant nut when viewing the main landing page.
    Edit: Now everything is made of buttons.
    bhackett 4 months ago
    Hmm, strange, it's loading alright for me but we've had a couple reports of rendering problems. If you have a chance to file an issue here https://github.com/replayio/nut.new/issues I'd appreciate it, thanks!
    itishappy 4 months ago
    Working fine now, but I'm on a different device. Apologies, I didn't get a chance to troubleshoot.
  - tymscar 4 months ago
    There's 0 chance this response was not written by an LLM. I don’t know what, but it screams GenAI.
theturtletalks 4 months ago
Nut looks like a fork of Bolt, how does Nut differ?
- bhackett 4 months ago
  Yes, Nut is a fork of https://bolt.diy and like bolt.diy you can add your own API key and use it as much as you want (Nut is hosted though so you don't have to set anything else up).
  The improvements we're making are under the hood. When you ask Nut to fix a bug it should do a much better job -- we record the app's behavior and analyze it so the AI has context for the changes it needs to make.
  We've also added some UI to approve or reject the changes the AI makes. For now we're using this to gather feedback so we can improve Nut, but down the line we'll also refund the user any credits when they reject changes -- you shouldn't have to pay when the AI screws up, a big issue with these tools (and vibe coding in general).
  - theturtletalks 4 months ago
    Interesting, I'll check it out. Any plans to open source Nut like Bolt is?
    bhackett 4 months ago
    Yeah, the source is here: https://github.com/replayio/nut.new
    We'll continue to keep it open source as we develop it.
nenadg 4 months ago
Looks fun and it does create something - https://nut.new/chat/prince-of-persia-platform-game
I couldn't start the game though, but it seems runnable given some debugging. Great work!
- bhackett 4 months ago
  Thanks! Unfortunately the chat links aren't shareable yet. We're planning on adding this within the next couple weeks along with the other full stack features (database integration and easy deployment).
- krat0sprakhar 4 months ago
  Can you share the prompt you used to generate this game? (given chats aren't sharable)
- JackYoustra 4 months ago
  Hm, the link isn't working for me
ScrexyScroo 4 months ago
LLMs at best are a good auto-complete. I have been trying to use LLMs to code since GPT 3 came out. While Claude 3.7 has made some progress, it still can't generate an app into existence, although it's great for boilerplate or for hinting at which direction I should look in to find the documentation.
- SparkyMcUnicorn 4 months ago
  In the past week I created two apps within 3-5 iterations. No manual edits (other than .env).
  Another one I thought was pretty cool was handing it some API docs for something, then having it build a UI and admin interface from scratch.
  The dev agent tools these days aren't half bad, and they're getting better.
- anonzzzies 4 months ago
  That hasn't been true since 3.5; it can and does generate full working code and sometimes with only a few attempts. Sometimes it cannot figure it out and then you need to fix it yourself as it will just loop forever. These things can be absolutely trivial (and thus frustrating) at times, but it's also fairly magical when it just does work.
- strbean 4 months ago
  Have you tried this tool?
  I just generated two Tetris games, one with ASCII art and one with WebGL, and I found it quite impressive. Maybe simplistic apps, but still quite impressive with its ability to create functional games and fix bugs with minor prodding.
- aprilthird2021 4 months ago
  A great auto-complete, I'd say. But that's as much as I've gotten out of it too. Still a huge productivity booster
OsrsNeedsf2P 4 months ago
Lots of cynical comments but the examples look great. I'll be taking this for a spin when I get home :)