Keep chats <30 minutes, ideally 20-minute continuous segments.
Use a `notes/TODO.md` file to maintain a checklist of objectives between chats. You can have Claude update it.
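For example, a minimal sketch of what such a checklist might look like (the objectives below are just placeholders):

```markdown
# Objectives

- [x] Extract the auth check into middleware
- [ ] Add tests for the login flow
- [ ] Update the README with setup steps
```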
Commit to version control often, for code you supervised that _does_ look good. Squash later.
This glitch often begins to happen around the time you'd be seeing "Start a new chat for better results - New chat" on the bottom right.
If you don't supervise, you will get snagged; and if you miss it and continue, it'll keep writing code under the assumption the deletion was fine, potentially losing the very coverage you'd hoped to gain.
If it does happen, try to scroll back up in the chat to before it happened and hit "Restore checkpoint".
claude-3.7-sonnet-thinking, Cursor 1.96.2
It improves the context that Cursor has and reduces hallucinations significantly. It's early, but 400 users say it's a lifesaver.
Shoot me an email? hi [at] nmn.gl or cal.com/namanyayg/giga
...I just code it myself?
...or you can tell the LLM "write me a go application that adds links from this JSON dump of wallabag.it to raindrop.io" and it's done in 10 minutes.
(It did use the wrong API for checking if a link already exists, that was an additional 5 minutes)
I've been doing this shit for a LONG time and it'd take me way longer than 10 minutes to dig through the API docs and write the boilerplate required to poke the API with something relevant.
No, you can't have it solve the Florbargh Problem, but for 100% unoriginal boilerplate API glue it's a fantastic time saver.
Works for common stuff, not so much for highly specialised things. An LLM can't know something it hasn't been "taught".
> Keep chats <30 minutes, ideally 20-minute continuous segments.
> ...
Is it just me or does this sound like a standard coding/mentoring practice?
I am starting to wonder how this will all end up. For instance, with API use, we (my company) can burn $100s/day and sometimes get insanely bad results all of a sudden. Now, I bet I signed away all my rights, but in some countries that doesn't cut the mustard for consumers buying things. If an API delivers very solid results one day and crap the next and I spent a lot of money, how does that work? There are many people on reddit/youtube speculating why claude sometimes responds like a brilliant coder and sometimes as if it had a full frontal lobotomy. I see this in Cursor too.
> If an API delivers very solid results one day and crap the next and I spent a lot of money, how does that work? There are many people on reddit/youtube speculating why claude sometimes responds like a brilliant coder and sometimes as if it had a full frontal lobotomy. I see this in Cursor too.
This seems like an incredible over-reach. There's no predatory behaviour here. You're free to cancel at any time.
It's an incredibly fast-moving field, a frontier field in software. To say that, in order to charge for something, you are legally bound to never make mistakes or ship regressions would make for an incredibly hostile environment to work in. You'll stifle growth if people think experiments might come with lawsuits unless they're positive those experiments lead to improvement.
If they decided they were going to lock everything to gpt-2 and refuse to pay back any people who bought yearly subscriptions, sure I would be agreeable to considering this a bait-and-switch hoodwink. But that is clearly not happening here.
Is behavior that inconsistent?
I've used GitHub copilot plenty, and I've observed various "regressions" and inconsistencies, but I've never come even close to that much of a project being totally LLM-generated.
with open(file) as f:
    pd.read_csv(f)
This was a mistake not worthy of even GPT-3… I've also noticed I get overall better suggestions from the Claude desktop app. I wonder why.
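For contrast, the idiomatic version is a one-liner, since `pd.read_csv` takes a path directly (the file name below is just a placeholder):

```python
import pandas as pd

# No explicit open() needed; read_csv handles the file itself
df = pd.read_csv("data.csv")
```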
- Use apply (in chat) or composer only if you're more interested in a quick solution than in the risk to your local code. Often Cursor removes important comments by default.
- Use chat. Create new chats when it doesn’t have the latest version of your code or history/shadow workspace is confusing it. Add relevant files or @Codebase or both.
- Learn to undo. Use git and checkout the files/directories again if needed. I don’t use composer, so files never get deleted.
- Autocomplete is often fairly terrible and regularly gets in the way of trying to see what you’re typing or trying to view. Hit the escape key regularly.
- Use Claude 3.7 for regular coding and 3.7 Thinking for larger things to solve.
This is honestly the only part of this that matters if you do it right. Use composer, but only on a clean git tree. Apply, then look at the git diff and correct anything you don't like. Test it and commit it, then repeat.
Composer and apply are only dangerous if you're not committing regularly. If you never run them while you have uncommitted changes, you can't lose working code.
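To sketch that loop (the commit message here is just a placeholder):

```
git status        # confirm the tree is clean before running composer
# ... let composer apply its changes ...
git diff          # review exactly what it touched; fix anything you don't like
git add -A && git commit -m "composer: first pass"
# repeat, then squash the noisy commits later
git rebase -i main
```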
Just today, I asked Sonnet 3.7 to use the app's accent color, which can be found in the @global.css, to apply it to some element. It went on a massive tangent, scanning all sorts of documents, etc. I then asked Sonnet 3.5 for the same thing, and it simply took the accent color and applied it. No tangent, no grep, ...
Thankfully, switching to Sonnet 3.7 in 'Ask' mode has been essentially fine and is comparable to how Sonnet 3.5 performed for me.
It does things I didn’t ask, while deleting random things I just asked to add in the previous prompt, it’s a mess.
However it does one thing very well: it can make me angry very quickly, like a real human.
Definitely can be very annoying if you do just want it to execute on a set of instructions.
To me it's not a good deal that I pay for something that makes up the things it wants, and completely disregards what I asked. It's like the recipe for the worst SaaS you ever used.
As the son of a pilot, this sentence is really funny to me. A real-world pilot would switch the places of 'copilot' and 'autopilot' in your metaphor: the autopilot maintains course on a given vector but isn't the same thing as a human being in command of the vehicle, for exactly the reasons you refer to.
If it goes off track I put a rule in there. It's like a junior developer that I have to keep constraining to project goals, coding styles, and other aspects.
I have different files in there so I can reuse rules across different projects.
Over time it's getting better at staying on track.
I use Claude web to kickstart entirely new work or to attempt large refactors.
Aider is impressive, but I can't use the DeepSeek API at work, so Aider becomes too expensive using Sonnet.
Which leaves me copy-pasting into the Kagi code assistant as my most feasible and reliable LLM coding tool at work. And it's actually very good, but the copy-pasting gets tedious.
For the most part, I just type code the same way I used to, but I get:
- an auto-complete on steroids
- the tab feature reminding me of impacted code I forgot to update after making a change elsewhere (big one as I easily get distracted).
I very rarely use the chat/composer. Usually I'm faster by going through files manually and making changes myself helped by the features mentioned above.
Cursor is the first time that I have felt that a chat-like UX could actually be useful for coding, because it gets the context for me. I still prefer autocomplete (and Cursor's autocomplete is very very good), but chat is actually occasionally useful for me in large projects. Without Cursor chat is only useful for one-off no-or-low-context scripts.
At this point it’s really a question of how you’d rather spend your time - managing an AI? Or writing code?
Even though generally I prefer the latter, it's fun to take a break and give the former a shot occasionally. I'd say currently I let Cursor do its thing for maybe 1 out of every 5 or 6 tickets, usually just for the sake of variety, or if I'm spinning my wheels and need to look at something to get started.
Keep the input minimal. Keep a set of gold standard tests running in a loop to catch problems quick. Don't tune out. Debate whether you really need to use that new model you haven't worked with that much yet just because it's newer. And double check you aren't being sold e.g. a quantized version of the model.
On the one hand, it's impressive how quickly I could make a basic UI; on the other hand, my ability to get the UI to actually do anything is quite unimpressive, as is the number of very basic mistakes (whoops, forgot to add types, or here, let me import a file that doesn't exist).
I even had a weird thing where it asked me to paste in the code of a file, even though the file was explicitly added to its context.
I can't imagine not using something like PodMan, I just haven't set it up, so haven't tried Cursor.
Yes, these models are merely approximations, but things aren't as blindly bad as these comments make it seem.
Edit: Yes, Sonnet 3.7 is eager, but I'd have to assume it's designed that way. Yes, sometimes Sonnet ignores my system prompt. Again, these things are merely approximations that map from tokens to tokens. They are not reasoning or intelligent.
It seems that cursor-thinking will come up with 3 options and pick the dumbest one. Leading to much worse performance than non-thinking sometimes.
A big part of it seems to be increased focus on following instructions vs 3.5. If you don't tell it to not cheat, it cheats lol. Sometimes it's even aware of this, saying things like "I should be careful to not delete this" but deleting it is the fastest path to solving the question as asked, so it deletes.
If you use Cursor for personal projects, I recommend reviewing each change very, very carefully.
I use Aider for side projects (with Claude) and for some reason it will also delete working code when making a partial correction. It just throws out the good with the bad when I suggest something.
I basically have to branch, commit every time it makes any progress at all, and squash later. There are built-in checkpoints that basically do this.
I actually run this side-by-side with my preferred IDE, and GitHub Desktop (to visualize the diffs). So, prompt -> Claude makes a change -> I view the diff -> I make some edits -> Commit -> back to Cursor.
Restarting the editor worked! Can also try restarting the computer.
It does seem to be pretty keen on going a step further than prompted, but the code works well, so I can't really complain about that.
Too many of these tools seem to either give no diff, or make absolutely massive edits.
And neither has a drunken meth addict on an oz of shrooms and ketamine while snorting low-grade fentanyl.
Because I care about my craft.
> Any tips on fixing those?
Understand that no matter how fancy the guess-the-next-token machine is, it will NEVER replace the hard graft of logically deducing how a change is going to percolate throughout your codebase.
The programmer's motto should resemble the old Porsche one:
Logical reasoning: accept no substitute
When an engine can use my codebase to build up an internal logical structure of its cascading effects, where potential changes to the code can be what-if'd as a kind of diff, then I will consider it to be worthy of evaluation. Until then, I feel like an NBA player seeing that their opponent has chosen to only dribble with their ass.
I don't think you folks realize that you're the ones hallucinating. Predicting the next token is never going to be anywhere near 100% successful, especially for interesting projects.
Seriously, folks. I mean, week after week we keep seeing this shit show up here and y'all're like "You got any tips how I can keep smoking meth but not lose my last three teeth?"
> In the AI future people will be hungry for content that is curated by people they trust.
I'm all for trustworthy people and trustworthy work. Curation by sensible humans is precisely what all information needs.
I wish you the best of luck!