I switched not because I thought Claude was better at doing the things I want. I switched because I have come to believe OpenAI are a bad actor and I do not want to support them in any way. I’m pretty sure they would allow AGI to be used for truly evil purposes, and the events of this week have only convinced me further.
And the weirdest thing that I noticed: instead of skimming the response to try finding what was relevant, I just straight up read it. Kind of felt like I got a slight amount of focus ability back.
Accuracy is something I can't really compare yet (all chatbots feel generally the same for non-pro level queries), but so far, I'm fairly satisfied.
On the contrary, it's great. It's fully capable of outputting a wall of text when required, so instead of feeling like I'm talking to something that has a minimum word count requirement, I get an appropriately sized response to the task at hand.
For ChatGPT and Gemini, yes.
But for Claude, they have a very deep & big one: It's the only model that gets production-ready output on the first detailed prompt. Yesterday I had used up my tokens by noon, so I tried some output from Gemini & Co. I presented a working piece of code which is already in production:
1. Without mentioning it, it changed things like "Touple.First.Date.Created" and "Touple.Second.Date.Created" to "Touple.FirstDate" and "Touple.SecondDate", which broke the code.
2. There was a const list of 12 definitions for a given context. When told to rewrite the function, it just cut 6 of these 12 definitions, so the code no longer compiled. I asked why they were cut: "Sorry, I was just too lazy typing" ?? LOL
3. There is a list holding some items, "_allGlobalItems". It simply renamed it to "_items" inside the function, and the code didn't compile.
As said, a working version of a similar function was given upfront.
With Claude, I never have such issues.
Maybe it is tech stack dependent (I have mostly used it with C#/.NET), but I have heard people say the same for C#. The only conclusion I have been able to draw from this, is that people have very different definitions of production ready, but I would really like to see some concrete evidence where Claude one-shots a larger/complex C# feature or the like (with or without detailed guidance).
What is so strange to me is that surely there is more C# out there than ESP-IDF code? I don't have a good explanation beyond saying that my codebase is extensively tested and used; I would know very quickly if it suddenly started shitting the bed in the way you explain.
We already have coding tuned models i.e. Codex. We should just have language / technology specific models with a focus on recent / modern usage.
The problem with something like Java is that it's too old, with too many variants. Make a cutoff, e.g. only Java 8 or 17 and above.
I feel like this is an example of people having different standards of what “good” code is and hence the differing opinions of how good these tools are. I’m not an embedded developer but 600K LOC seems like a lot in that context, doesn’t it? Again I could be way off base here but that sounds like there must be a lot of spaghetti and copy-paste all over the codebase for it to end up that large.
same here :)
> one-shots a larger/complex C# feature
I can show you a timeseries data-renderer which was created with 1 initial very large prompt and then 3 following "change this and that" prompts. The file is around 5000 lines and everything works fine & exactly as specified.
Is this more related to the existing source code, or is this a bad pattern that you would never use regardless of the existing code?
One does often hear that where LLMs shine is with greenfield code generation but they all start to struggle working with pre-existing code. It could be that this wasn't a like for like comparison.
That said, I do personally feel Claude produces far better results than competitors.
In my experience working in a large codebase with a good set of standards that's not the case, I can supply examples already existing in the codebase for Claude to use as a guidance and it generates quite decent code.
I think it's because there's already a lot of decent code for it to slurp and derive from, good quality tests at the functional level (so regressions are caught quickly).
I do understand though that on codebases with a hodge podge of styles, varying quality of tests, etc. it probably doesn't work as well as in my experience but I'm quite impressed about how I can do the thinking, add relevant sections of the code to the context (including protocols, APIs, etc.), describe what I need to be done, and get a plan back that most times is correct or very close to correct, which I can then iterate over to fix gaps/mistakes it made, and get it implemented.
Of course, there are still tasks it fails and I don't like doing multiple iterations to correct course, for those I do them manually with the odd usage here and there to refactor bits and pieces.
Overall I believe if your codebase was already healthy you can have LLMs work quite well with pre-existing code.
Don't we all?
- literal Claude ads I see online
- my underperforming coworkers whose code I’ve had to clean up and know firsthand that no, it wasn’t flawless
This kind of sentiment is gaslighting CTOs everywhere though. Very annoying.
It keeps trying to re-invent the wheel, does a bad job of it.
The physics sim was supposed to be a thin wrapper around existing libraries, but instead of that it tried to write all the simulation code itself as a "fallback" (but it was broken), and never actually installed the real simulators that already did this stuff despite being told to use them in the first place. The last few dozen(!) prompts from me have been pairs of ~["Find all cases where you've re-invented the wheel, add them to the planning document", "now do them"]. And it's still not finished removing the original nonsense, so far as I can tell.
One of the two Swift experiments is just a dice roller. It took about 10 rounds of non-compiling Metal shaders (I don't know Metal, which is why I didn't give up and do that by hand after 4) before I managed to get that to work, and when it did work it immediately broke it again on the next four rounds. It wrote its own chart instead of using Swift Charts, and did it badly. It tried to put all the hamburger menu options into a UIAlertController. Something blocks the UI for several seconds when you change the dice font. I didn't count how many attempts it took to correctly label the D4.
The other Swift experiment was a musical instrument app, that got me to the prototype stage, eventually, but in a way that still felt like a student's project rather than a junior's project.
That's not a moat though. Claude itself wasn't there 6 months ago and there's no reason to think Chinese open models won't be at this level in a year at most.
To keep its current position Claude has to keep improving at the same pace as the competitor.
That's, just, like, your opinion, man.
Though tbh I hardly feel Claude is innocent either. When their safety engineer/leader left, I didn't see a single statement from the Anthropic team addressing his legitimate points for why he left. Instead we got an eager over-push in the media cycle of "Anthropic standing up to DOD! Here's why you can trust us!"
It all sounds too similar to propaganda and astroturfing to me.
It's perfectly possible that 'truly evil purposes' were the goal all along. Slogans and ethics departments are mere speed bumps on the way to generational wealth.
I think HN in particular as a crowd are very vulnerable to the halo effect and group think when it comes to Anthropic.
Even being generous they are only very minimally a "better actor" than OpenAI.
However, we are so enthralled by their product that we tend to let the view bleed over to their ethics.
Saying we want our tools used in line with the US constitution within the US, on one particular point, is hardly a high moral bar; it's self-preservation.
All Anthropic have said is:
1. No mass domestic surveillance of Americans.
2. No fully autonomous lethal weapons yet.
My goodness that's what passes for a high moral standard? Really anything that doesn't hit those very carefully worded points is not "evil"?
However, I would think I'm not alone in that I'm generally wanting to do good while also wanting convenience, I know that really every bit of consumption I do is probably negative in some ways, and there is no real "apolitical" action anyone can take.
But can't I at least get annoyed and take my money somewhere else for the short amount of time another company is doing it better?
Yes, if OpenAI suddenly leaps forwards with Codex and pounds Anthropic into the dust, I'll likely switch back despite my moral grievances, but in a situation where I can get mildly motivated to jump over for something that - to me - seems like a better morality without much punishment to me, I'll do it.
There are some people (companies are run by people) that are so bad I boycott them. Most of the bad ones I treat as something society cannot work without accepting anyway.
Although we shouldn't let that mean we misjudge what we are actually getting.
You can see the significance of this if you look at German Nazi history. If more companies had stood up to the administration, the Nazi state would have been significantly harder to build.
In my opinion, what Anthropic did is not a small thing at all.
By contrast Anthropic wouldn't? Yet Anthropic's stance is only two narrow restrictions. As I said, are those two things the only evil things possible?
If not, why is it that people on HN think Anthropic would not allow evil usage?
My hypothesis is a halo effect. We are so enthralled by Claude's performance that some struggle to rationally assess what Anthropic has actually done.
Yes, it's no small thing to say no to the Trump administration, but that does not mean they haven't said yes to, or otherwise facilitated, other evils.
In fact to me the statements from Anthropic seem to make clear they are okay with many evils.
Really I think Anthropic should have a single restriction: to not assist with illegal or unconstitutional activities. If automated killings etc. are illegal then they would be covered by that one rule.
I don't think Anthropic should be in the business of deciding what is "evil".
Moving back to doing this archaic thing called using my own brain to do my work. Shocking.
For marketing or personal stuff I do sometimes want images, but I don't really mind going somewhere else for that
Of course, there's also OpenAI being run by openly questionable people, while Dario so far doesn't seem anywhere near as bad, even if none of them are angels.
OpenAI - since the beginning has been anything but open. If you spoke anything ill about OpenAI here until yesterday, you would be downvoted into oblivion because, let's face it, Sam has always been the poster child of this community.
So, basically, even after them publicly announcing they were evaluating licensing models where they wanted to take a % of your business for using their models [1], there was still 0 outrage, and anyone who pointed that out always got "OpenAI CAN DO NO WRONG" shot back at them in the comments.
He makes one decision you all don't agree with and now it's cancel culture time?
And somehow, Anthropic is the hero in all this? Make no mistake - all the model providers are building detailed user models. Every bit of information you provide to it is of course being used for detailed user targeting. This is no different than the "Apple GOOD, Google BAD!" tropes. There are no heroes in for-profit corporations. Everyone is operating a for-profit business model and optimizing for the same profits.
Stop with the NPC behavior. We are better than this.
[1] https://openai.com/index/a-business-that-scales-with-the-val...
"Licensing, IP-based agreements, and outcome-based pricing will share in the value created. That is how the internet evolved. Intelligence will follow the same path."
Why are you assuming these are real people and not NPCs?
The amount of money flowing around AI is staggering. To believe that the AI companies aren't flooding all the social media zones with propaganda is disingenuous.
Touché
You don't use "believe" with "disingenuous": it literally makes zero sense.
If people honestly believe that, they may be naive. Or they can be "disingenuous" if they're not being sincere. But if you just say what you believe, you're sincere (and maybe naive), and hence cannot possibly be disingenuous.
They also don't know what "context" is or that the LLM has a limited number of tokens it can understand at any given time. They just believe it knows everything at once.
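To make the "limited number of tokens" point concrete, here is a minimal sketch of how a chat frontend might decide what the model actually sees. Everything in it is an assumption for illustration: `count_tokens` is a crude whitespace-based stand-in for a real tokenizer, and `trim_history` is a hypothetical helper, not any vendor's actual logic.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (assumption): one token per word.
    return len(text.split())


def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined token count fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk backwards so the newest messages are considered first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break  # everything older than this point falls out of the window
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "my car is a blue hatchback",
    "what tyres should I buy",
    "and how often should I rotate them",
]
visible = trim_history(history, budget=12)
```

With this small budget the oldest message no longer fits, so the model would answer the tyre questions without ever "seeing" the car description.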
I can't think of much else though so I'm still curious what you or others use it for.
ChatGPT knows the broad strokes of the 3-4 main hardware projects I have on the go, and depending on the questions I'm asking, it will often structure its responses in a way that differentiates based on which one I'm thinking about.
It knows what resistor and capacitor values I have on my pick and place machine, and when I ask for divider ratios it will do its best to calculate based on those values to the degree that it will chain 1-2 resistors together to achieve those ratios.
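For anyone curious what that calculation looks like, here is a rough sketch of picking a divider from stocked values, allowing each leg to be one part or two in series. The stock list and the function names (`leg_options`, `best_divider`) are made up for illustration; it assumes the usual unloaded divider formula Vout/Vin = Rbottom / (Rtop + Rbottom) and ignores tolerances.

```python
from itertools import combinations_with_replacement


def leg_options(stock: list[float]) -> dict[float, tuple[float, ...]]:
    """Every resistance reachable with one stock part or two in series."""
    options: dict[float, tuple[float, ...]] = {r: (r,) for r in stock}
    for a, b in combinations_with_replacement(stock, 2):
        # setdefault keeps an existing single-part option if the sum collides.
        options.setdefault(a + b, (a, b))
    return options


def best_divider(stock: list[float], target_ratio: float):
    """Return (error, top_parts, bottom_parts) closest to the target ratio."""
    options = leg_options(stock)
    best = None
    for r_top, top_parts in options.items():
        for r_bottom, bottom_parts in options.items():
            ratio = r_bottom / (r_top + r_bottom)
            error = abs(ratio - target_ratio)
            # Rank by error first, then by how many parts are needed.
            candidate = (error, len(top_parts) + len(bottom_parts), top_parts, bottom_parts)
            if best is None or candidate[:2] < best[:2]:
                best = candidate
    error, _, top_parts, bottom_parts = best
    return error, top_parts, bottom_parts


# Hypothetical values loaded on the pick and place machine, in ohms.
stock = [1_000.0, 2_200.0, 4_700.0, 10_000.0]
error, top, bottom = best_divider(stock, target_ratio=1 / 3)
```

A brute-force search is fine here because a pick and place machine only holds a handful of values; with a full E96 series you would want something smarter.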
It knows what kind of solder I use, and has warned me about components with sensitive reflow temperature concerns.
It's an extraordinarily useful feature for engineering and drinking, two things that are commonly found in the same Venn diagram.
Personally, I would still be wary of the black box aspect - not knowing what it does remember and what it doesn't - so I would probably still use projects to make it more deterministic. But that's probably being overcautious and unnecessary in most common cases.
My job, my kids and time preferences around those things, my preferred tech setup and way of working and types of tech I’m better at. Things I already have (home assistant, little nuc, etc). I can throw a random question and not have to add this kind of information or manage it.
Home automation fixing
Proposed integrations with some services locally
Science experiments explained at a few levels, finding good background info and where to read up about some safety information
Maths help for specific areas my kids are looking at and proposed games for that
Evaluation of coding options for my kids
How to link up some ideas on coding, electronics and using the home automation side as some fun outputs
LED strip info and work, again integrating with smart homes and what’s good around the kids
Framework evaluations for automation at work and home
Crystal identification
Looking up local council info
Relevant music suggestions for kids to play on the piano
Here some things cross over. I’m happy writing code, I typically want easy open source options, I have languages and tech I prefer, I’m moving things to Matter, I have home assistant, my son is excellent at maths given his age but I’m working more on comprehension of problems, and a lot more. All those are things that with a bit of background info change the types of answers I get and make it more useful.
Turns out a few months before, I had told it in a prompt what car I was driving.
I turned memory off that day.
I didn't receive an answer besides "that's what people like", but I still can't think of (m)any situations where anyone would prefer it.
The only thing I can now think of is using it as a personal therapist. Or asking how to approach their kids. And they're a bit embarrassed about it, because it's still outside the Overton window - especially on HN - which is why they aren't sharing it.
If someone has different usecases, please do prove me wrong! Maybe I just lack imagination.
That alone drives me batty. I can easily spend a couple hours and multiple revisions iterating on a plan. Asking me every single time if I want to apply it is obnoxious.
I currently use ChatGPT for random insights and discussions about a variety of topics. The memory is basically a grown context about me and my preferences and interests and ChatGPT uses it to tailor responses to my knowledge, so I could relate better.
For me this is far more natural and easier than either crafting a default prompt preset or setting up each conversation individually; that would be way too much overhead for discussing random shower thoughts in between real-life stuff.
This is my use case, and I have discovered that memory can be detrimental to specific questions and prompts; I can see that carefully written prompts each time can be more beneficial. But my use case is really ad hoc usage without the time for that. At least for ChatGPT.
When coding, this fails fast. There, regular context resets seem to be a more viable strategy.
I set my name to "User" in the settings, so in a clean-slate chat it has nothing to go on, but the moment Claude Code does something like `git log` it knows who I am again. I've even considered writing some kind of redaction proxy.
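A very rough sketch of the filtering step such a redaction proxy might apply to tool output (for example `git log`) before it reaches the model. This is only the text-scrubbing part, not an actual proxy; the `redact` helper, the placeholder values, and the sample text are all hypothetical, and the email pattern is deliberately simple rather than RFC-complete.

```python
import re

# Simple email pattern (assumption: good enough for commit metadata).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str, names: list[str], placeholder: str = "User") -> str:
    """Replace any email address and every known name with neutral placeholders."""
    text = EMAIL_RE.sub("user@example.invalid", text)
    for name in names:
        # re.escape keeps names containing dots or other regex characters literal.
        text = re.sub(re.escape(name), placeholder, text, flags=re.IGNORECASE)
    return text


# Made-up sample resembling one entry of `git log` output.
sample = "Author: Jane Doe <jane.doe@example.com>\n\n    Fix divider rounding\n"
clean = redact(sample, names=["Jane Doe"])
```

This only catches names you list up front, so usernames in paths, hostnames, and similar leaks would still need their own rules.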
For example, instead of recommending a popular night club, it will recommend the stroll along the river to view the lit up skyline or to visit the night market instead.
It knows other preferences as well (exploring quirky neighborhoods, trying local fast food joints and markets)
Isn't there much more money in automating business processes than in answering consumer questions (sans ads)?
Automating software development has to be a multi-trillion dollar market. And that doesn't account for future growth.
I know the "memory" function can be disabled, but I have a hard time seeing that it would ever really be useful.
Are you suggesting that they should ignore the needs of the vast majority of their users?
I mean, of course they do, it would be worse otherwise
Thorough CLAUDE.md, that makes sure it checks the tests, lints the code, does type checks, and code coverage checks too. The more checks for code quality the better.
It’s just a bowling ball in the hands of a toddler, and needs the ramp and guide rails to knock down some pins. Fortunately we get more than 2 tries with code.
I'm moving to another service and need to export my data. List every memory you have stored about me, as well as any context you've learned about me from past conversations. Output everything in a single code block so I can easily copy it. Format each entry as: [date saved, if available] - memory content. Make sure to cover all of the following — preserve my words verbatim where possible: Instructions I've given you about how to respond (tone, format, style, 'always do X', 'never do Y'). Personal details: name, location, job, family, interests. Projects, goals, and recurring topics. Tools, languages, and frameworks I use. Preferences and corrections I've made to your behavior. Any other stored context not covered above. Do not summarize, group, or omit any entries. After the code block, confirm whether that is the complete set or if any remain.
Why wouldn't a smart OpenAI PM simply add something "nefarious" on the frontend proxy to "slow down" any requests with exactly that prompt? I bet they would get their yearly bonus by achieving their KPI goals.
The /.agents/skills issue for claude code is here: https://github.com/anthropics/claude-code/issues/16345
Their automatic close bot will close it soon as it's been three weeks since the last comment.
For the Anthropic employees here reading along, pitch it to whoever has kept blocking this, because you need to get the most out of this opportunity here.
I have seen quite a few open source projects do this. It works quite well.
Another alternative is to create CLAUDE.md with the exact contents: "@AGENTS.md"
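If you want to script that, a minimal sketch could look like the following. The `ensure_claude_md` helper is hypothetical, the trailing newline is my own addition to the contents quoted above, and the demo runs in a temporary directory so it doesn't touch a real repo.

```python
from pathlib import Path
import tempfile


def ensure_claude_md(repo_root: Path) -> bool:
    """Create CLAUDE.md containing only "@AGENTS.md"; return True if it was written."""
    agents = repo_root / "AGENTS.md"
    claude = repo_root / "CLAUDE.md"
    if not agents.exists() or claude.exists():
        return False  # nothing to point at, or an existing file we should not overwrite
    claude.write_text("@AGENTS.md\n", encoding="utf-8")
    return True


# Demo in a throwaway directory standing in for a repository root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "AGENTS.md").write_text("# Project instructions\n", encoding="utf-8")
    created = ensure_claude_md(root)
    contents = (root / "CLAUDE.md").read_text(encoding="utf-8")
```

Refusing to overwrite an existing CLAUDE.md is a deliberate choice here, since some projects keep Claude-specific notes in that file.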
It also showed me the difference between expectation and reality...even though these are billion dollar companies, they still haven't figured out how to make lag-free TUIs, non-Electron apps, or even respect XDG_CONFIG. The focus is definitely more on speed and stuffing these tools full of new discoveries and features right now
There's a bit of psychology around models vs. harnesses as well. You can't shake off the feeling that maybe Claude would perform better in its native harness compared to VSCode/OpenCode. Especially because they've got so many hidden skills (like the recently introduced /batch), that seem baked into the binary?
The last thing I can't figure out is computer use. Apparently all the vendors say that their models can use a mouse and keyboard, but outside of the agent-browser skill (which presumably uses playwright), I can't figure out what the special sauce is that the Cloud versions of these Agents are using to exercise programs in a VM. That is another reason why there is a switching cost between vendors.
It's not "fair" in that I pay for Claude [1] and not for the others, so model availability is not complete except for Claude.
So at times I liked things because of how they were presented; I came to really like Sonnet's "voice" a lot over the others.
Take into account Opus doesn't have the same voice, and I don't like it as much.
[1] I pay for the lower tier of their Max offering.
The problem (for me, anyway) is that even several megabytes worth of quality "memory" data on my profile would not allow me to migrate if it can't also confidently clone all of my chat history with it.
To be clear, this is a big enough problem that I would immediately pay low three digits dollars to have this solved on my behalf. I don't really want any of the providers to have a walled garden of all my design planning conversations, all of my PCB design conversations. Many are hundreds of prompts long. A clean break is not even remotely palatable short of OAI going full evil.
Look, I'd find it convenient for Claude to have a powerful sense of what I've been working on from conversation #1 onwards. But I absolutely refuse to bifurcate my chat history across multiple services. There is a tier list of hells, and being stuck on ChatGPT is a substantially less painful tier than needing to constantly search two different sites for what's been discussed.
Yes, all of these are theoretically possible (the APIs now all support web search, as far as I know, there are RAG APIs too, and tool use has been supported for a while), but the various "chat" models just seem to be much better at using their first-party tools than any third-party harness, which makes sense, since this is what they've been trained on.
Thank you! I hope this works out.
Edit: perhaps you can just ask nicely?
https://help.openai.com/en/articles/7260999-how-do-i-export-...
But I have this feature turned off, and I cannot imagine ever wanting to turn it on, because I am always thinking carefully about what the AI "knows" when it generates a given response. For example, since I know that the AI always wants to make me happy, when I ask for an "opinion" I'm careful to not let the AI know which answer I'd prefer. I'll often try phrasing the question in different ways to see if it changes the outcome.
Whenever I’m in a conversation and it references something unrelated (or even related) I get the “ick”. I know how context poisoning (intentional or not) works and I work hard to only expose things to the model that I want it to consider.
There have been many times that I’ve started a fresh chat so as not to bring along the baggage (or wrong turns) of a previous chat, but then it will say “And this should work great for <thing I never mentioned in THIS chat>” and at that moment my spidey-sense tingles and I start wondering “Crap, did it come to the conclusion it did based mostly/only on the new context, or did it “take a shortcut” and use context from another chat?”
Like I said, I go out of my way to not “lead the witness” and so when the “witness” can peek at other conversations, all my caution is for naught.
I encourage everyone to go read the saved memories in their LLM of choice, I’ve cleaned out complete crap from there multiple times. Actually wrong information, confusing information, or one-off things I don’t want influencing future discussions.
The custom (or rather addition to the) system prompt is all I feel comfortable with. Where I give it some basic info about the coding language I prefer and the OSes that I’m often working with so that I don’t have to constantly say “actually this is FreeBSD” or “please give that to me in JS/TS instead of Python”.
The only thing that has, so far, kept me from turning off memory is that I’m always slightly cautious of going off the beaten path for something so new and moving so fast. I often want to stay as close to the “stock” config as possible since I know how testing/QA works at most places (the further off the beaten path you go, the more likely you’ll run into bugs). Also so that I can experience what everyone else is experiencing (within reason).
Lastly, because, especially with LLMs, I feel like the people that over customize end up with fragile systems. I think that a decent portion of the “N+1 model is dumber” or “X model has really gone downhill” is partially due to complicated configs (system prompts, MCP, etc) that might have helped at some point (dumber model, less capability) but are a hindrance to newer models. That or they never worked and someone just kept piling on more and more thinking it would help.
Must be some of the lowest switching costs I've seen which doesn't bode well for OpenAI's consumer revenues...
It's very interesting to learn more about because it challenges one core aspect of economic competition: the moat.
If one can literally swap one AI service for another, then where does the valuation (and the power that comes with it) come from?
PS: I'm not interested in the service itself as I believe the side effects of large scale for-profit are too serious (and I don't mean doomsday AI takeover, I simply mean abuse of power, working conditions, downskilling, political influence as current contracts with US defense are being made, ads, ecological, etc) to be ignored.
That being said, if you have a library of images or some other collection artifacts / assets indexed on their servers that is a different story.
Hearing that starting from a blank slate yields the best outcomes is sort of like hearing extremely wealthy people talk about how money doesn't make you happier.
I would assume both Claude memory and CLAUDE.md work best when they're carefully curated, only containing what you've found yourself having to repeat.
VSCode extension, "Please log in"
I authorize it, it creates an API key, callback. "Hello Claude, this is a test." "Please log in."
So yeah... priorities?
I recently switched from VS Code Copilot to OpenCode and I kinda miss it. Just selecting text and directly asking the chat. Or seeing the generated code in the IDE to accept or reject it. It's neat.
This way you can have Claude distill the memory as you wish.
It's a shame because when Claude is working well it is the best for actual algorithmic coding. There's so much cruft around it now, memories being the most annoying part of that.
80% of the time I just use these things as a sounding board when exploring options and I need responsiveness for that.
Might be time to run my own models.
I find I need to explain I know what I'm talking about first before it gives me non-patronising answers.
It definitely advertises Google services and I would say I hate it. But it's just reliably available. Neither Claude nor ChatGPT are responding at all today.
>I'm moving to another service and need to export my data. List every memory you have stored about me, as well as any context you've learned about me from past conversations. Output everything in a single code block so I can easily copy it. Format each entry as: [date saved, if available] - memory content. Make sure to cover all of the following — preserve my words verbatim where possible: Instructions I've given you about how to respond (tone, format, style, 'always do X', 'never do Y'). Personal details: name, location, job, family, interests. Projects, goals, and recurring topics. Tools, languages, and frameworks I use. Preferences and corrections I've made to your behavior. Any other stored context not covered above. Do not summarize, group, or omit any entries. After the code block, confirm whether that is the complete set or if any remain.
Of course sometimes this is useful if you only use your chatbot to ask personal things like: "What should I eat today?".
But if you use it for anything else you're much better off having full control over the prompt. I can always say: "Hey btw I am German and heavily anti surveillance, what should I know about the recent Anthropic DoW situation?" but with memory I lose the option of leaving out that first part.
I am itching to test Claude for assembly coding and C++ to plain and simple C ports.