I think you are using the wrong language, to be honest. LLMs are best at languages like Python, JavaScript, and Go: relatively simple structures and huge amounts of reference code. Rust is a less common language that is much harder to write.
Did you give Claude Code tests and the ability to compile in a loop? In Go at least, it's pretty good at debugging and fixing issues when allowed to loop.
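For a Rust project, a minimal sketch of that loop, assuming a hypothetical `my_dsl` crate whose `parse` returns a list of statements (all names here are placeholders): give the agent a test like this and tell it to run `cargo check && cargo test` after every edit until both pass.

// tests/parse_smoke.rs - hypothetical integration test for the agent to iterate against.
// The crate name `my_dsl`, the `parse` function, and its return type are placeholders.
use my_dsl::parse;

#[test]
fn parses_a_minimal_program() {
    // A failing assertion shows up in the `cargo test` output, which is what the
    // agent loops on: compile, run tests, read the failure, fix, repeat.
    let statements = parse("let x = 1;").expect("minimal program should parse");
    assert_eq!(statements.len(), 1, "expected exactly one statement");
}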
Refactoring duplicate code into a helper function should be achievable with current agents. To replace existing code with an external crate, you could try giving the agent access to a browser (e.g. playwright-mcp) and instructing it to browse the crate docs. For anything that involves APIs past the model's knowledge cutoff, it's definitely worthwhile to have some MCP tools on hand that let it browse for up-to-date info - the brave-search and context7 MCPs are good. Here's the MCP config I use:
{
  "servers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"],
      "type": "stdio"
    },
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"],
      "type": "stdio"
    },
    "git": {
      "command": "uvx",
      "args": ["mcp-server-git"],
      "type": "stdio"
    },
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"],
      "type": "stdio"
    },
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": {
        "BRAVE_API_KEY": "${input:brave-api-key}"
      },
      "type": "stdio"
    }
  },
  "inputs": [
    {
      "type": "promptString",
      "id": "brave-api-key",
      "description": "Brave Data for AI API Key",
      "password": true
    }
  ]
}
The Sonnet 4 agent usually defaults to using `fetch` for getting webpages, but I've seen it sometimes try playwright on its own. The brave-search MCP server seems to be deprecated now, so it's probably not the best option as a search MCP (you also need to sign up for an API key). Right now it still works well, though!

Have been using it to build a DSL in JS. Greenfield. I've followed the commonly touted "plan, act, evaluate" approach: I got it to generate a clear project vision, scope, and feature checklist, then told it to refer to that for context. I've been descriptive and explicit in my prompting, way more so than previously.
It has gotten the broad strokes right: I've got an exceptionally barebones DSL, made up of 5 entities, working… just barely.
It has now started to spin its wheels on small issues and can't fix them without breaking something else. The codebase isn't even big (~8 main functions across a few files). Troubleshooting the code is difficult because it's convoluted and I lack the intuition for it that I would have if I'd written it myself. I've decided to rewrite everything, ceding less control to the LLM.
When it works, it feels great. When it doesn't, which is often, the spell is broken and I feel I've wasted a bunch of time without much to show for it.
I've been having fun with Claude Code and VS Code's agent. Any reasonably experienced engineer should be able to use it for a subset of languages without too many issues, but they definitely need to hydrate the context (e.g. with a CLAUDE.md) and have a sensible set of system prompts set up. Good user prompts, well written and broken down into steps, are non-negotiable.
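A rough sketch of what that context hydration can look like (the contents below are illustrative, not a prescribed template; the project details are placeholders):

CLAUDE.md (illustrative)
- Project: Rust workspace; the DSL parser lives in crates/parser.
- Always run `cargo check` and `cargo test` after changes; never finish with failing tests.
- Prefer extracting shared logic into small helper functions over duplicating it.
- Don't add new dependencies without asking first.
- Plan each change and list the files you'll touch before editing anything.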
What helped me was shifting how I use it. I don’t treat it like a junior dev anymore, I treat it more like a second brain. For example:
I use Claude Code to explore options before I commit to a design. I’ll ask it “what are 3 ways to abstract this logic?” and sometimes that alone gives me a better direction.
It’s pretty good at turning rough notes or comments into starter code or test cases. That saves time on boilerplate.
If I feed it a clean, self-contained chunk of code and ask for a targeted change (e.g., “convert this to async”), it often nails it. But yeah, across a codebase, not so much.
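For example, a blocking helper like the one below (names illustrative, and assuming tokio is already a dependency) is the kind of self-contained chunk where "convert this to async" usually comes back correct on the first try:

use std::io;

// Before: the blocking version handed to the agent as a self-contained chunk.
fn load_config(path: &str) -> io::Result<String> {
    std::fs::read_to_string(path)
}

// After: the targeted "convert this to async" change - swap std::fs for tokio::fs
// and propagate .await at the call sites.
async fn load_config_async(path: &str) -> io::Result<String> {
    tokio::fs::read_to_string(path).await
}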
I've had less luck generating new features. It's great for prototyping UI, but I routinely end up writing it myself.
It's also quick to forget how I like to do things or what libraries and packages it should use. So I either have to keep reminding it or fix up the work myself. While I'm unsure whether it still ends up being quicker, that's really immaterial for me because it absolutely kills the enjoyment of the work.
Current LLMs, a reasonable percentage of the time, still get stuck on race conditions and bugs that aren't obvious via static analysis. If you can explain the exact source of a bug to an LLM, it can get it, but if there's a seemingly obvious solution that isn't the correct one, it will try to fix things the wrong way.
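A small Rust illustration of the kind of bug I mean: a check-then-act race that static analysis won't flag, where the "obvious" fix (adding more locking around each call) doesn't address the actual problem of the lock being released between the check and the insert.

use std::collections::HashMap;
use std::sync::Mutex;

// Logical race: the guard returned by `lock()` in the condition is dropped as soon as
// the condition finishes evaluating, so another thread can insert the same key before
// our insert runs. The real fix is to hold one guard across both the check and the
// insert (or use the entry API), not to sprinkle extra locks around the call sites.
fn record_first_seen(seen: &Mutex<HashMap<String, u64>>, key: &str, now: u64) {
    if !seen.lock().unwrap().contains_key(key) {
        // race window: another thread can get here first
        seen.lock().unwrap().insert(key.to_string(), now);
    }
}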
It's best to use AI in areas where a lack of specificity or precision isn't a major hindrance, and where every abstraction is a closed loop that won't hurt you later just because you don't know how it works.
I think we have to build up enough code for it to start appearing like brownfield, before Claude knows how to engineer correctly. Which kind of makes sense if we view Claude Code as a junior engineer with infinite stamina.
I also like to spin up Claude Code and Gemini in parallel to see what each one comes up with. Gemini will often take the simpler approach, though not always a fully featured one, and I usually end up combining the two outputs and refining them in Cursor to arrive at the final solution.
It's useful as a built-in quick docs/search that can spit out small code fragments.
Every time I gave it more space results were disappointing.
I now have 5-10 small services running; whatever "thing" I think I need, I create and self-host it.
It's such a revolution.
What language are you using?
> I’m working on a relatively greenfield rust project
I haven't had good luck using LLMs with Rust, but it may just be me.