But to the actual question: a lot of people's gut instinct on how to solve this doesn't work. They go down the road of "well, if I teach the AI about my legacy codebase, it will be smarter, and therefore more efficient." But all you end up doing is filling your available context with irrelevancies, and your agent gets dumber and costs more.
What you actually need to do is tackle it the same way a human would: break it down into smaller problems where the agent can keep the entire problem in context at once, meaning 256K tokens or less (file contents + prompt + outputs). Then use a scratchpad file that holds notes, file references, constraints, and line numbers; that's your compaction protection. Restart the chat with the same scratchpad when you move between minor areas.
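To sanity-check whether a sub-problem actually fits that budget, a crude size estimate is usually enough. Here's a minimal sketch; the 4-characters-per-token ratio and the 8K prompt overhead are rough assumptions, not real tokenizer figures, so treat the numbers as ballpark only:

```python
# Rough check that a sub-problem fits in the agent's context window.
# ASSUMPTIONS: ~4 chars per token and ~8K tokens of prompt/output overhead.
# Real tokenizers and real prompts vary; this is a ballpark filter.
from pathlib import Path

CONTEXT_BUDGET_TOKENS = 256_000
CHARS_PER_TOKEN = 4  # crude heuristic, not an exact tokenizer ratio

def estimated_tokens(paths, prompt_overhead_tokens=8_000):
    """Estimate total tokens for a set of files plus prompt/output overhead."""
    total_chars = sum(Path(p).stat().st_size for p in paths)
    return total_chars // CHARS_PER_TOKEN + prompt_overhead_tokens

def fits_in_context(paths):
    """True if these files (plus overhead) should fit in one session."""
    return estimated_tokens(paths) <= CONTEXT_BUDGET_TOKENS
```

If `fits_in_context` comes back false, that's your cue to split the problem further before starting the session.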
Context is your primary limited resource. Fill it only with what absolutely needs to be there, and nothing else.
Usually it's an iterative process; if done correctly you could end up with a much better codebase. Good luck!
In my company we tried using Claude for exactly the task you describe. The results were bad. We discovered a few interesting things, but most of it was wrong: we had to dig through the codebase the old way to confidently accept or reject what Claude was telling us. We could have saved a lot of time and money by simply doing it ourselves. As an upside, we also learned the codebase, so now people rely on us for that (which feels good too).
Then for execution: use plan mode. Always have it write a plan first; check it, correct it, and only then allow it to implement.
Break big tasks down into small substeps, as small as possible. Let it implement changes iteratively and make plenty of local git commits; both Codex and Claude Code use the commit history as documentation as well.
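The substep loop can be sketched like this. The `apply_substep` callback and the step descriptions are hypothetical placeholders for whatever the agent actually does at each step; the point is just "one small change, one commit":

```python
# Sketch of the iterative loop: after each small change, commit locally so
# the history doubles as documentation the agent can re-read later.
# apply_substep is a hypothetical callback standing in for the agent's work.
import subprocess

def commit_substep(message: str) -> None:
    """Stage everything and record one small, well-described commit."""
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)

def run_substeps(substeps, apply_substep) -> None:
    """Apply each small step, then commit it immediately."""
    for step in substeps:
        apply_substep(step)  # hypothetical: the agent implements this step
        commit_substep(f"refactor: {step}")
```

Keeping each commit to one substep also means a bad step is a one-line `git revert` instead of an archaeology session.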
Basically, treat it like a junior developer working under you.
Make sure appropriate tests are written for every code change.
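For legacy code specifically, the cheapest useful tests are characterization tests: pin down what the code does *now* before the agent touches it. A minimal sketch, where `legacy_rounding` is a hypothetical stand-in for whatever function you're about to change:

```python
# Characterization test sketch: capture current behavior of legacy code
# before refactoring, so the agent can't silently change it.
# legacy_rounding is a hypothetical stand-in for your real legacy function.

def legacy_rounding(amount_cents: int) -> int:
    """Stand-in legacy behavior: round down to the nearest 10 cents."""
    return amount_cents - (amount_cents % 10)

def test_legacy_rounding_is_pinned():
    # Expected values were captured from the code's current output,
    # not from a spec -- that's what makes this a characterization test.
    assert legacy_rounding(107) == 100
    assert legacy_rounding(110) == 110
    assert legacy_rounding(0) == 0
```

Write these first, let the agent refactor, and any red test tells you exactly which behavior it changed.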