Actually, Karpathy solves this with a RAG system and an LLM Wiki, but for a consumer app that would be a huge cost driver. Every time you grep or full-text search the DB or the vectors, you pay for bandwidth; as a bootstrapper I cannot afford that, even with a BaaS, where they actually bill upfront for traffic. I understand your point, but if a model needs to fully read every .md to make a point, you'll bloat the context window. Well, I'm not an ML researcher and I'm learning as well, but I don't think this approach is ideal for a consumer app. The fair point is that I do want something like an LLM Wiki in my app, maybe once I make some $.
It should never read every full file. It should be grepping to find candidates, and then reading chunks of the file around the hits to see if they are genuinely relevant to whatever you are trying to gather context for. If it reads a chunk of the file surrounding wherever it got the grep hit, and that chunk appears relevant, then it can pull in a larger portion or the entire file, if appropriate.
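Roughly this, in Python (just a sketch of the idea; the chunk size and the relevance step are arbitrary placeholders, not anything a specific tool prescribes):

```python
import subprocess

CHUNK_RADIUS = 20  # lines of context around each hit; arbitrary choice

def grep_candidates(pattern: str, path: str = ".") -> list[tuple[str, int]]:
    """Use grep to find candidate (file, line) hits instead of reading whole files."""
    out = subprocess.run(
        ["grep", "-rn", pattern, path, "--include=*.md"],
        capture_output=True, text=True,
    ).stdout
    hits = []
    for row in out.splitlines():
        file, line, _ = row.split(":", 2)
        hits.append((file, int(line)))
    return hits

def read_chunk(file: str, line: int, radius: int = CHUNK_RADIUS) -> str:
    """Read only the lines surrounding a hit; expand to the whole file later
    only if this small chunk actually looks relevant."""
    with open(file, encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    lo = max(0, line - 1 - radius)          # grep line numbers are 1-based
    hi = min(len(lines), line + radius)
    return "".join(lines[lo:hi])

# Gather small, cheap context first:
for file, line in grep_candidates("context window"):
    chunk = read_chunk(file, line)
    # a relevance check here (heuristic or a cheap LLM call) would decide
    # whether to pull in a larger portion or the entire file
```

The point is that the expensive step (reading a whole file into context) only happens after a cheap filter has already said the file is probably worth it.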
I agree with that, but this way it's still bloat. If you're coding with AI, you know that every time a model reads 100 lines and doesn't find what it needs to modify, you bloat the context. I use Copilot these days (until June lol) and it has a context window meter; every time the model reads a file to make a change, I assure you the window moves from, say, 8% to 12% (on GPT's 400k-token window). That's ~16k tokens of reads for something like a 10-line change, so I know about chunking, but this is how it works every time. You can also check how Claude Code introduced tool-step deletion to unbloat the context window a few months ago. Thank you for the advice, Patrick :)
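As I understand it, the tool-step deletion idea is basically this (a hypothetical sketch; the message shapes are made up for illustration, not Claude Code's actual internals):

```python
def prune_tool_results(messages: list[dict], keep_last: int = 2) -> list[dict]:
    """Replace all but the most recent tool results with a short stub,
    so stale file reads stop occupying the context window."""
    tool_indices = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    stale = set(tool_indices[:-keep_last]) if keep_last else set(tool_indices)
    pruned = []
    for i, m in enumerate(messages):
        if i in stale:
            pruned.append({"role": "tool", "content": "[tool result elided to save tokens]"})
        else:
            pruned.append(m)
    return pruned
```

So the old 16k-token reads get collapsed to a stub once they've served their purpose, instead of riding along in every subsequent request.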