Actually, Karpathy solves this with a RAG system and an LLM Wiki, but for a consumer app that would be a huge cost driver. Every time you grep or full-text search the DB or the vectors, you pay for bandwidth; as a bootstrapper I cannot afford that, even with a BaaS, where they actually bill upfront for traffic. I understand your point, but if a model needs to fully read every .md to make a point, you'll bloat the context window. Well, I'm not an ML researcher and I'm learning as well, but I don't think this approach is ideal for a consumer app. The fair point is that I do want something like an LLM Wiki in my app, maybe once I make some $.
It should never read every full file. It should be grepping to find candidates, and then reading chunks of the file around the hits to see if they are genuinely relevant to whatever you are trying to gather context for. If it reads a chunk of the file surrounding wherever it got the grep hit, and that chunk appears relevant, then it can pull in a larger portion or the entire file, if appropriate.
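Roughly this, in Python (just a sketch of the idea; the chunk size and the relevance step are arbitrary placeholders, not anything a specific tool prescribes):

```python
import subprocess

CHUNK_RADIUS = 20  # lines of context around each hit; arbitrary choice

def grep_candidates(pattern: str, path: str = ".") -> list[tuple[str, int]]:
    """Use grep to find candidate (file, line) hits instead of reading whole files."""
    out = subprocess.run(
        ["grep", "-rn", pattern, path, "--include=*.md"],
        capture_output=True, text=True,
    ).stdout
    hits = []
    for row in out.splitlines():
        file, line, _ = row.split(":", 2)
        hits.append((file, int(line)))
    return hits

def read_chunk(file: str, line: int, radius: int = CHUNK_RADIUS) -> str:
    """Read only the lines surrounding a hit; expand to the whole file later
    only if this small chunk actually looks relevant."""
    with open(file, encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    lo = max(0, line - 1 - radius)          # grep line numbers are 1-based
    hi = min(len(lines), line + radius)
    return "".join(lines[lo:hi])

# Gather small, cheap context first:
for file, line in grep_candidates("context window"):
    chunk = read_chunk(file, line)
    # a relevance check here (heuristic or a cheap LLM call) would decide
    # whether to pull in a larger portion or the entire file
```

The point is that the expensive step (reading a whole file into context) only happens after a cheap filter has already said the file is probably worth it.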
I agree with that, but this way it's still bloat. If you're coding with AI, you know that every time a model reads 100 lines and doesn't find what it needs to modify, you bloat the context. I use Copilot these days (until June lol) and it has a context window meter; every time the model reads a file to make a change, I assure you the window moves from, say, 8% to 12% (on GPT's 400k-token window). That's ~16k tokens of reads for something like a 10-line change, so I know about chunking, but this is how it works every time. You can also check how Claude Code introduced tool-step deletion to unbloat the context window a few months ago. Thank you for the advice, Patrick :)
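As I understand it, the tool-step deletion idea is basically this (a hypothetical sketch; the message shapes are made up for illustration, not Claude Code's actual internals):

```python
def prune_tool_results(messages: list[dict], keep_last: int = 2) -> list[dict]:
    """Replace all but the most recent tool results with a short stub,
    so stale file reads stop occupying the context window."""
    tool_indices = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    stale = set(tool_indices[:-keep_last]) if keep_last else set(tool_indices)
    pruned = []
    for i, m in enumerate(messages):
        if i in stale:
            pruned.append({"role": "tool", "content": "[tool result elided to save tokens]"})
        else:
            pruned.append(m)
    return pruned
```

So the old 16k-token reads get collapsed to a stub once they've served their purpose, instead of riding along in every subsequent request.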