Every time we ask the model to help build out this system, it tries to write a Python script instead. It cannot stop itself from reaching for "real code." The idea that the orchestration layer is just structured text that another LLM reads is somehow alien to it, even though that's literally how it works.
It's like asking a native speaker to teach their language and they start building Duolingo.
Firstly, tools like Claude Code already have things like auto memory.
Secondly, I think we all learned by now that agents will not always reliably follow instructions in the AGENTS.md file, especially as the size of the context increases. If we want to guarantee that something happens, we should use hooks.
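For anyone who hasn't used them: Claude Code hooks are configured in `.claude/settings.json` and fire deterministically, regardless of how full the context is. Roughly like this (the script path is my own invention, and the exact schema may have drifted since I last checked):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/log-change.sh"
          }
        ]
      }
    ]
  }
}
```

A `PostToolUse` hook like this runs after every matching file edit, so "always log what changed" stops depending on the model remembering an instruction.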
There are already solutions to track what the agent did and even summarise it without affecting the agent's context window. Tools like Claude Code log activity in an analyzable format, so you can process those logs with external tools.
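To sketch what "process the logs externally" means: agents like Claude Code write transcripts as JSONL (one JSON object per line), which you can summarise without touching the agent's context. The schema below is invented for illustration; the real transcript format differs, but the idea is the same:

```python
import json

# Invented JSONL transcript shape -- one event per line, some of which
# record a tool invocation. Real agent logs differ, but are similar in spirit.
sample = """\
{"type": "assistant", "toolUse": {"name": "Edit", "input": {"file_path": "src/app.py"}}}
{"type": "assistant", "toolUse": {"name": "Bash", "input": {"command": "pytest"}}}
{"type": "user", "text": "looks good"}
"""

def summarize(jsonl: str) -> dict:
    """Count tool invocations per tool name, ignoring plain messages."""
    counts: dict[str, int] = {}
    for line in jsonl.splitlines():
        entry = json.loads(line)
        tool = entry.get("toolUse", {}).get("name")
        if tool:
            counts[tool] = counts.get(tool, 0) + 1
    return counts

print(summarize(sample))  # {'Edit': 1, 'Bash': 1}
```

A script like this can run after every session (e.g. from a hook) and produce a digest the next session reads, instead of the agent carrying the whole history in context.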
When I tried something similar in the past, the agent would not really understand what is important to "memorise" in a KNOWLEDGE.md file, and would create a lot of bloat which I would then need to clean up anyway.
There are existing tools to tell the agent what has happened recently: git. By looking at the commit messages and list of changed files, the agent usually gets most of the information it needs. If there are any very important decisions or learnings which are necessary for the agent to understand more, they should be written down >manually< by a developer, as I don't trust the agent to decide that.
Also, there is an ongoing discussion about whether AGENTS.md files are even needed, or whether they should be kept to an absolute minimum. Despite what we all initially thought, those files can actually negatively affect the output, based on recent research.
Agent-kernel has personality, persistent memory, and self-modifying capability; using its Skills is the same as using Skills from Claude Code.
I already have to fight the agent constantly to prevent it from adding backwards compatibility, workarounds, wrappers etc. for things that I changed or removed. If there's even one forgotten comment that references the old way, it'll write a whole converter system if I walk out of the room for 5 minutes, just in case we ever need it, even though my agents file specifically says not to (YAGNI, all backwards compatibility must be cleared by me, no wrappers without my explicit approval, etc.). Having a log of the things we tried last month but which failed and got discarded sounds like a bad idea, unless it's specifically curated to say "list of things that failed: ...", which, by definition, an append-only log can't do.
I've even hit the situation where it rediscovered removed systems through git history. At least that's rare, though.
---
The only documentation to write is project.md and TODO.md; do not write documentation anywhere else.
TODO.md: document gaps, tasks and progress, grouped by feature
project.md: document architecture, responsibility map, features, and the tribal knowledge needed to find things
Do not document code, methods, or classes.
STANDARD OPERATING PROCEDURES:
Beginning of task:
- read: goals.md tech.md project.md
- update TODO.md: add step-by-step [ ] tasks under the # feature you will implement
During execution of task:
- perform the task step by step, delegating to subtasks or subagents where possible
- log with [x] the work performed in TODO.md as you go
End of task:
- remove completed features from the TODO.md
- maintain project.md
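For illustration, a TODO.md under this SOP might look like this (feature names and tasks invented):

```markdown
# auth
- [x] add login endpoint
- [x] hash passwords with bcrypt
- [ ] add session refresh

# export
- [ ] CSV export for reports
```

Once every box under a # feature is checked, the whole section gets deleted at end of task, so the file only ever holds in-flight work.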
Dude, this is just prompts. It is as useful as asking claude code to write these files itself.
bigbezet is right, agents have no clue what's worth remembering. What works for me is splitting it: the agent writes what happened, I decide what actually matters. Two places to manage: the journal and STATE.md, which I ask the agent to maintain according to my expectations. The agent can read the journal if it needs to, but the main place to check status is STATE.md.
One thing I haven't seen anyone mention, though: after a few weeks of reading your rants about some coworker, the agent just takes your side on everything. I had to literally add "consider the other person's perspective" to my rules file. The journal just has too many one-sided notes. Otherwise you end up with a yes-man that has perfect memory.
The trauma-replay thing gaigalas mentioned is real too. I found it hard to keep the agent from being biased. To be frank, even I'm noticing something like this:
- I complain, and the agent defends me.
- I paste into the chat a response from another LLM which was not biased by my journal. It flips sides and now says the research makes a lot of sense.
- I ask: "How biased are you right now?" and it responds with something about being biased and "... to be frank, the truth is: ...".
Even when asked not to be biased, it starts to play biased because it thinks I expect that. Sneaky bastard.
I was stuck on a task for a couple of days. Deleted the memory about some debugging sessions, and the thing just unlocked itself. The harness was basically replaying the trauma over and over again.
I honestly think it's better to not have stateful stuff when working with agents.