ADRs, Notion docs, and Confluence pages die because they're separate from the code. Out of sight, out of mind.
If you want to be really disciplined about it, set up an LLM-as-judge git hook that runs on each PR. It checks whether code changes are consistent with the existing documentation and blocks the merge if docs need updating. That way the enforcement is automated and you only need a little human discipline, not a lot.
There's no way to avoid some discipline though. But the less friction you add, the more likely it sticks.
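A minimal sketch of the judge hook described above, as a CI step; the doc paths are assumptions about where this repo keeps its docs, and the actual LLM call is left to whatever API you use (on a FAIL answer, exit nonzero so the merge is blocked):

```python
import subprocess

DOC_PATHS = ("docs/", "README.md")  # assumption: where this repo keeps its docs

def changed_files(base="origin/main"):
    """List files changed relative to the target branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]

def docs_touched(files):
    """True if any documentation file was updated in this change."""
    return any(f.startswith(DOC_PATHS) for f in files)

def build_judge_prompt(files, diff):
    """Prompt for the LLM judge; wire this into whichever LLM API you use."""
    return (
        "You are reviewing a pull request. Decide whether these code changes "
        "make any existing documentation stale. Answer PASS or FAIL, with a "
        "one-line reason.\n\n"
        "Changed files:\n" + "\n".join(files) + "\n\nDiff:\n" + diff
    )
```

If `docs_touched` is already true you might skip the LLM call entirely, which keeps the hook cheap on doc-updating PRs.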
I also had an idea for a solution to this problem a long time ago.
I wanted to build a tool that would let you record a meeting (at the company where I worked back then, such things were mostly discussed in person), transcribe it, and link parts of the conversation to relevant tickets, pull requests, and git commits.
Back then the tech wasn't ready yet, but now it actually looks relatively easy to do.
For now, I try to leave such breadcrumbs manually, whenever I can. For example, if the reason why a part of the code exists seems non-obvious to me, I will write an explanation in a comment/docstring and leave a link to a ticket or a ticket comment that provides additional context.
Second, for #3, it's a new hire's job to make sure the docs are useful for new hires. Whenever they hit friction because the docs are missing or wrong, they go find the info, and then update the docs. No one else remembers what it's like to not know the things they know. And new hires don't yet know that "nobody writes anything" at your company.
In general, like another poster said, docs must live as close as possible to the code. LLMs are fantastic at keeping docs up to date, but only if they're in a place where they'll look. If you have a monorepo, put the docs in a docs/ folder and mention it in CLAUDE.md.
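A pointer like that can be very short; for example (the paths and wording here are illustrative, not a standard):

```markdown
## Documentation

Long-form docs live in `docs/`. If a change alters behavior described
there, update the relevant doc in the same PR.
```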
ADRs (architecture decision records) aren't meant to be maintained, are they? They're basically RFCs, a tool for communication of a proposal and a discussion. If someone writes a nontrivial proposal in a slack thread, say "I won't read this until it's in an ADR."
IMHO, PRs and commits are a pretty terrible place to bury this stuff. How would you search through them? Dump all commit descriptions longer than 10 words into a giant .md and ask an LLM? No, you shouldn't rely on commits to tell you the "why" for anything larger in scope than that particular commit.
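For what it's worth, the dump this comment describes is only a few lines; the 10-word threshold comes from the comment itself, everything else here is an assumption:

```python
import subprocess

def long_commit_messages(min_words=10, repo="."):
    """Collect (sha, message) pairs whose message exceeds min_words words."""
    # %x1f / %x1e are ASCII unit/record separators, safe to split on.
    out = subprocess.run(
        ["git", "log", "--format=%h%x1f%B%x1e"],
        capture_output=True, text=True, check=True, cwd=repo,
    ).stdout
    entries = []
    for record in out.split("\x1e"):
        sha, _, body = record.strip().partition("\x1f")
        if sha and len(body.split()) > min_words:
            entries.append((sha, body.strip()))
    return entries

def to_markdown(entries):
    """The 'giant .md' to hand to an LLM."""
    return "\n\n".join(f"## {sha}\n\n{body}" for sha, body in entries)
```

Whether the resulting answers are any good is, as the comment argues, another question.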
It's not magic, but I maintain a rude Q&A document that basically has answers to all the big questions. Often the questions were asked by someone else at the company, but sometimes they're to remind myself ("Why Kafka?" is one I keep revisiting because I want to ditch Kafka so badly, but it's not easy to replace for our use case). But I enjoy writing. I'm not sure this process scales.
We're in the process of trying to get as much stuff as possible into source control (we use google docs a lot, so we'll set up one way replication for our ADRs and stuff from there to git). That way, as LLM models get better, whatever doc gets materialized from those bits and pieces will also automatically get better.
* they may see it as reducing their career security
* they may see it as opening them up to potential prosecution
* it takes a lot of time
A busy engineer trying to hit a deadline is just going to do the easiest thing, aren't they?
Also, there's all sorts of tacit knowledge that goes into a decision, and I just don't think you're going to capture that automatically.
(I worked on it 25 years ago, rather than for 25 years.)
This is the reason why I take the time to summarize all the “why” decisions and implementation tradeoffs in my (too lengthy) PR descriptions, with links, etc. I’ve gotten into the habit of using `<details>` to collapse everything because I’ve gotten feedback multiple times that no one reads my walls of text. However, I still write it (with short `<summary>`s now) because I’ve lost track of the number of times I’ve been able to search my PRs and quickly answer my own or someone else’s “why” question. I do it mostly for me, because I find it invaluable: I prefer writing shit down to relying on my flaky memory. People are forgetful and people come and go. What doesn’t disappear is documentation tied to code commits (well… unless you nuke your repo).
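For reference, the collapsed-section markup in GitHub-flavored markdown looks like this (the summary text here is invented):

```markdown
<details>
<summary>Why Redis rather than an in-memory cache</summary>

Longer explanation, tradeoffs, links to tickets and benchmarks, etc.

</details>
```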
1. Code should be self-explanatory; so should variable names, function names, and the overall shape of the code.
2. For the remaining non-obvious, bigger design decisions, add a comment header (e.g. jsdoc) above the main code block, and possibly refactor it out into its own file. Prefer a large comment header (and possibly some inline comments) outlining an important architectural part to having that knowledge dissipate with time, sit in separate external docs, or leave with departing coworkers.
"Not maintained" seems kinda weird to me, because at least as I see an ADR, it's like a point in time decision right? "In this situation, we looked at these options, and chose this for these reasons". You don't go back and update it. If you're making a big change, you make a new ADR with your new reasons.
One place I worked did have an interesting idea of basically forcing (not quite) the new hires to take notes on all their onboarding questions/answers as they went and then sticking it in the company docs. It at least meant that incorrect onboarding docs got fixed quickly. Sometimes you had good reasons for stuff, sometimes the reason is "dunno, that's just what we do and it seems hard to change".
Reason being, a lot of this stuff happens for no good reason, or by accident, or for reasons that no longer apply. Someone liked the tech so used it - then left. Something looked better in a benchmark, but then the requirements drifted and now it's actually worse but no one has the time to rewrite. Something was inefficient but implemented as a stop gap, then stayed and is now too hard to replace.
So you can't explain the reasons when much of the time there aren't any.
The non-solutions are:
- document the high level principles and stick to them. Maybe you value speed of deployment, or stability, or control over codebase. Individual software choices often make sense in light of such principles.
- keep people around and be patient when explaining what happened
- write wiki pages, without that much effort at being systematic and up to date. Yes, they will drift out of sync, but they will provide breadcrumbs to follow.
These are all implementation details that shouldn't actually matter. What does matter is that the properties of your system are accounted for and validated. That goes in your test suite, or type system if your language has a sufficiently advanced type system.
If replacing Redis with an in-memory cache is a problem technically, your tests/compiler should prevent you from switching to an in-memory cache. If you don't have that, that is where you need to start. Once you have those tests/types, many of the questions will also get answered. It won't necessarily answer why Redis over Valkey, but it will demonstrate with clear intent why not an in-memory cache.
Sometimes the answer to "why?" is that the dev had a hammer and the codebase was starting to look an awful lot like a nail. In-memory cache isn't considered as a serious option nearly enough imho.
# This previously used ${old-solution}, but has moved to ${new-solution} because ${reason}
Or
# This is ugly and doesn’t make sense, but ${clean-logical-way} doesn’t work due to ${reason}. If you change ${x} it will break.
Or
# This was a requirement from ${person} on ${date}. We want to remove this, but will need to wait until ${person} no longer needs it or leaves the company.
If there is a Confluence doc that relates to my code, I will usually cross reference it. The Confluence link goes at the top of the file, and a link to the repo goes into Confluence. Even with this, the discovery problem remains, as one of those things needs to be found.
Using chat is a non-starter, as our chats are purged after 6 or 12 months. PRs also seem like a very challenging place to keep the information without a lot of systems in place and strict adherence.
Tickets can work, until the ticketing system changes. I’ve been through 3 ITSM platform changes and 3 changes in agile software. Old information is lost in these transitions as it’s usually only in-flight stuff that migrates. Confluence will meet the same fate soon I’m sure.
At the end of the day, the code is the only thing I can trust to be there. Once the code is gone, the information matters less. I also try to be pretty diligent about readme files, which can get pretty wordy. Adding some kind of architecture doc to the repo might be another option, similar to what claude.md has become for a lot of people. I actually might do this for a project I’m starting now, as it’s pretty confusing… though I’m hoping I can come up with a way to make it less confusing.
GitHub issue templates are perfect for ADR templates. An engineering all-hands is a great place to mention them and for teams to comment on the decision and outcomes.
More: https://max.engineer/reasons-to-leave-comment
Much more: https://max.engineer/maintainable-code
* File issues in a project tracker (GitHub, Jira, Asana, etc)
* Use the issue id at the start of every commit message for that issue
* Use a single branch per issue, whose name also starts with the issue id
* Use a single PR to merge that branch and close the issue
* Don't squash merge PRs
You can use `git blame` to get the why.
`git blame` gives you the changeset and the commit message. Use the issue id in the commit message to get to the issue. The issue description and comments provide part of the story.
Use the issue id to track down the branch and PR. The PR comments give you the rest of the story.
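Under these conventions, the first hop of that chain can even be scripted; this sketch assumes Jira-style issue ids like PROJ-123, and looking up the branch and PR from the id is left to your tracker's API:

```python
import re
import subprocess

ISSUE_RE = re.compile(r"^([A-Z]+-\d+)")  # assumption: Jira-style ids like PROJ-123

def issue_for_line(path, line, repo="."):
    """Walk from a line of code to the issue id that introduced it."""
    blame = subprocess.run(
        ["git", "blame", "-L", f"{line},{line}", "--porcelain", path],
        capture_output=True, text=True, check=True, cwd=repo,
    ).stdout
    sha = blame.split()[0]  # first token of porcelain output is the commit sha
    subject = subprocess.run(
        ["git", "show", "-s", "--format=%s", sha],
        capture_output=True, text=True, check=True, cwd=repo,
    ).stdout.strip()
    match = ISSUE_RE.match(subject)
    return match.group(1) if match else None
```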
> * Use the issue id at the start of every commit message for that issue
> * Use a single branch per issue, whose name also starts with the issue id
> * Use a single PR to merge that branch and close the issue
To me the noise at the start of every message is unnecessary, and given a lot of interfaces only display 80 chars of the message by default, it's not negligible.
Sometimes, an issue might depend on another issue and contain commits from the other branch. Tagging each commit makes it easier to pinpoint the exact reason for that change.
I have been ignoring Jira's AI summary, but I suppose that could be useful if the comments were very long.
Sorry, not really an answer to your problem. But I feel you, this is a genuinely hard problem.
Keep in mind that, pretty often, the reason something is the way it is comes down to "no real reason", "that seemed easier at the time", or "we didn't know better". At least if you don't work on critical systems.
Conceivably LLMs might be good at answering questions from an unorganized mass of timestamped documents/tickets/chat logs. All the stuff that exists anyway without any extra continuous effort required to curate it - I think that's key.
First: it’s an MCP/CLI you can hook up to Claude Code and Slack; it integrates with GitHub.
The harness lets you record decisions as contextual info you can pull up whenever you start a planning session.
It also makes sure your decisions don’t conflict with each other.
I find myself talking through a decision I made months ago with it, updating it with any new decisions, and it just figures out how to merge everything.
No extra workflow outside of this.
Hot take: hire people that value writing. Create a culture around that.
Oxide is a great example of a company culture that values writing, as shown by their rigorous and prolific RFDs: https://rfd.shared.oxide.computer/rfd/0001
See also: https://oxide-and-friends.transistor.fm/episodes/rfds-the-ba...
Many of these RFDs have hit HN by themselves.
As maligned as it can be, the single best organization I've ever been a part of for code archaeology, on a huge multi-decade project that spanned many different companies and agencies of the government, simply made diligent use of the full Atlassian suite. Bitbucket, Jira, Confluence, Fisheye, and Crucible all had the integrations turned on. Commits and PRs had a Jira ticket number in them. Follow that link to the original story, epic, whatever the hell it was, and that had further links to ADRs with peer review comments. I don't know that I ever really had to ask a question. Just find a line of interest and follow a bunch of links and you've got years of history on exactly what a whole bunch of different people (not just the one who committed code) were thinking and why they made the decisions they made.
I've always thought about the tradeoffs involved. They were waterfall. They didn't deliver fast. Their major customers were constantly trying to replace them with cheaper, more agile alternatives. But competitors could never match the strict non-functional requirements for security, reliability, and performance, and the zero tolerance for regressions, so it never happened, and they've had a decades-long monopoly in what they do because of it.