But everything your harness looks at could be this. The skills in your codebase, the commands you've added, the memories that were auto-created: they all work towards either improving or completely destroying your productivity.
And most of it is hidden. You hear people talk about this all the time, where they'll say, "Oh, I use GSD or I use Superpowers and my results have gotten worse."
Your results might have gotten worse precisely because you use them (along with your memories and other skills).
I got myself a Strix Halo system and a GLM coding plan. The question to self was: what can I do when tokens are essentially unlimited? The opaqueness of what my harness is, and how it grows over time and use when I adopt projects out there, makes it hard to know what is helping and what isn't.
Clearly the harness, together with LLMs, has utility. Yet I can't help but feel that, at times, I am struggling with the classic "explore-exploit" problem, or with the tension between keeping the system deterministic and letting it be less so. When is my system stuck in a local minimum (and in need of a good kick out of it, automated if possible), and when is it at a good place in the "global" state-action space?
But obviously, telling somebody "make a boiled egg. To boil an egg you have to crack it into the pan first." is a lot worse than just telling them "make a boiled egg." Especially when you have an infinitely trusting, zero-common-sense executor like an agentic model.
I start a whole lot of my sessions with "Run tests with 'uv run pytest'" and once they've done that they get the idea that they should write tests in a style that fits the existing ones.
A lot of my projects are built with platform versions from the last 12 months, which had zero or very little representation in the LLM's core training data, so the models tend to avoid the latest language options unless you prescribe them in AGENTS.md.
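To make "latest language options" concrete, here's the sort of thing I mean (a made-up illustration; the function is hypothetical, not from any of my projects):

```python
# Older spelling that agents default to, because it dominates their training data.
from typing import Optional

def parse_port_old(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None else None

# Python 3.10+ union syntax; agents tend to avoid it unless AGENTS.md says it's preferred.
def parse_port(value: str | None) -> int | None:
    return int(value) if value is not None else None
```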
I don't want it in a single global config because I like to stay with the defaults to avoid confusing myself, especially when I'm writing about how coding agents work for other people.
I haven't been able to do without an AGENTS.md: no agent (CC, Codex, OpenHands) was smart enough to figure out my layout unguided. So much so that, a few weeks ago, I had Claude write the guideline below to document the way I like to lay out my tests and modules. I make extensive use of uv workspaces and don't ship tests to production deployments:
```
- uv Workspace Architecture (`uv` v0.11.8+, `packages/` members):
**Build tool:** Exclusively `uv_build`. Never `hatchling` or any other build backend.
Pin as `uv_build>=0.6` in every `[build-system]` block.
**Naming convention — flat, distinct package names (NOT a shared namespace):**
Each workspace member uses a *flat* Python package name that is unique across the workspace.
The `uv_build` backend auto-discovers the module by converting the project name (hyphens → underscores):
`base-constants` → `src/base_constants/__init__.py`
`base-domain` → `src/base_domain/__init__.py`
`base-geometry` → `src/base_geometry/__init__.py`
etc.
No `[tool.uv.build-backend] module-name` override is needed because the project name already maps directly.
**Why NOT a `base.*` namespace package:**
`uv_build` cannot support PEP 420-style namespace packages across workspace members.
It maps each project name to exactly one module root; only one member can own `base/__init__.py`.
Attempting `module-name = "base.constants"` treats the dotted name as a nested directory,
not a namespace — it looks for `src/base/constants/__init__.py`. Confirmed by binary string
inspection of the `uv` binary. NEVER attempt namespace packages with this build backend.
**Import style (locked, never change):**
`from base_constants import CONSTANT_A`
`from base.constants import CONSTANT_A` (namespace layout — abandoned)
**Tests member:** `package = false` in `[tool.uv]`, no `[build-system]` block at all.
Tests are never shipped in production; the member exists solely to isolate test dependencies.
**Microservice split story:** When a member needs to become a standalone repository,
only the `[tool.uv.sources]` entry in the consuming `pyproject.toml` changes
(workspace source → PyPI or VCS source). The package code itself is unchanged.
- *Future-phase features: stub, NEVER implement.* When a feature is explicitly
scoped to a later phase (e.g., "Phase 4"), write a one-line stub that raises
`NotImplementedError` plus a docstring describing the Phase 4 contract. A full
implementation spends tokens on untested code that may never ship in its current
form. Exception: if the full implementation is ≤ 5 trivial lines and directly
validates the current phase's math, implement it outright.
```
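To illustrate that last rule about future-phase features, a stub ends up looking roughly like this (the function and its Phase 4 contract are invented for the example):

```python
def export_phase4_report(rows: list[dict]) -> None:
    """Phase 4 contract: batch-export validated rows as a report artifact.

    Phase 4 will define the output format and destination; until then,
    callers must treat this function as unavailable.
    """
    raise NotImplementedError("Scoped to Phase 4; stub only, do not implement yet.")
```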
Similarly, I find it annoying that every agent uses f-strings inside logging calls. Since I added this, that hasn't been a problem:
````
- NEVER use f-strings or .format() inside logging calls. This forces the string to be interpolated immediately, even if the log level (like DEBUG) is currently disabled. You should NEVER do this, and if you notice it in existing code, FLAG IT immediately! By passing the string and the variables separately, you allow the logging library to perform lazy interpolation only when the message is actually being written to the logs. Using f-strings also increases the cardinality for Structured Logging, rendering observability useless!
BAD:
```python
# The f-string is evaluated BEFORE the logging level is checked.
# This:
# - wastes CPU cycles if the log level is higher than INFO
# - increases the cardinality for Structured Logging, rendering observability useless!
log.info(f"denominator {denominator} is negative!")
```
GOOD:
```python
# The ONLY right way - logging module only merges the variable into the string if
# the INFO level is actually enabled.
log.info("denominator %s is negative!", denominator)
```
Note: Using this "Good" pattern ALSO helps with Structured Logging. Tools like Sentry or ELK can group logs by the template string ("denominator %s is negative!") rather than seeing every unique f-string as a completely different error type.
````
Also curious how well LLMs can self-reflect in a loop, in terms of: here's how the previous iteration went, here's what didn't go well, here's feedback from the human; how do I modify the docs I use in a way that I know I'll do better next time?
I know you can somewhat hillclimb via DSPy but that's hard to generalize.
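A minimal sketch of the loop I have in mind (run_agent and ask_model are placeholders for whatever harness and LLM call you actually use):

```python
from pathlib import Path

GUIDELINES = Path("AGENTS.md")

def run_agent(task: str, guidelines: str) -> str:
    """Placeholder: invoke your coding agent and return a transcript of the run."""
    raise NotImplementedError

def ask_model(prompt: str) -> str:
    """Placeholder: a plain LLM call that returns revised guideline text."""
    raise NotImplementedError

def reflect_and_revise(task: str, iterations: int = 3) -> None:
    """Naive self-reflection loop: run the task, collect human feedback,
    then have the model rewrite its own guidelines for the next attempt."""
    for i in range(iterations):
        transcript = run_agent(task, guidelines=GUIDELINES.read_text())
        feedback = input(f"Iteration {i}: what didn't go well? > ")
        revised = ask_model(
            "Below are the last attempt's transcript, human feedback, and the\n"
            "current AGENTS.md. Rewrite AGENTS.md so the next attempt avoids\n"
            "the same mistakes. Return only the new file contents.\n\n"
            f"TRANSCRIPT:\n{transcript}\n\nFEEDBACK:\n{feedback}\n\n"
            f"CURRENT AGENTS.md:\n{GUIDELINES.read_text()}"
        )
        GUIDELINES.write_text(revised)
```

Whether the revisions actually converge on something better, rather than accumulating noise, is exactly the open question.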
For existing files, the agent will carry on a bad structure unless you specifically ask it to refactor and think about what's actually helpful.
In general, it should be a lean file that tells the agent how to work with the project (short description, table of commands, index of key docs, supporting infra, handful of high-level rules and conventions that apply to everything). Occasionally ask the agent to review and optimize the file, particularly after model upgrades.
Again, the goal is to let the agent know how to work with the project at a high level, not much else. Skills and docs cover the rest.
model: foo-model
max_tokens: 32000
top_p: 1
messages:
  - role: system
    content: |
      You are opencode, an interactive CLI tool that helps users with software engineering tasks.
      Use the instructions below and the tools available to you
      # ... snip ...
      Here is some useful information about the environment you are running in:
      <env>
      Working directory: /home/user/dir
      Workspace root folder: /
      Is directory a git repo: no
      Platform: linux
      Today's date: Tue Apr 28 2026
      </env>
      Skills provide specialized instructions and workflows for specific tasks.
      Use the skill tool to load a skill when a task matches its description.
      No skills are currently available.
      Instructions from: /home/user/dir/AGENTS.md
      # Overview
      This directory holds the entirety of the code for the <dayjob> company. All code lives in Github
      under the `<dayjob>` organization, and beneath that Organization is a wide-and-flat set of all
      the Git repositories of all source code at <dayjob>. That Github repo structure is replicated in
      this directory via `ghorg`.
My AGENTS.md file contents start at the "# Overview" line. Notice that the harness just unceremoniously dumps the AGENTS.md file into the exact same text stream as the system prompt, barely contextualizing that, hey, starting now, this text is from AGENTS.md and not from the harness.
If you want AGENTS.md to work (likewise, if you want skills or anything else to work), you have to know how the harness is handling and feeding them to the LLM, because no LLM will reliably go looking for them on its own.
Basically a structured context file that can be used to generate AGENTS.md, and that can also be validated and scored.
I think it could help with this problem.
Bonus points if you can force them into context without needing the agent to make a tool call, based on touching the files or systems near them. (my homegrown agent has this feature)
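A rough sketch of what that structured context file could look like (field names, thresholds, and the scoring idea are all made up for illustration; this isn't an existing tool):

```python
from dataclasses import dataclass, field

@dataclass
class ProjectContext:
    """Hypothetical structured source-of-truth that AGENTS.md gets rendered from."""
    description: str
    commands: dict[str, str] = field(default_factory=dict)   # name -> shell command
    key_docs: dict[str, str] = field(default_factory=dict)   # title -> relative path
    rules: list[str] = field(default_factory=list)           # high-level conventions

    def validate(self) -> list[str]:
        """Cheap lint pass: flag the things that usually bloat an AGENTS.md."""
        problems = []
        if len(self.description.split()) > 120:
            problems.append("description longer than ~120 words")
        problems += [f"rule too long: {r[:40]}..." for r in self.rules if len(r) > 300]
        return problems

    def render_agents_md(self) -> str:
        """Render the lean AGENTS.md shape: overview, commands, key docs, rules."""
        lines = ["# Overview", self.description, "", "## Commands"]
        lines += [f"- {name}: `{cmd}`" for name, cmd in self.commands.items()]
        lines += ["", "## Key docs"]
        lines += [f"- [{title}]({path})" for title, path in self.key_docs.items()]
        lines += ["", "## Rules"]
        lines += [f"- {rule}" for rule in self.rules]
        return "\n".join(lines)
```

Scoring could then be as simple as counting validation problems, or diffing what the rendered file claims against what the agent actually had to go dig up during a session.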
the AGENTS.md pieces that pin specific tool-call shapes or force chain-of-thought before action are coping mechanisms that age out, same lifecycle as the retry-with-different-prompt loops and chain-of-thought prompts most stacks shipped in 2024 to compensate for brittle instruction-following.
not quite there yet, but it's nice to see them getting shorter and shorter with each model release, until all the basics are peeled away by the march of progress and one day only the invariants are left.
I feel like we've passed the point where an average-effort Claude Code / Cursor / Codex project (initialized with basic docs and skills) would produce a better product (not just better code) than if you hired a median programmer to work on it.
People really do think too highly of themselves.