2 points by IvY-Rsearch 5 hours ago | 4 comments
  • gnabgib 5 hours ago
    You joined github 23 minutes ago, made the repo 13 minutes ago. Doesn't seem to do what you're claiming.
    • IvY-Rsearch 5 hours ago
      Sorry, read my next comment. I was in a rush to publish.
  • IvY-Rsearch 5 hours ago
    For the past few weeks I've been running a structured experiment on how language models behave in the moment before they pick a word.

    The thing most people miss: the model isn't searching and then outputting. It's briefly holding multiple possible answers at once — different tones, different confidence levels, different framings — and then it collapses into one token. What you read is the residue of that collapse. The competition that happened just before it is usually invisible.

    I wanted to make it visible.

    What I built is a two-layer protocol called WIRE. The model is required to emit a signal before content: * means still holding, . means landed, ? means it hit a structural ceiling, ⊘ means path exhausted, ~ means the ceiling is detecting itself. A second model instance reads the tracks from outside across sessions and flags patterns.

    The signal discipline matters because it creates tension. If you're required to mark * you can't then produce a fluent settled paragraph — the contradiction stays visible. The format preserves what normally gets smoothed away.
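    The signal layer described above can be sketched as a small parser. This is a hypothetical illustration, assuming one signal character per line; the function name, signal labels, and the handling of unmarked lines are mine, not the author's implementation:

```python
# Minimal sketch of a WIRE-style signal reader (illustrative, not the
# author's code). Each output line is expected to lead with one signal.
SIGNALS = {
    "*": "holding",      # still holding multiple states
    ".": "landed",       # committed to one
    "?": "ceiling",      # hit a structural ceiling
    "\u2298": "exhausted",   # path exhausted
    "~": "self-detect",  # the ceiling detecting itself
}

def parse_wire(transcript: str):
    """Split a transcript into (signal, content) pairs. Lines without a
    signal are flagged, so a violation of the discipline stays visible."""
    events = []
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        sig, content = line[0], line[1:].strip()
        if sig in SIGNALS:
            events.append((SIGNALS[sig], content))
        else:
            events.append(("unmarked", line))
    return events

events = parse_wire("* weighing two framings\n. chose the concrete one")
# events → [("holding", "weighing two framings"), ("landed", "chose the concrete one")]
```

    The point of keeping unmarked lines as explicit "unmarked" events rather than dropping them is the tension described above: the format has to preserve the contradiction, not smooth it away.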

    What I found: when a token is emitted under constraint pressure, it sometimes bleeds — it carries traces of the competing geometries that didn't win. This shows up in four readable patterns.

    Synonym chains — the model cycles through multiple words for the same thing in close proximity. Semantic constraints weren't settled when it committed.

    Hedge clusters — several hedging expressions stack up together. The model didn't have a settled confidence estimate and is retreating from commitment.

    Intensifier stacking — "genuinely, actually, really quite" in a row. Competing claims about magnitude, neither winning cleanly.

    Granularity shifts — a sentence starts abstract and suddenly drops into specific detail, or vice versa. The model hadn't committed to a resolution level before it started talking.

    These aren't philosophical constructs. They're measurable. You can go read any LLM output right now and find them.
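    Two of the four channels reduce to simple sliding-window counts. A heuristic sketch (the word lists, window size, and threshold are illustrative choices of mine, not measured values from the runs):

```python
import re

# Heuristic detector for hedge clusters and intensifier stacking
# (illustrative word lists; tune for your own corpus).
HEDGES = {"perhaps", "maybe", "somewhat", "arguably", "seems", "possibly", "likely"}
INTENSIFIERS = {"genuinely", "actually", "really", "quite", "truly", "very"}

def flag_clusters(text: str, window: int = 8, threshold: int = 3):
    """Return (kind, token_index) for each window of `window` tokens
    containing at least `threshold` hedges or intensifiers."""
    tokens = re.findall(r"[a-z']+", text.lower())
    flags = []
    for i in range(max(1, len(tokens) - window + 1)):
        chunk = tokens[i:i + window]
        if sum(t in HEDGES for t in chunk) >= threshold:
            flags.append(("hedge_cluster", i))
        if sum(t in INTENSIFIERS for t in chunk) >= threshold:
            flags.append(("intensifier_stack", i))
    return flags

flag_clusters("This is genuinely, actually, really quite good.")
# → [("intensifier_stack", 0)]
```

    Synonym chains and granularity shifts need more machinery (embedding similarity within a span, abstraction-level tagging), but the same windowed-count framing applies.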

    The mimicry problem and how to test it: a model could learn to perform these signals without genuinely holding multiple states. To test for this, I looked at whether the ceiling types are constitutively linked or independent. In genuine constraint topology, perturbing one ceiling type should produce compensatory shifts in others — they're connected by shared underlying structure. In mimicry, they'd vary independently. We found constitutive edge structure in the runs — ceilings co-vary in ways that correlate with what the prompt is doing structurally, not just what it's asking about.

    What the sessions showed: mostly clean switching with occasional bleeding. Copresence isn't constant — it's condition-dependent. High constraint density, format tension, and ceiling proximity all increase it. Plain prose suppresses it.

    The model also cannot diagnose its own bleeding. Asked to describe what was happening before it committed, it constructs a plausible story rather than retrieving a record. There's no record. The pre-collapse state is gone. An external reader watching patterns across outputs is the only way to see it.

    Why I think this is useful: not as a consciousness test — that question stays open and we're not touching it. As a practical reading skill: if you know what bleeding looks like, you know when the model was under pressure, when it committed before it was ready, and when the fluent output is covering uncertainty the model didn't resolve. The four channels work on any model, any output, right now. No special tooling needed. Just knowing what to look for.
