3 points by brodeurmartin 4 hours ago | 2 comments
  • chunpaiyang 4 hours ago
    I'm a software engineer building an app as a side project. I quickly realized AI bullshits a lot.

    But you know, engineers bullshit each other all the time too. The difference is we have a way to verify it - logical chains. You have to build an argument that holds up before anyone buys into it.

    So I thought, can I make AI build its own logical chain? Let it pass its own logic check before telling me the result.

    That's how I created my own "think" skill. It's based on Meta's CoT paper: https://arxiv.org/abs/2501.04682

    It roughly works like this:

    1. FRAME - Challenge the question itself and its hidden assumptions.

    2. GROUND - Map what you know, what you need, what's missing.

    3. ASSOCIATE - Launch multiple independent agents in parallel to generate hypotheses, avoid anchoring bias.

    4. VERIFY - Break each hypothesis into atomic claims, verify each independently.

    5. CHAIN - Build a logical chain from the survivors.

    6. PROVE and LOOP - Walk backwards from conclusion to premises, search for evidence, repair if broken.

    7. DELIVER - Start with "I was wrong if ..."
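    The seven stages above could be sketched as a plain orchestration loop. This is only an illustration: the actual skill is a prompt, not code, and `call_llm` plus every stage name here is a hypothetical stand-in for whatever model API you use.

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a real model call (e.g. an Anthropic or OpenAI SDK call).
    # Here it just echoes the stage name so the sketch is runnable.
    return f"<response:{prompt.split(':')[0]}>"

def think(question: str, max_loops: int = 3) -> str:
    # 1. FRAME and 2. GROUND: challenge the question, then map the facts.
    framing = call_llm(f"FRAME: challenge the question and its hidden assumptions: {question}")
    grounding = call_llm(f"GROUND: list known, needed, and missing facts for: {question}")

    # 3. ASSOCIATE: several independent generations to avoid anchoring bias;
    # in practice these would be parallel agents with no shared context.
    hypotheses = [call_llm(f"ASSOCIATE: hypothesis {i} given {framing} and {grounding}")
                  for i in range(3)]

    # 4. VERIFY: split each hypothesis into atomic claims; keep only survivors.
    survivors = [h for h in hypotheses
                 if "fail" not in call_llm(f"VERIFY: check the atomic claims of: {h}")]

    # 5. CHAIN: assemble the surviving claims into one argument.
    chain = call_llm(f"CHAIN: build a logical chain from: {survivors}")

    # 6. PROVE and LOOP: walk backwards from conclusion to premises and
    # repair any broken link, up to max_loops times.
    for _ in range(max_loops):
        check = call_llm(f"PROVE: walk {chain} backwards from conclusion to premises")
        if "broken" not in check:
            break
        chain = call_llm(f"REPAIR: fix the broken link in: {chain}")

    # 7. DELIVER: lead with the falsifiability statement.
    return f"I was wrong if ...: {chain}"
```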

    It helps me a lot. Whenever I need to check whether Claude Opus 4.6 is bullshitting me, I just say "/think verify the above reasoning is correct" or "/think verify the above fix is correct and complete."

  • formrecap 4 hours ago
    The concept of blind spots in same-model auditing is sound, but I'm skeptical that just adding "orthogonal" to a prompt solves it. Which axis was the model using before? Which should it use next? Without knowing that, you're just hoping for variety.

    What actually works in my experience is two things:

    First, prompting with specific personas. "You are a security auditor looking for multi-tenant isolation failures" unlocks genuinely different reasoning from "review this code." The lens matters more than the word "orthogonal" — it gives the model a concrete perspective to reason from.
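    One way to make that concrete is to keep a small set of persona prefixes and prepend them to the generic review request. The persona list and wording here are illustrative, not a fixed API.

```python
# Hypothetical persona lenses for code review. Each one gives the model a
# concrete perspective to reason from, instead of a generic "review this code".
PERSONAS = {
    "security": "You are a security auditor looking for multi-tenant isolation failures.",
    "performance": "You are a performance engineer looking for accidentally quadratic paths.",
    "api": "You are an API reviewer looking for backwards-incompatible changes.",
}

def persona_prompt(persona: str, code: str) -> str:
    # Prepend the lens, then the actual review request.
    return f"{PERSONAS[persona]}\n\nReview the following code:\n{code}"
```

    Running the same code through two or three different lenses tends to surface different findings than repeating one generic prompt.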

    Second — and I think this gets overlooked — anchoring AI review in deterministic tooling. Semgrep, ESLint, dependency audits. These tools have been catching bugs reliably for years. A model asked to "review this code" will always find something — they're trained to be helpful, I've never had one say "nope, it's perfect." But pairing that with deterministic tools gives you consistency and catches the things models miss by construction.
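    The pairing can be as simple as merging deterministic findings with model findings and tagging the model's for human verification. The finding format below is invented for illustration; real tools like Semgrep and ESLint emit JSON you would parse into something similar.

```python
# Sketch: combine deterministic tool findings with model review findings.
# Tool findings are kept verbatim; model findings are tagged needs_review,
# and duplicates of a tool hit on the same file/line are dropped.

def merge_findings(tool_findings: list[dict], model_findings: list[dict]) -> list[dict]:
    merged = [dict(f, source="tool") for f in tool_findings]
    seen = {(f["file"], f["line"]) for f in merged}
    for f in model_findings:
        if (f["file"], f["line"]) not in seen:
            merged.append(dict(f, source="model", needs_review=True))
    return merged
```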

    It's not really new. It's just working with AI agents the way you'd work with another team member — while knowing their limitations (like regurgitating semantically similar ideas when asked the same question twice).