Show HN: Running hallucination detection on a $200 GPU (RTX 3050, 4GB) (github.com)

2 points by yubainu 5 hours ago | 2 comments

yubainu 5 hours ago
I’ve been exploring why LLMs "break" during inference. Most current hallucination detection methods look at the final text (semantic analysis) or use another LLM to double-check (self-consistency). These are effective but extremely slow and expensive.
SIB-ENGINE is my attempt to solve this at the geometric layer. By monitoring the "Anchor Drift" (how hidden states deviate from the prompt’s latent trajectory), I found that hallucinations often manifest as a structural instability before the token is even sampled.
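One way to picture a drift signal of this kind is the sketch below. It is not the SIB-ENGINE implementation: it assumes the "anchor" is the mean of the prompt's hidden states and that drift is the cosine distance from that anchor, and the names `anchor_drift`, `prompt_states`, and `generated_states` are hypothetical. The vectors are toy 2-D values, not real model activations.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat a zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def anchor_drift(prompt_states, generated_states):
    """Drift of each generated hidden state from the prompt anchor.

    Assumption: the anchor is the mean prompt hidden state. The original
    post does not specify how the anchor or the drift is computed.
    """
    anchor = mean_vector(prompt_states)
    return [cosine_distance(anchor, h) for h in generated_states]

# Toy data: the first generated state stays near the prompt direction,
# the second points almost orthogonally away from it.
prompt_states = [[1.0, 0.0], [0.9, 0.1]]
generated_states = [[1.0, 0.05], [0.0, 1.0]]
drift = anchor_drift(prompt_states, generated_states)
```

In a real pipeline the hidden states would come from the model's forward pass, and a per-token drift above some tuned threshold would raise a flag before sampling.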
The Numbers:
Recall: 53.89% (It catches about half, but it's consistent)
Precision: 88.52% (Low false-alarm rate is my priority)
Overhead: <1% (Running on an RTX 3050 with 4GB VRAM)
AUC: 0.8995
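For readers who want to recompute metrics like these from per-sample logs, here is a minimal sketch of how precision, recall, and AUC are derived from labels and detector scores. The labels, scores, and threshold below are toy values chosen for illustration, not data from raw_logs.csv, and the function names are hypothetical.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when every score >= threshold is flagged."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win. This is the O(P*N) pairwise form, which is
    fine for a sketch but slow on large logs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy example: 1 = hallucination, 0 = faithful output.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.1, 0.1, 0.05]
precision, recall = precision_recall(labels, scores, threshold=0.6)
auc = roc_auc(labels, scores)
```

Raising the threshold generally trades recall for precision, which is the trade-off behind a high-precision, roughly half-recall operating point.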
I've released a Lite version (1-axis) on GitHub so you can see the fundamental logic and run it on your own machine. I’ve also included the raw_logs.csv from my N=1000 test run on Gemma-2B for full transparency.
I’m particularly curious if anyone here has experimented with similar geometric approaches or has thoughts on how this might scale to 70B+ models where the latent space is significantly denser.
Happy to dive into the technical details!
entrustai 3 hours ago
The geometric approach is interesting precisely because it's model-agnostic at the content level — you're detecting structural collapse in latent space before it surfaces as text, which means you don't need to know what a hallucination looks like semantically.
The 54% recall is the honest number to focus on. At 88% precision you're catching real problems when you flag them, but you're missing roughly half of all hallucinations entirely. For a suppression layer in a regulated context that's a meaningful gap — a compliance team can't tell a regulator "we caught most of them."
The complementary approach worth considering: deterministic post-generation checks on the output layer. Geometric drift catches structural collapse during generation. Rule-based output validation catches semantic violations after generation — banned claims, unattributed statistics, absolute guarantees. Neither approach alone is sufficient. Together they cover different failure modes.
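A rule-based output check of the kind described above might look like the following sketch. The rule names, regular expressions, and attribution markers are illustrative assumptions, not an established rule set; a production list would be written by whoever owns the compliance requirements.

```python
import re

# Hypothetical rules: each pattern flags one category of violation.
RULES = {
    "banned_claim": re.compile(r"\b(?:cures?|risk-free)\b", re.IGNORECASE),
    "absolute_guarantee": re.compile(
        r"\b(?:guaranteed?|never fails?|always works?)\b", re.IGNORECASE
    ),
}
# A percentage counts as a statistic; it is "attributed" if the same
# sentence names a source (assumed markers below).
STATISTIC = re.compile(r"\d+(?:\.\d+)?\s?%")
ATTRIBUTION = re.compile(r"according to|source:|\[\d+\]", re.IGNORECASE)

def validate_output(text):
    """Return a list of (rule_name, sentence) for every violation found."""
    violations = []
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for name, pattern in RULES.items():
            if pattern.search(sentence):
                violations.append((name, sentence))
        if STATISTIC.search(sentence) and not ATTRIBUTION.search(sentence):
            violations.append(("unattributed_statistic", sentence))
    return violations

sample = (
    "This product is guaranteed to work. "
    "It reduces errors by 40%. "
    "According to the 2023 audit, uptime was 99.9%."
)
violations = validate_output(sample)
```

Because the checks are deterministic, the same output always produces the same verdict, which is easier to audit than a probabilistic score.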
Good work publishing the raw_logs.csv. Reproducibility at this layer is rare and matters.
- yubainu 3 hours ago
  Thanks for the precise critique. You are right: Recall 54% is the "danger zone." In a regulated or production environment, missing half of the structural collapses is functionally equivalent to zero protection. The 88% precision proves the signal exists, but the threshold for "collapse" in latent space is currently too rigid.
  The "Geometric approach (SIB) + Rule-based output validation" hybrid you suggested is the most logical path forward.
  • Geometric Drift (Layer-Internal): Catches the "process" of losing logical coherence (structural entropy).
  • Rule-based (Output-Layer): Catches the "result" of semantic violations (pre-defined constraints).
  My next focus is analyzing the "Silent Failures": the 46% we missed. If the latent space doesn't show geometric collapse but the output is still a hallucination, it suggests the model is confidently drifting into a "parallel" but structurally stable manifold. That's a different failure mode that geometry alone can't catch.
  Reproducibility is the only way to move this out of "voodoo AI" territory. Glad the raw_logs.csv helped.
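To make the hybrid and the "silent failure" bucket concrete, here is a small sketch that sorts labeled samples by which detector fired. The category names and the three-boolean row layout are hypothetical and do not reflect the schema of raw_logs.csv; the rows are toy values.

```python
from collections import Counter

def categorize(is_hallucination, drift_flag, rule_flag):
    """Assign one sample to a bucket based on which detector fired."""
    if is_hallucination:
        if drift_flag and rule_flag:
            return "caught_by_both"
        if drift_flag:
            return "caught_by_geometry"
        if rule_flag:
            return "caught_by_rules"
        return "silent_failure"  # hallucination that neither detector saw
    return "false_alarm" if (drift_flag or rule_flag) else "clean"

# Toy rows: (is_hallucination, drift_flag, rule_flag)
rows = [
    (True, True, False),
    (True, False, True),
    (True, False, False),
    (False, True, False),
    (False, False, False),
]
summary = Counter(categorize(*row) for row in rows)
```

The size of the "silent_failure" bucket after both detectors run is the number that matters for the hybrid: it counts hallucinations that look structurally stable and also violate no written rule.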