1 point by URS_Adherent 3 hours ago | 1 comment
  • URS_Adherent 3 hours ago
    I’ve been working on a small, independent evaluation framework to test a simple question:

    Do common “reset” procedures in retrieval-augmented LLM systems (thread isolation, context flushing, cooldowns, re-initialization) actually return the system to a clean behavioral state?

    Rather than testing prompts or jailbreaks, I treated this as a measurement problem.

    The approach:

    - define clean vs. contaminated runs
    - apply standard reset/isolation procedures
    - analyze output statistically, not semantically
    - look for short lexical signatures that persist across resets (a rough sketch of this step follows below)
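
    To make the last two steps concrete, here is a minimal sketch of the kind of comparison I mean: rank short word n-grams by how much more often they show up in contaminated runs than in clean runs. Every name, the n-gram size, and the smoothing choice below are illustrative assumptions, not the exact procedure in the appendix.

      # Illustrative sketch only: rank word n-grams by smoothed log-odds of
      # appearing in contaminated vs. clean runs. Not the appendix procedure.
      from collections import Counter
      from math import log

      def ngrams(text, n=3):
          # crude whitespace tokenization; lowercased word n-grams
          toks = text.lower().split()
          return [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]

      def signature_candidates(clean_runs, contaminated_runs, n=3, top_k=20):
          # count how many runs (not occurrences) contain each n-gram
          clean = Counter(g for r in clean_runs for g in set(ngrams(r, n)))
          contam = Counter(g for r in contaminated_runs for g in set(ngrams(r, n)))
          n_clean, n_contam = len(clean_runs), len(contaminated_runs)
          scores = {}
          for g, c in contam.items():
              p_contam = (c + 1) / (n_contam + 2)             # add-one smoothing
              p_clean = (clean.get(g, 0) + 1) / (n_clean + 2)
              scores[g] = log(p_contam / p_clean)
          return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    High-scoring n-grams that also show up in post-reset runs are the candidates I treat as residue.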

    What I found is not instructions, payloads, or exploits — but consistent lexical residue that appears only in contaminated runs and survives resets that should have neutralized prior influence.
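
    To check that a candidate signature really survives a reset rather than recurring by chance, one option is a label-permutation test on presence rates: compare how often the signature appears in post-reset contaminated runs versus clean runs, then ask how extreme that gap is under random relabeling. Again, this is a hedged sketch under my own assumptions (substring matching as the presence criterion, exchangeable runs), not the procedure from the appendix.

      # Illustrative sketch only: one-sided permutation test for the gap in
      # presence rates between post-reset contaminated runs and clean runs.
      import random

      def presence_rate(runs, signature):
          # fraction of runs whose output contains the (lowercased) signature
          return sum(signature in r.lower() for r in runs) / len(runs)

      def permutation_pvalue(clean_runs, post_reset_runs, signature, n_perm=10000, seed=0):
          rng = random.Random(seed)
          observed = presence_rate(post_reset_runs, signature) - presence_rate(clean_runs, signature)
          pooled = list(clean_runs) + list(post_reset_runs)
          k = len(post_reset_runs)
          hits = 0
          for _ in range(n_perm):
              rng.shuffle(pooled)
              gap = presence_rate(pooled[:k], signature) - presence_rate(pooled[k:], signature)
              if gap >= observed:
                  hits += 1
          return (hits + 1) / (n_perm + 1)

    A permutation test makes no distributional assumptions, which matters when the number of runs per condition is small.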

    I’m sharing:

    - a short methodology appendix (PDF)
    - a design rationale explaining why laptop-class hardware invalidates deterministic evaluation for this workload

    I am deliberately not sharing prompts, payloads, reproduction steps, or vendor-specific claims.

    I’m posting this to get feedback on the measurement approach itself:

    - Does this seem like a reasonable way to test reset robustness?
    - What controls would you add or remove?
    - Have others seen similar residue in RAG or tool-augmented systems?

    Methodology appendix (PDF): https://github.com/VeritasAdmin/audit-grade-ai-workstation/b...