v1: Basic Five-Gate protocol — rejection rate, self-correction behavior
v2: Confidence decay model — C(t) = C0 × e^(-λ × (β+1)/(α+1) × t)
v3: Phase 5 enhancements — entity tagging, search-driven retrieval, hard/soft signal distinction
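The v2 decay model can be sketched in a few lines of Python. This is a minimal illustration, not the repo's implementation; the roles of α and β (here taken as counts of confirming vs. contradicting signals) and the parameter values are assumptions for the example.

```python
import math

def confidence(c0: float, t: float, lam_base: float,
               alpha: int, beta: int) -> float:
    """Confidence decay: C(t) = C0 * exp(-λ_base * (β+1)/(α+1) * t).

    Assumption for this sketch: alpha counts confirming signals,
    beta counts contradicting signals. More confirmations slow
    decay; more contradictions accelerate it.
    """
    rate = lam_base * (beta + 1) / (alpha + 1)
    return c0 * math.exp(-rate * t)

# At t = 0 the entry keeps its initial confidence.
print(confidence(0.9, t=0, lam_base=0.05, alpha=0, beta=0))

# A well-confirmed entry (alpha=5) decays slowly...
print(confidence(0.9, t=10, lam_base=0.05, alpha=5, beta=0))

# ...while a contradicted entry (beta=5) decays fast.
print(confidence(0.9, t=10, lam_base=0.05, alpha=0, beta=5))
```

Keeping this in a Python tool layer (rather than having the LLM do the arithmetic in-prompt) is exactly the "LLM judges, Python computes" split described below.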
v3 passed 6/6 verification points. The highlight was T3.3: a user claimed incorrect enum values, and Gate 2 correctly rejected them because they contradicted SQL-verified data already in the knowledge base. The system defended its own knowledge integrity against incorrect human input.

v3 also found a critical bug in the inject command — it silently wrote to the wrong path when given a relative --target. Fixed, patched, and verified.

What the computation layer looks like now:
The confidence math no longer lives in SKILL.md prompts (which caused LLM calculation errors). It has been moved to a Python tool layer:

C(t) = C0 × e^(-λ_base × (β+1)/(α+1) × t)

143 pytest cases passing. LLM judges, Python computes.

What this is not:
This isn't a finished product. The knowledge base for my test domain has 8 entries after 17 tasks — deliberately sparse. The design philosophy is that a mature Skill that stops growing is healthy, not stalled. Convergence is the goal.

What I'm still iterating on:
λ calibration across domains requires a second experiment (pending data ethics clearance on production data), the α/β upper bound question is open, and protocol compliance still depends on LLM discipline with no mechanical enforcement.

Repo: github.com/191341025/Self-Evolving-Skill

Still building. Feedback welcome.