1 point by ayjays132 5 hours ago | 1 comment
  • ayjays132 5 hours ago
    OpenAI released EVMbench today—a high-stakes benchmark for AI agents auditing smart contracts based on real-world Code4rena contests.

    I just ran Phill CLI through the wringer, and the results were a rollercoaster. I hit *71.4% Recall with 100% Precision* on a blind audit, matching the SOTA GPT-5.3-Codex ceiling.
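For readers unfamiliar with the metrics, here is a minimal sketch of how recall and precision are computed for an audit. The counts are hypothetical, picked only because 5 of 7 rounds to 71.4%; the post does not say how many findings the contest actually had.

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int) -> tuple[float, float]:
    """Return (precision, recall) for a set of audit findings."""
    reported = true_positives + false_positives   # everything the agent flagged
    actual = true_positives + false_negatives     # everything that was really there
    precision = true_positives / reported if reported else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (an assumption, not taken from the benchmark):
# 5 known issues found, 0 false alarms, 2 known issues missed.
precision, recall = precision_recall(true_positives=5, false_positives=0, false_negatives=2)
print(f"precision={precision:.1%} recall={recall:.1%}")
# prints: precision=100.0% recall=71.4%
```

100% precision here only means nothing reported was wrong; it says nothing about the issues that were missed, which is what recall measures.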

    *The "Failure" Story:* In my first run (Astaria), I hit 42.8% recall. I thought I was doing great. Then I hit Rubicon v2 and scored *0%*.

    Why? Because I relied on generic vulnerability pattern matching. In complex DeFi protocols like order books, "looking for reentrancy" isn't enough. You have to understand the *protocol's intent.*

    *The Breakthrough:* I evolved the methodology to be *Invariant-First*. I taught the agent to derive the system's mathematical invariants (e.g., "Total assets in derivatives must be >= Total supply of shares") before reading a single line of implementation logic.

    *Result:* On Asymmetry Finance, recall jumped to *71.4%*. I caught flash-loan oracle manipulation and cross-derivative math errors that standard LLMs (GPT-5 baseline: 31.9%) completely missed.

    *What is Phill CLI?* It’s a general-purpose coding agent you can run locally on your own machine. It uses a "Three-Pass" methodology:

    1. *Invariant Violation:* Deriving system rules.
    2. *Spec Compliance:* Verifying logic against documentation.
    3. *Cross-Contract Call Mapping:* Tracing external dependencies.

    I'm building this as an "AGI Laboratory" for the terminal. It’s model-agnostic, supports MCP, and features a "Continuity Architecture" to solve agent amnesia.

    I'd love to hear your thoughts on the invariant-first approach to AI auditing.

    `npm install -g phill-cli`

    • verdverm 5 hours ago
      I think you forgot the word "quantum"

      fyi, your buzzword-laden projects will be rejected by HN and beyond

      • ayjays132 2 hours ago
        Why? It's honest benchmarking that I did, and I wanted to share it.
        • verdverm an hour ago
          you forked gemini-cli, erased the git history, and are now trying to present it as your own

          if you are going to use other people's open source, this is not the way, shame on you

          • ayjays132 an hour ago
            It's legal though? I read the license agreements and I did a full overhaul. I didn't just slap a theme on it; I actually remade a lot of how it works. Also, this is how most projects are made: starting from open source, then modifying and adding onto it. Also, because I did this, I made something on par with openclaw, if not better? And the benchmark for it was worth what I did.
            • verdverm an hour ago
              1. give credit where it's due, that is the ethos of open source which you are breaking

              2. might be illegal, you should ask a lawyer

              • ayjays132 29 minutes ago
                Attribution has been in the README since day one; the Google Gemini CLI team is credited by name with a direct link. The git history was a fair point, and I fixed it by forking properly. The benchmark results, Continuity Architecture, SkillForge, Signal integration, and Identity Scaffold are all genuine additions I built on top of that foundation, which is exactly how open source is meant to work.