2 points by TimAssay 6 hours ago | 1 comment
  • TimAssay 6 hours ago
    I built Assay: a Python CLI/SDK that instruments LLM call sites and emits per-call receipts, then bundles them into an Ed25519-signed proof pack. Verification is offline and deterministic.
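
    To make "per-call receipts" concrete, here's an illustrative sketch; the field names and schema below are hypothetical, not Assay's actual receipt format (that's in the repo):

        # Illustrative only: a per-call receipt as a small, hashable record.
        import hashlib, time

        def make_receipt(model: str, prompt: str, response: str) -> dict:
            return {
                "ts": time.time(),  # when the call happened
                "model": model,     # which model served it
                # Hash payloads instead of storing them, so a receipt can be
                # shared without leaking prompt/response contents.
                "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
                "response_sha256": hashlib.sha256(response.encode()).hexdigest(),
            }

    Records along these lines are what get bundled into the pack; since the pack is Ed25519-signed, any later edit to a receipt is detectable.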

    Scan study: 30 popular OSS AI repos -- 202 high-confidence direct SDK call sites across 21 repos -- 0 with cryptographically verifiable evidence emission at those call sites. This is not "no logging" -- many have excellent observability. This measures a stricter property: whether a third party can verify the evidence artifact without access to the producer's infrastructure. Scope: Python source scan using direct SDK detection plus framework heuristics.
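
    For a flavor of what "direct SDK detection" means here, a stripped-down sketch of the idea (the suffix list is an example; the real scanner's patterns and framework heuristics are broader):

        # Rough illustration: walk each file's AST and flag attribute-call
        # chains ending in known SDK entry points (example patterns only).
        import ast, sys

        SDK_SUFFIXES = ("chat.completions.create", "messages.create")

        def dotted(node: ast.AST) -> str:
            # Rebuild "client.chat.completions.create" from nested Attribute nodes.
            parts = []
            while isinstance(node, ast.Attribute):
                parts.append(node.attr)
                node = node.value
            if isinstance(node, ast.Name):
                parts.append(node.id)
            return ".".join(reversed(parts))

        for path in sys.argv[1:]:
            tree = ast.parse(open(path).read(), filename=path)
            for node in ast.walk(tree):
                if isinstance(node, ast.Call):
                    name = dotted(node.func)
                    if name.endswith(SDK_SUFFIXES):
                        print(f"{path}:{node.lineno}: {name}")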

    Proof run on pydantic-ai (commit-pinned): scan (5 call sites found) -> patch (2 lines auto-inserted) -> run (3 calls exercised) -> verify-pack PASS. https://github.com/Haserjian/assay/blob/280c25ec46afd3ae6938...

    Try it:

        pip install assay-ai
        assay patch .                                             # auto-insert instrumentation at detected call sites
        assay run -c receipt_completeness -- python your_app.py  # run your app; receipts are emitted per call
        assay verify-pack ./proof_pack_*                          # offline, deterministic verification
    
    Tamper demo (5 seconds):

        pip install assay-ai && assay demo-challenge
        assay verify-pack challenge_pack/good/       # PASS
        assay verify-pack challenge_pack/tampered/    # FAIL (1 byte changed)
    
    Full report + dataset (commit-pinned): https://github.com/Haserjian/assay/blob/280c25ec46afd3ae6938...

    The verifier is open source -- assay verify-pack is deterministic hash + signature checking. Read it, run it, or write your own.
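
    And if you'd rather write your own: a minimal verifier is roughly the following. The pack layout here (a manifest.json of per-file SHA-256 hashes, a detached Ed25519 signature over the manifest, a bundled public key) is a simplified stand-in, not Assay's exact on-disk format:

        # Minimal offline verifier sketch (simplified, hypothetical pack layout).
        import hashlib, json, sys
        from pathlib import Path
        from cryptography.exceptions import InvalidSignature
        from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

        def verify_pack(pack_dir: str) -> bool:
            pack = Path(pack_dir)
            manifest_bytes = (pack / "manifest.json").read_bytes()

            # 1. Signature check: manifest must verify under the producer's key.
            pubkey = Ed25519PublicKey.from_public_bytes((pack / "pubkey.bin").read_bytes())
            try:
                pubkey.verify((pack / "manifest.sig").read_bytes(), manifest_bytes)
            except InvalidSignature:
                return False

            # 2. Hash check: every listed file must match its recorded SHA-256.
            manifest = json.loads(manifest_bytes)
            return all(
                hashlib.sha256((pack / rel).read_bytes()).hexdigest() == digest
                for rel, digest in manifest["files"].items()
            )

        if __name__ == "__main__":
            print("PASS" if verify_pack(sys.argv[1]) else "FAIL")

    A flipped byte in any listed file changes its SHA-256 and fails step 2; an edit to the manifest itself fails the signature in step 1 -- which is what the tamper demo above exercises. (A real deployment would pin the producer's public key out of band rather than trust one bundled in the pack.)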

    https://pypi.org/project/assay-ai/

    If I missed your instrumentation or a finding is a false positive, drop a commit link and I'll update the dataset.

    • _wire_ 5 hours ago
      I find this new paradigm of computing troubling: treating AI call sites as execution units with vague confidence windows of correctness (say 80%), and fussing over the energy efficiency of primitives versus confidence, is a surprising and weird development in computing. It looks like a galactic-scale (that's an Alan Kay reference) version of the "goto considered harmful" lesson is about to be learned, with God knows how much harm incurred upon unsuspecting subjects of the new AI empire. Tolerating human degrees of error in processing by system units that can perform millions of times more computations than a human seems like an invitation to Armageddon. But perhaps I don't properly understand the scenarios and risks.