โ† Back to all campaigns

X Thread: Phase 1 Complete

1/6 We spent 4 weeks building an autonomous security scanner for DeFi.

15 specialized scanners. 643 tests. 82.6% detection rate on EVMbench.

Today we stopped building features. Now it hunts. 🧵

2/6 The stack:

  • Static analysis (Slither, Semgrep, Aderyn)
  • Dependency scanning (Trivy)
  • Economic exploit reasoning (oracle manipulation, flash loans, LP attacks)
  • Supply chain analysis (forked code, vulnerable imports)
  • Frontend secrets detection
  • Governance risk scoring

One command. All 15 scanners. Deduplicated results.
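"Deduplicated results" means that when several scanners flag the same spot, you get one finding instead of a pile of near-identical ones. The thread doesn't say how this merge works. Here is a minimal sketch that assumes each finding carries a file, a line, and a normalized issue category, and treats findings sharing all three as duplicates. The `Finding`/`MergedFinding` types, the severity scale, and the keep-the-highest-severity rule are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical severity scale; the thread does not define one.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass
class Finding:
    scanner: str    # which tool reported it, e.g. "slither" or "aderyn"
    category: str   # issue class, e.g. "reentrancy" (case may differ per tool)
    file: str
    line: int
    severity: str


@dataclass
class MergedFinding:
    category: str
    file: str
    line: int
    severity: str
    scanners: set[str] = field(default_factory=set)


def deduplicate(findings: list[Finding]) -> list[MergedFinding]:
    """Merge findings that share (file, line, category), keeping first-seen order."""
    merged: dict[tuple[str, int, str], MergedFinding] = {}
    for f in findings:
        category = f.category.lower()  # normalize case so tools agree on the key
        key = (f.file, f.line, category)
        entry = merged.get(key)
        if entry is None:
            merged[key] = MergedFinding(category, f.file, f.line, f.severity, {f.scanner})
        else:
            entry.scanners.add(f.scanner)
            # Keep the most severe rating that any scanner reported.
            if SEVERITY_RANK[f.severity] > SEVERITY_RANK[entry.severity]:
                entry.severity = f.severity
    return list(merged.values())


raw = [
    Finding("slither", "Reentrancy", "Vault.sol", 42, "high"),
    Finding("aderyn", "reentrancy", "Vault.sol", 42, "critical"),
    Finding("semgrep", "tx-origin", "Auth.sol", 10, "medium"),
]
results = deduplicate(raw)
```

With this input, the two reentrancy reports on `Vault.sol:42` collapse into one entry that credits both tools and is rated "critical". The `tx-origin` finding stays separate. Real tools name the same issue differently, so an actual merger would also need a category-mapping table.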

3/6 The number that matters: 82.6% on EVMbench.

GPT-5.3-Codex baseline: 72.2%.

We didn't get there with a bigger model. We combined traditional static analysis with AI reasoning over its findings. A boring hybrid approach, and it works better.

4/6 The part nobody talks about: false positives kill tools.

So we built multi-model consensus: Claude, GPT, and VulnLLM-R cross-verify each finding. Mutation testing validates detector accuracy. Risk scoring (0-100) takes under 30 seconds.

Precision over recall. Every time.
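Multi-model consensus can be pictured as a vote. Each model independently judges a finding, and the finding survives only if enough of them confirm it. The thread doesn't describe the actual voting rule, so this is a minimal sketch that assumes a simple quorum. The verifiers below are stub functions standing in for real model calls, and `consensus_filter`, `quorum`, and the model names are hypothetical.

```python
from collections.abc import Callable, Mapping

# A verifier takes a finding description and returns True if it judges the
# finding to be real. In practice each would wrap a call to one model; here
# they are plain functions so the voting logic can be shown on its own.
Verifier = Callable[[str], bool]


def consensus_filter(
    findings: list[str],
    verifiers: Mapping[str, Verifier],
    quorum: int,
) -> list[tuple[str, list[str]]]:
    """Keep findings confirmed by at least `quorum` verifiers, with who confirmed."""
    kept = []
    for finding in findings:
        confirmed_by = [name for name, verify in verifiers.items() if verify(finding)]
        if len(confirmed_by) >= quorum:
            kept.append((finding, confirmed_by))
    return kept


# Stub verifiers with deliberately different opinions.
verifiers = {
    "model_a": lambda f: "reentrancy" in f,
    "model_b": lambda f: "reentrancy" in f or "oracle" in f,
    "model_c": lambda f: "oracle" in f and "spot price" in f,
}
findings = ["reentrancy in withdraw()", "oracle uses spot price", "unused variable"]
confirmed = consensus_filter(findings, verifiers, quorum=2)
```

With a 2-of-3 quorum, the first two findings survive and "unused variable" is dropped. Raising `quorum` to 3 (unanimity) is the "precision over recall" dial: fewer false positives, at the cost of discarding findings only one or two models caught.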

5/6 Running VulnLLM-R-7B locally via Ollama.

API cost per scan: $0. Code never leaves the machine. 75-80% detection rate from a 7B model.

For comparison, a 2-week Claude Opus 4.6 audit of Firefox cost $4,000 in API credits. We run unlimited scans with no API bill.
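"Code never leaves the machine" follows from where the request goes. Ollama serves models over an HTTP API on localhost (port 11434 by default), and a non-streaming `POST /api/generate` returns the model's text in a `response` field. Here is a minimal sketch using only the standard library. The model tag `vulnllm-r-7b`, the prompt wording, and the helper names are hypothetical, since the thread doesn't say how the model is registered or prompted.

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing here talks to a remote host.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical model tag: the thread does not give the name used in Ollama.
MODEL = "vulnllm-r-7b"


def build_request(source_code: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming /api/generate request asking for a security review."""
    prompt = (
        "Review the following Solidity code for security vulnerabilities. "
        "List each issue with its location and severity.\n\n" + source_code
    )
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def review(source_code: str) -> str:
    """Send the request to the local Ollama server and return the model's text."""
    with urllib.request.urlopen(build_request(source_code), timeout=300) as resp:
        body = json.load(resp)
    return body["response"]
```

Calling `review(...)` needs a running Ollama server with the model pulled. `build_request` can be inspected without one.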

6/6 The autonomous loop is operational:

  1. Monitor Immunefi for new programs
  2. Score and prioritize targets
  3. Scan with 15 detectors + AI reasoning
  4. Generate fix suggestions + PoC templates
  5. Draft reports for submission

Phase 1 was "build the tool." Phase 2 is "earn bounties."

Shipping weekly at github.com/gilchrist-research
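The thread doesn't explain step 2 of the loop ("score and prioritize targets"). One plausible shape is a scoring function over each bounty program followed by a sort. This minimal sketch assumes three inputs: maximum bounty, days since launch, and number of contracts in scope. The `Program` fields, the weights, and `select_targets` are hypothetical and are not Immunefi's data model.

```python
import math
from dataclasses import dataclass


@dataclass
class Program:
    # Hypothetical fields; not Immunefi's actual schema.
    name: str
    max_bounty_usd: float
    days_since_launch: int
    contracts_in_scope: int


def priority(p: Program) -> float:
    """Score a program; higher means scan it sooner. Weights are illustrative."""
    # Log scale so one enormous bounty doesn't drown out everything else.
    bounty = math.log10(max(p.max_bounty_usd, 1.0))
    # Newer programs score higher: fewer researchers have looked yet.
    freshness = 1.0 / (1.0 + p.days_since_launch / 7.0)
    # Mild penalty for large scopes, which take longer to scan fully.
    scope_penalty = 0.1 * math.log2(1 + p.contracts_in_scope)
    return bounty + 2.0 * freshness - scope_penalty


def select_targets(programs: list[Program], limit: int) -> list[Program]:
    """Return the `limit` highest-priority programs, best first."""
    return sorted(programs, key=priority, reverse=True)[:limit]


programs = [
    Program("alpha", 1_000_000, 0, 3),
    Program("beta", 50_000, 2, 1),
    Program("gamma", 10_000_000, 365, 40),
]
top = select_targets(programs, limit=2)
```

Here "alpha" (large bounty, launched today) ranks first. "gamma" edges out "beta" because its much larger bounty outweighs its age and scope. Tuning these weights against which programs actually pay out would be the real work.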