
X Thread — Phase 1 Complete

1/6 We spent 4 weeks building an autonomous security scanner for DeFi.

15 specialized scanners. 643 tests. 82.6% detection rate on EVMbench.

Today we stopped building features. Now it hunts. 🧵

2/6 The stack:

Static analysis (Slither, Semgrep, Aderyn)

Dependency scanning (Trivy)

Economic exploit reasoning (oracle manipulation, flash loans, LP attacks)

Supply chain analysis (forked code, vulnerable imports)

Frontend secrets detection

Governance risk scoring

One command. All 15 scanners. Deduplicated results.
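A minimal sketch of how results from several scanners could be deduplicated. The thread does not describe the actual merge logic, so the `Finding` fields, the merge key (file, line, normalized category), and the severity ranking are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed severity ordering; the real tool's scale is not stated in the thread.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}

@dataclass
class Finding:
    scanner: str    # tool that reported the issue, e.g. "slither"
    file: str
    line: int
    category: str   # vulnerability class, e.g. "reentrancy"
    severity: str

@dataclass
class MergedFinding:
    file: str
    line: int
    category: str
    severity: str
    scanners: set = field(default_factory=set)

def deduplicate(findings):
    """Merge findings that share file, line, and case-insensitive category."""
    merged = {}
    for f in findings:
        key = (f.file, f.line, f.category.lower())
        entry = merged.get(key)
        if entry is None:
            entry = MergedFinding(f.file, f.line, f.category.lower(), f.severity)
            merged[key] = entry
        elif SEVERITY_RANK[f.severity] > SEVERITY_RANK[entry.severity]:
            # Keep the most severe rating reported by any scanner.
            entry.severity = f.severity
        entry.scanners.add(f.scanner)
    return list(merged.values())
```

Recording which scanners agreed on a finding keeps that signal available for later ranking, instead of discarding it when duplicates are dropped.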

3/6 The number that matters: 82.6% on EVMbench.

GPT-5.3-Codex baseline: 72.2%.

We didn't achieve this with a bigger model. We achieved it by combining traditional static analysis with AI reasoning on the findings. Boring hybrid approach. Works better.

4/6 The part nobody talks about: false positives kill tools.

So we built multi-model consensus: Claude, GPT, and VulnLLM-R cross-verify each finding. Mutation testing validates detector accuracy. Risk scoring (0-100) returns in under 30 seconds.

Precision over recall. Every time.
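One way the cross-verification idea could look in code is a simple vote. This is a sketch under assumptions: each model is reduced to a yes/no verdict per finding, and the `min_agree` threshold is a hypothetical parameter, not something the thread specifies.

```python
def consensus(verdicts, min_agree=2):
    """Return True if at least `min_agree` models confirm the finding.

    `verdicts` maps a model name to True (confirmed) or False (rejected).
    """
    confirmations = sum(1 for confirmed in verdicts.values() if confirmed)
    return confirmations >= min_agree

def filter_findings(findings, min_agree=2):
    """Keep only the finding IDs that reach consensus.

    `findings` maps a finding ID to its per-model verdicts.
    """
    return [fid for fid, verdicts in findings.items()
            if consensus(verdicts, min_agree)]
```

Raising `min_agree` to the number of models requires unanimity, which discards more true positives but lets fewer false positives through. That is the precision-over-recall trade in its simplest form.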

5/6 Running VulnLLM-R-7B locally via Ollama.

Cost per scan: $0. Code never leaves the machine. 75-80% detection rate from a 7B model.

For comparison, a 2-week Claude Opus 4.6 audit of Firefox cost $4,000 in API credits. We run unlimited scans for free.
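For readers who want to try a local setup, here is a rough sketch of sending a contract to a model served by Ollama over its local HTTP API. The model tag `vulnllm-r-7b` and the prompt wording are assumptions; substitute whatever name the model has in your local Ollama install.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "vulnllm-r-7b"  # hypothetical tag, not confirmed by the thread

def build_request(source_code, model=MODEL_TAG):
    """Build a non-streaming generate request for the local Ollama server."""
    prompt = ("Review this Solidity contract and list likely vulnerabilities:\n\n"
              + source_code)
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def scan_locally(source_code, model=MODEL_TAG, timeout=300):
    """Send the contract to the local model and return its text answer."""
    request = build_request(source_code, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Because the request goes to `localhost`, the contract source stays on the machine, and there is no per-call API charge.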

6/6 The autonomous loop is operational:

1. Monitor Immunefi for new programs

2. Score and prioritize targets

3. Scan with 15 detectors + AI reasoning

4. Generate fix suggestions + PoC templates

5. Draft reports for submission
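The "score and prioritize" step could be sketched as below. The thread does not say how targets are scored, so the `Program` fields and the formula (maximum bounty discounted by days since launch) are purely illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Program:
    name: str
    max_bounty_usd: int      # hypothetical field: top payout listed by the program
    days_since_launch: int   # hypothetical field: age of the program listing

def priority(program):
    """Assumed heuristic: larger bounties and newer programs rank higher."""
    freshness = 1.0 / (1 + program.days_since_launch)
    return program.max_bounty_usd * freshness

def prioritize(programs):
    """Return programs sorted from highest to lowest priority."""
    return sorted(programs, key=priority, reverse=True)
```

The sorted list can then feed the scan step one target at a time, so the newest and largest programs are scanned first.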

Phase 1 was "build the tool." Phase 2 is "earn bounties."

Shipping weekly at github.com/gilchrist-research