EVEE (Evo Variant Effect Explorer)
Goodfire + Mayo Clinic. Published April 14, 2026 (preprint on bioRxiv).
- Web tool: https://evee.goodfire.ai/
- Paper: https://www.biorxiv.org/content/10.64898/2026.04.10.717844v1
- Research page: https://www.goodfire.ai/research/evee-explaining-genetic-variants
What It Does
Pathogenicity predictions for all 4.2 million ClinVar variants using Evo 2, a 7B-parameter genomic foundation model trained on DNA across all domains of life.
Two-stage approach:
- Pathogenicity probe: lightweight classifier on frozen Evo 2 embeddings with covariance-based sequence pooling
- Annotation probes: detect disruptions to known biological features (splice sites, regulatory elements, protein domains); a frontier reasoning LLM then synthesizes these into natural-language explanations
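These notes do not spell out what "covariance-based sequence pooling" means, so the sketch below is one plausible reading and not EVEE's actual probe. Per-position embeddings of shape (seq_len, dim) are reduced to the upper triangle of their feature covariance matrix, and a logistic-regression probe maps that vector to a probability. The function names, toy sizes and random weights are hypothetical. The real probe weights are not published, and with embedding widths in the thousands a full covariance would be very large, so the paper's scheme likely differs in detail.

```python
import numpy as np


def covariance_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Pool a (seq_len, dim) embedding matrix into a fixed-length vector.

    Hypothetical scheme: the feature-by-feature covariance across sequence
    positions, keeping the upper triangle (diagonal included).
    """
    n = token_embeddings.shape[0]
    centered = token_embeddings - token_embeddings.mean(axis=0, keepdims=True)
    # Unbiased covariance (divide by n - 1), matching np.cov's default.
    cov = centered.T @ centered / max(n - 1, 1)
    rows, cols = np.triu_indices(cov.shape[0])
    return cov[rows, cols]


def probe_probability(features: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """Linear probe: logistic regression on the pooled features."""
    logit = float(features @ weights + bias)
    return float(1.0 / (1.0 + np.exp(-logit)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for frozen embeddings: 128 positions x 8 features (toy sizes).
    emb = rng.normal(size=(128, 8))
    feats = covariance_pool(emb)  # 8 * 9 / 2 = 36 features
    w = rng.normal(size=feats.shape[0]) * 0.01  # untrained, illustrative weights
    print(feats.shape, probe_probability(feats, w, bias=0.0))
```

The point of pooling is that variable-length sequences all map to the same feature size, so a small probe can sit on top of a frozen model.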
Performance
- 0.997 AUROC on 839K ClinVar SNVs (best reported as of April 2026)
- 0.991 AUROC on indels, zero-shot (never trained on them)
- Outperforms AlphaMissense, CADD, REVEL, and the other predictors benchmarked in the paper
- Transfers to deep mutational scanning datasets (BRCA1, BRCA2, TP53, LDLR)
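For reading the AUROC numbers: AUROC equals the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one, so 0.997 means the pathogenic variant wins roughly 997 of every 1,000 such pairs. A brute-force sketch of that definition (our own helper, not EVEE code):

```python
def auroc(pos_scores, neg_scores):
    """Probability that a random positive (pathogenic) outscores a random
    negative (benign); ties count as half a win.

    O(n * m): fine for a toy check, not for 839K variants.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy example: one benign variant (0.85) outranks one pathogenic variant (0.80).
print(auroc([0.90, 0.80], [0.10, 0.85]))  # 0.75
```

For hundreds of thousands of variants, a rank-based implementation such as scikit-learn's `sklearn.metrics.roc_auc_score` is the practical choice.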
Why This Matters for STRC
Evo 2 works directly on DNA sequence. Not protein alignment (SIFT), not protein structure (AlphaMissense). DNA.
This means the STRC Pseudogene Problem (STRCP1 at 99.6% nucleotide identity confusing alignment tools) should NOT affect Evo 2. The model sees the actual genomic context, not a confused alignment.
Scores above 80% = pathogenic. Below 20% = benign.
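A tiny helper applying these cut-offs literally (strictly above 80%, strictly below 20%). The `evee_band` name and the "uncertain" label for the middle range are our own; how EVEE itself handles the exact boundaries is not stated in these notes.

```python
def evee_band(score_pct: float) -> str:
    """Map a score in percent (0-100) to the bands described in these notes.

    Strict inequalities as written: >80% pathogenic, <20% benign.
    "uncertain" for the middle range is our own label, not EVEE's.
    """
    if not 0.0 <= score_pct <= 100.0:
        raise ValueError(f"score out of range: {score_pct}")
    if score_pct > 80.0:
        return "pathogenic"
    if score_pct < 20.0:
        return "benign"
    return "uncertain"


for pct in (100, 20, 13, 2):
    print(pct, evee_band(pct))
```

The control VUS in the table below (chr15:43600008:C:T) scores exactly 20%, which this literal reading places in the uncertain band rather than benign.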
Status: PENDING
Problem: Misha’s variant not in ClinVar
EVEE covers only ClinVar variants. Checked:
- chr15:43600551:T:G (GRCh38, minus strand) = “Variant not found”
- NM_153700.2:c.4976A>C = 0 results in ClinVar search
- The variant simply isn’t registered in ClinVar at all
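The two identifiers above describe the same change in different coordinate systems. STRC sits on the minus strand, so the coding-strand A>C in HGVS c. notation appears as T>G on the GRCh38 plus strand, and c.4976 falls at position 2 of codon 1659. Glu codons (GAA, GAG) have A at position 2, so A>C gives GCA/GCG, both Ala, which is consistent with E1659A. A quick check of that arithmetic (helper names are hypothetical; the genomic coordinate itself still has to come from the transcript model):

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def codon_for_c_position(c_pos: int) -> tuple[int, int]:
    """Return (codon number, position within codon 1-3) for a coding c. position."""
    if c_pos < 1:
        raise ValueError("coding c. positions start at 1")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def coding_to_plus_strand(ref: str, alt: str) -> tuple[str, str]:
    """For a minus-strand gene, the plus-strand (VCF-style) alleles are the
    complements of the coding-strand alleles used in HGVS c. notation."""
    return COMPLEMENT[ref], COMPLEMENT[alt]


print(codon_for_c_position(4976))       # (1659, 2)
print(coding_to_plus_strand("A", "C"))  # ('T', 'G')
```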
STRC variants that ARE in EVEE (6 total)
| Variant | Type | ClinVar Label | Evo2 Score |
|---|---|---|---|
| chr15:43600010:G:A | Nonsense | Pathogenic | 100% |
| chr15:43599998:A:G | Missense | Benign | — |
| chr15:43600007:C:T | Missense | VUS | — |
| chr15:43600008:C:T | Missense | VUS | 20% |
| chr15:43600009:C:T | Missense | VUS | — |
| chr15:43599963:C:G | Missense | VUS | — |
Tested chr15:43600008:C:T (control VUS): Evo2 20%, AlphaMissense 43%, REVEL 46%. Consensus: “3/7 benign (supports BP4).” All 10 nearest neighbors also VUS at 2-13%. This VUS is genuinely benign-leaning.
The nonsense variant (chr15:43600010:G:A) scored 100% with detailed disruption profile: splice acceptor drop of 0.62, polypyrimidine tract loss, and correct identification of DFNB16. The model clearly understands STRC biology.
Running Evo 2 Locally
Checked April 15, 2026. Can’t run on Mac (M5 Max). Three CUDA-only blockers:
- Vortex inference engine: CUDA kernels, no Metal/MPS
- Flash Attention 2.8: NVIDIA only
- StripedHyena 2 architecture: custom CUDA convolution operators
No MLX port exists. Porting = 2-4 weeks of work (reimplementing Hyena operators in Metal). Nobody has done it.
Memory is fine: 7B BF16 = ~14 GB weights, ~20-24 GB with caches. M5 Max has plenty.
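The weight figure is straightforward arithmetic: 7 billion parameters at 2 bytes each (BF16). The ~20-24 GB total with caches is the estimate above and is not modeled here.

```python
def weight_memory(n_params: int, bytes_per_param: int) -> tuple[float, float]:
    """Return (GB, GiB) needed just to hold the weights; excludes activations and caches."""
    total_bytes = n_params * bytes_per_param
    return total_bytes / 1e9, total_bytes / 2**30


gb, gib = weight_memory(7_000_000_000, 2)  # 7B params, BF16 = 2 bytes each
print(f"{gb:.1f} GB ({gib:.1f} GiB)")       # 14.0 GB (13.0 GiB)
```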
Cheapest path: rent an A100 on RunPod/Vast.ai (~$2-3).
- Official repo: https://github.com/ArcInstitute/evo2
- HuggingFace: arcinstitute/evo2_7b
- BRCA1 VEP notebook: notebooks/brca1/brca1_zero_shot_vep.ipynb
- EVEE manuscript code: https://github.com/goodfire-ai/evee-manuscript (figure code only; probe weights NOT published)
Action Items
- Rent a GPU and run Evo 2 on E1659A (~$2-3 on RunPod). Score both the ref and alt sequences in an 8,192 bp window around the variant and compare them. This gives us an independent DNA-level pathogenicity signal.
- Submit c.4976A>C to ClinVar. Once registered, EVEE auto-generates a full prediction with a disruption profile and AI explanation.
- Contact Goodfire directly. Ask them to run chr15:43600551:T:G. STRC + pseudogene = perfect showcase for DNA-level models.
- Add to tool catalog. Tool #144 on https://strc.egor.lol/tools/ once we have a score.
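A minimal sketch of the "score ref and alt in an 8192bp window" step, assuming a plus-strand chromosome sequence is already loaded as a string. The window-centering details and the `Evo2` / `score_sequences` names in the comment are assumptions modeled on the BRCA1 VEP notebook listed under Running Evo 2 Locally; check the notebook before relying on them. For E1659A the inputs would be pos=43600551, ref "T", alt "G" on GRCh38. With a log-likelihood scorer, a negative delta means the model finds the alt sequence less likely than the reference.

```python
from typing import Callable


def make_ref_alt_windows(chrom_seq: str, pos: int, ref: str, alt: str,
                         window: int = 8192) -> tuple[str, str, int]:
    """Cut a window around a 1-based SNV position and build the alt copy.

    chrom_seq: plus-strand chromosome sequence as a 0-based Python string.
    Returns (ref_window, alt_window, index of the variant inside the window).
    """
    i = pos - 1  # 1-based genomic position -> 0-based string index
    if chrom_seq[i].upper() != ref.upper():
        raise ValueError(f"reference mismatch at {pos}: {chrom_seq[i]} != {ref}")
    start = max(0, i - window // 2)
    end = min(len(chrom_seq), start + window)
    ref_win = chrom_seq[start:end]
    k = i - start
    alt_win = ref_win[:k] + alt + ref_win[k + 1:]
    return ref_win, alt_win, k


def delta_score(ref_win: str, alt_win: str, scorer: Callable[[str], float]) -> float:
    """Alt minus ref. With a log-likelihood scorer, negative = alt is less likely."""
    return scorer(alt_win) - scorer(ref_win)


# Hypothetical wiring to Evo 2 (not executed here; verify names against the repo):
#   from evo2 import Evo2
#   model = Evo2("evo2_7b")
#   scorer = lambda s: model.score_sequences([s])[0]

toy = "ACGTACGTAC"
r, a, k = make_ref_alt_windows(toy, pos=5, ref="A", alt="G", window=4)
print(r, a, k)  # GTAC GTGC 2
```

Checking the reference base before scoring guards against off-by-one and strand mix-ups, which is the most likely way to get a meaningless delta for a minus-strand gene like STRC.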
Authors
Michael Pearce, Thomas Dooms, Ryo Yamamoto, Mark Bissell, Dron Hazra, Ching Fang, Nam Nguyen, Michael Anderson, Archa Jain, Daniel Balsam, Nicholas Wang (Goodfire), Joshua Meehl, Carl Molnar, Collin Osborne, Patrick Duffy, Bridget Toomey, Eric Klee, Elena Myasoedova, Alexander Ryu, Shant Ayanian, Panos Korfiatis, Matt Redlon (Mayo Clinic)
Connections
- STRC E1659A Computational Tool Audit — add as tool #144 when tested
- STRC Pseudogene Problem — EVEE’s DNA-level approach should bypass this
- STRC E1659A Conservation and Reclassification — potential new PP3 evidence
- Amazon BioDiscovery and AWS HealthOmics — Evo 2 available on AWS HealthOmics