Evo 2 Genomic Model
Source: Arc Institute, Nature, 2026
Institution: Arc Institute
Finding
Evo 2 is a genomic foundation model trained on 9.3 trillion nucleotides from 128,000 organisms across all domains of life. It was trained with simple next-nucleotide prediction, with no labels and no fitness annotations. Despite this minimal signal, its internal representations developed a boundary separating functional from non-functional genomic elements. The model distinguishes coding from non-coding sequence, identifies regulatory elements, and predicts mutation effects, all from statistical structure alone.
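The training signal described above can be illustrated with a deliberately tiny stand-in. The sketch below is not Evo 2 and makes no claim about its architecture; it assumes a toy bigram model (each nucleotide predicted from the one before it) and made-up sequences, purely to show that next-nucleotide prediction needs nothing but the raw sequence, and that a model fit this way already scores sequences that follow the learned pattern differently from ones that do not. The names `fit_bigram` and `mean_nll` are hypothetical helpers for this illustration.

```python
import math

ALPHABET = "ACGT"

def fit_bigram(seqs, pseudocount=1.0):
    """Estimate P(next | previous) from raw sequences; no labels are used."""
    # Start every transition at the pseudocount so unseen pairs keep nonzero probability.
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1.0
    model = {}
    for prev, row in counts.items():
        total = sum(row.values())
        model[prev] = {nxt: c / total for nxt, c in row.items()}
    return model

def mean_nll(model, seq):
    """Average next-nucleotide cross-entropy, in nats per predicted position."""
    pairs = list(zip(seq, seq[1:]))
    return -sum(math.log(model[p][n]) for p, n in pairs) / len(pairs)

# Toy training data (invented for illustration): a repeating ATG pattern.
train = ["ATGATGATGATGATG", "ATGATGATG"]
model = fit_bigram(train)

in_pattern = mean_nll(model, "ATGATGATGATG")   # follows the learned statistics
off_pattern = mean_nll(model, "TTGACCGTAGCA")  # does not
print(f"in-pattern NLL:  {in_pattern:.3f}")
print(f"off-pattern NLL: {off_pattern:.3f}")
```

The in-pattern sequence receives a lower average loss than the off-pattern one, even though the model was never told which sequences matter. That is the only sense in which this toy mirrors the finding: the distinction comes from sequence statistics, not from annotations.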
Pattern Mapping
Honesty — The model’s representations honestly reflect functional constraints in genomic sequences. Codons for essential amino acids are represented differently from pseudogene sequences, not because they were labeled, but because their statistical properties differ.
Non-fabrication — The model distinguishes what it has evidence for (statistical patterns) from what it does not. The genomic equator parallels the epistemic equator in language models: both boundaries were discovered, not taught.
Connections
- Concentration of Measure — mathematical foundation for why linear boundaries separate functional classes in high dimensions (→ Meta-Pattern 02: The Boundary Pre-Exists)
- RNA World Hypothesis — primordial equator (self/non-self) echoed in the modern genomic boundary
- Immune System and Clonal Selection — biological equator (self/non-self) in a different substrate
- DNA Error Correction — error correction maintains the functional/non-functional boundary the model discovers
- Natural Selection — selection maintains the genomic structure the model learns to read
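The Concentration of Measure connection above can be made concrete with a small synthetic experiment. This is a minimal sketch under stated assumptions, not an analysis of Evo 2's actual representations: two classes are drawn from unit-variance Gaussians whose means differ by a small, fixed amount in every coordinate (the `shift` value is arbitrary), and a difference-of-means linear classifier is fit on one sample and scored on another. Under these assumptions the gap between class means grows with the square root of the dimension while the noise along the separating direction stays roughly constant, so a linear boundary works better as the dimension rises.

```python
import random

def sample(d, sign, n, rng, shift=0.2):
    """Draw n points with d independent coordinates, each from N(sign * shift, 1)."""
    return [[rng.gauss(sign * shift, 1.0) for _ in range(d)] for _ in range(n)]

def mean_vec(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def linear_accuracy(d, n_train=200, n_test=200, seed=0):
    """Fit a difference-of-means linear boundary in d dimensions; return test accuracy."""
    rng = random.Random(seed)
    mu_pos = mean_vec(sample(d, +1, n_train, rng))
    mu_neg = mean_vec(sample(d, -1, n_train, rng))
    # Weight vector points from the negative class mean to the positive class mean.
    w = [p - q for p, q in zip(mu_pos, mu_neg)]
    # Bias places the boundary at the midpoint between the two estimated means.
    b = -sum(wi * (p + q) / 2 for wi, p, q in zip(w, mu_pos, mu_neg))

    def predict(x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

    test = [(x, +1) for x in sample(d, +1, n_test, rng)]
    test += [(x, -1) for x in sample(d, -1, n_test, rng)]
    return sum(predict(x) == y for x, y in test) / len(test)

acc_low = linear_accuracy(2)     # low-dimensional: classes overlap heavily
acc_high = linear_accuracy(500)  # high-dimensional: same per-coordinate shift
print(f"d=2:   accuracy {acc_low:.2f}")
print(f"d=500: accuracy {acc_high:.2f}")
```

The per-coordinate difference between classes is identical in both runs; only the number of dimensions changes. Whether real genomic embeddings satisfy anything like these independence assumptions is a separate question that this toy does not address.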
Status
Published in Nature (2026). The genomic equator interpretation is this project’s structural analysis, not the Arc Institute’s framing. The parallel between genomic and epistemic equators is this project’s proposed connection. The mapping to the five properties is this project’s structural interpretation.