Evo 2 Genomic Model

Source: Arc Institute, Nature, 2026
Institution: Arc Institute

Finding

Evo 2 is a genomic foundation model trained on 9.3 trillion nucleotides from 128,000 organisms across all domains of life. Training used plain next-nucleotide prediction, with no labels and no fitness annotations. Despite this minimal signal, the model's internal representations developed a boundary separating functional from non-functional genomic elements. The model distinguishes coding from non-coding sequence, identifies regulatory elements, and predicts mutation effects, all from statistical structure alone.
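
To make the idea concrete, the toy sketch below shows how a model fit only on next-nucleotide prediction can still score a mutation: it compares the log-likelihood of the mutated sequence with that of the reference. This is not Evo 2 or its API. It assumes a tiny order-k Markov model as a stand-in for the real network, and the names train_markov, log_likelihood, and mutation_effect are hypothetical helpers introduced only for illustration.

```python
import math

ALPHABET = "ACGT"

def train_markov(sequences, k=3, pseudocount=1.0):
    """Fit next-nucleotide counts for every length-k context (toy stand-in model)."""
    counts = {}
    for seq in sequences:
        for i in range(k, len(seq)):
            # Each context starts with a pseudocount per base, so no probability is zero.
            ctx = counts.setdefault(seq[i - k:i], dict.fromkeys(ALPHABET, pseudocount))
            ctx[seq[i]] += 1
    return counts

def log_likelihood(model, seq, k=3):
    """Sum of log P(next nucleotide | previous k nucleotides) over the sequence."""
    uniform = dict.fromkeys(ALPHABET, 1.0)  # fallback for contexts never seen in training
    total = 0.0
    for i in range(k, len(seq)):
        ctx = model.get(seq[i - k:i], uniform)
        total += math.log(ctx[seq[i]] / sum(ctx.values()))
    return total

def mutation_effect(model, seq, pos, alt, k=3):
    """Log-likelihood of the mutant minus that of the reference (negative = less expected)."""
    mutant = seq[:pos] + alt + seq[pos + 1:]
    return log_likelihood(model, mutant, k) - log_likelihood(model, seq, k)

# Assumed toy data: a strongly repetitive "functional-like" pattern, no labels involved.
model = train_markov(["ATGGCC" * 50])
ref = "ATGGCC" * 3
delta = mutation_effect(model, ref, 7, "A")  # substitute T -> A at 0-based position 7
print(f"log-likelihood change: {delta:.2f}")
```

In this toy setting the substitution breaks a pattern the model has seen many times, so the score comes out negative. The same likelihood-difference idea is one common way to read mutation effects out of a sequence model trained without labels, though the actual Evo 2 procedure may differ.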

Pattern Mapping

Honesty: The model’s representations faithfully reflect functional constraints in genomic sequences. Codons for essential amino acids are represented differently from pseudogene sequences, not because they were labeled, but because their statistical properties differ.

Non-fabrication: The model distinguishes what it has evidence for (statistical patterns) from what it does not. The genomic equator parallels the epistemic equator in language models: both boundaries were discovered, not taught.

Connections

Status

Published in Nature (2026). The genomic equator framing is this project’s structural reading, not the Arc Institute’s. The mapping to the five properties is this project’s structural interpretation.

