Evo 2 Genomic Model

Source: Arc Institute, Nature, 2026
Institution: Arc Institute

Finding

Evo 2 is a genomic foundation model trained on 9.3 trillion nucleotides from 128,000 organisms across all domains of life. Training used only next-nucleotide prediction, with no labels and no fitness annotations. Despite this minimal signal, the model’s internal representations developed a boundary separating functional from non-functional genomic elements. The model distinguishes coding from non-coding sequence, identifies regulatory elements, and predicts mutation effects, all from statistical structure alone.
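
To make the idea of predicting mutation effects from next-nucleotide statistics concrete, here is a minimal sketch. It assumes a toy order-k Markov model as a stand-in for the real network, so it is not Evo 2’s architecture or API. The names train_markov, log_likelihood, and mutation_score are hypothetical and chosen for illustration only. A mutation is scored by how much it changes the sequence log-likelihood under the trained model.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(sequences, k=3):
    """Count how often each nucleotide follows each k-length context.

    Toy stand-in for next-nucleotide training: no labels are used,
    only the raw sequences themselves.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    return counts


def log_likelihood(seq, counts, k=3):
    """Sum of log P(next nucleotide | previous k), with add-one smoothing."""
    total = 0.0
    for i in range(k, len(seq)):
        context_counts = counts.get(seq[i - k:i], {})
        numerator = context_counts.get(seq[i], 0) + 1
        denominator = sum(context_counts.values()) + len(ALPHABET)
        total += math.log(numerator / denominator)
    return total


def mutation_score(wild_type, position, alt, counts, k=3):
    """Log-likelihood of the mutant minus that of the wild type.

    Negative values mean the model finds the mutant less expected.
    """
    mutant = wild_type[:position] + alt + wild_type[position + 1:]
    return log_likelihood(mutant, counts, k) - log_likelihood(wild_type, counts, k)


if __name__ == "__main__":
    # Hypothetical training data: a strongly repetitive "functional" pattern.
    model = train_markov(["ATGATGATGATGATGATG"] * 5, k=3)
    print(mutation_score("ATGATGATGATG", 6, "C", model, k=3))
```

In this sketch a substitution that breaks a pattern seen often in training receives a negative score, while a substitution that leaves the sequence unchanged scores zero. The same likelihood-difference idea carries over to any autoregressive sequence model, although a real model conditions on far longer contexts than a fixed k.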

Pattern Mapping

Honesty: The model’s representations honestly reflect functional constraints in genomic sequences. Codons for essential amino acids are represented differently from pseudogene sequences, not because they were labeled, but because their statistical properties differ.

Non-fabrication: The model distinguishes what it has evidence for (statistical patterns) from what it does not. The genomic equator, meaning the boundary between functional and non-functional elements, parallels the epistemic equator in language models: both boundaries were discovered, not taught.

Connections

Status

Published in Nature (2026). The genomic equator interpretation is this project’s structural analysis, not the Arc Institute’s framing. The parallel between genomic and epistemic equators is this project’s proposed connection. The mapping to the five properties is this project’s structural interpretation.

