Message boards : Rosetta@home Science : Microsoft Dayhoff
| Author | Message |
|---|---|
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2154 Credit: 12,892,384 RAC: 3,489 |
Dayhoff is an atlas of both protein sequence data and generative language models: a centralized resource that brings together 3.34 billion protein sequences across 1.7 billion clusters of metagenomic and natural protein sequences (GigaRef), 46 million structure-derived synthetic sequences (BackboneRef), and 16 million multiple sequence alignments (OpenProteinSet). The Dayhoff models can natively predict zero-shot mutation effects on fitness, scaffold structural motifs by conditioning on evolutionary or structural context, and perform guided generation of novel proteins within specified families. Training on metagenomic and structure-based synthetic data from the Dayhoff Atlas increased the cellular expression rates of generated proteins, highlighting the real-world value of expanding the scale, diversity, and novelty of protein sequence data. |
©2026 University of Washington
https://www.bakerlab.org