Imagine an AI that can read the 3-billion-letter instruction manual of human DNA, spotting hidden patterns even the world’s top geneticists might miss. That’s exactly what researchers have achieved with Evo 2, an open-source artificial intelligence system trained on trillions of DNA bases from across the tree of life. Released just four months after its bacterial-focused predecessor, this breakthrough represents more than just another AI milestone – it’s a fundamental shift in how we understand biology’s most complex code.
From Bacteria to Humans: A Leap in Complexity
The original Evo system, covered in late 2024, excelled at analyzing bacterial genomes where genes cluster together in neat, functional groups. But eukaryotic genomes – including human DNA – are a different beast entirely. They’re filled with introns (non-coding sections), regulatory sequences scattered across hundreds of thousands of bases, and vast stretches of so-called “junk DNA.” As John Timmer notes in Ars Technica, “It’s not clear that this approach will work with more complex genomes.” The Evo team apparently took that as a challenge.
Evo 2’s training involved feeding the system 8.8 trillion bases from all three domains of life – bacteria, archaea, and eukaryotes – using a dataset called OpenGenome2. The researchers trained two versions: one with 7 billion parameters on 2.4 trillion bases, and a full version with 40 billion parameters on the complete dataset. The system learned to recognize everything from protein-coding regions and splice sites to regulatory DNA and even structural features within proteins, all without task-specific fine-tuning.
What Can It Actually Do?
The practical applications are substantial. Evo 2 can detect problematic mutations in genes like BRCA2 (associated with cancer), recognize when genetic codes differ between species, and identify features that tolerate variability, such as RNA splice sites. In some tests, it outperformed specialized software designed for these specific tasks. The researchers suggest it could serve as an automated tool for preliminary genome annotation – essentially, helping scientists understand newly sequenced genomes faster and more accurately.
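How does a genomic language model “detect problematic mutations”? A common approach with models of this kind is to compare the model’s likelihood for the reference sequence against the mutated one: a sharply lower score for the mutant suggests the variant disrupts something the model has learned to expect. Here is a minimal sketch of that idea – the `score` function below is a toy stand-in (it just measures GC content so the example runs), not Evo 2’s actual scoring interface.

```python
# Sketch of likelihood-based variant-effect scoring.
# NOTE: `score` is a hypothetical stand-in; a real genomic language model
# would return the sum of per-base log-probabilities for the sequence.

def score(seq: str) -> float:
    # Toy proxy (GC fraction) purely so this example is runnable.
    return sum(1.0 for base in seq if base in "GC") / len(seq)

def variant_delta(ref: str, pos: int, alt: str) -> float:
    """Score a single-nucleotide variant as mutant score minus reference score.
    With a real model, a strongly negative delta flags a likely deleterious
    variant (e.g. in a gene like BRCA2)."""
    mutant = ref[:pos] + alt + ref[pos + 1:]
    return score(mutant) - score(ref)

reference = "ATGGCCGTACGT"
print(variant_delta(reference, 3, "A"))  # replace the G at index 3 with A
```

The appeal of this zero-shot setup is that no task-specific fine-tuning is needed – the same trained model is simply queried twice and the scores compared.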
But here’s where it gets interesting: The original Evo could suggest entirely new bacterial proteins when prompted with related gene sequences. With Evo 2 trained on eukaryotic genomes, could it do the same for more complex organisms? The researchers haven’t fully tested this yet, partly because eukaryotic genes don’t cluster in predictable ways like bacterial genes do. As Timmer explains, “It’s difficult to see how they could even do that test… it’s difficult to guess what functions they should even test for.”
The Privacy Paradox: AI’s Double-Edged Sword
While Evo 2 represents AI’s potential to unlock biological mysteries, other AI developments highlight growing concerns about privacy and misuse. A recent study demonstrates that large language models (LLMs) can deanonymize pseudonymous users across social media platforms with surprising accuracy – achieving up to 68% recall and 90% precision. As co-author Simon Lermen notes, “What we found is that these AI agents can do something that was previously very difficult: starting from free text… they can work their way to the full identity of a person.”
This capability isn’t just theoretical. In one experiment, 7% of 125 participants were identified from questionnaire answers alone. With ten or more shared movies on Reddit, identification rates reached 48.1% at 90% precision. The researchers warn this could enable doxxing, stalking, and hyper-targeted advertising, while governments and corporations could exploit these techniques for surveillance and profiling.
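For readers less familiar with the metrics quoted above: precision measures how often a claimed identification is correct, while recall measures how many of the true identities were found at all. The snippet below shows the standard definitions, with example counts chosen only to reproduce the 90%/68% figures – they are not the study’s actual numbers.

```python
# Standard precision/recall definitions, illustrated with made-up counts
# chosen to match the figures quoted in the article.

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    precision = tp / (tp + fp)  # of the identities claimed, how many were right
    recall = tp / (tp + fn)     # of the true identities, how many were found
    return precision, recall

# e.g. 90 correct identifications, 10 wrong claims, 42 users never identified
p, r = precision_recall(tp=90, fp=10, fn=42)
print(f"precision={p:.0%}, recall={r:.0%}")  # precision=90%, recall=68%
```

The trade-off matters here: an attacker can tune for high precision (few false accusations) at the cost of recall, which is exactly the 90%-precision operating point the study reports.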
Misinformation in the Age of AI
The privacy concerns intersect with another critical issue: AI-generated misinformation. The Financial Times reports that AI-manipulated satellite images are being widely shared as misinformation during wartime conflicts. An image claiming to show damage to an American radar system in Qatar following an Iranian drone strike was actually an AI-altered image of an area in Bahrain. The post had almost 1 million views and was shared thousands of times.
Brady Africk, an independent open-source intelligence researcher, warns that “AI has made that all tremendously easier and [it] poses a significant threat to people trying to get information online.” Henk van Ess, an expert in online research methods, adds: “The key shift is this: it used to take a state intelligence agency with Photoshop skills to fake a satellite image. Now anyone with access to freely available AI tools can produce something convincing enough to fool casual viewers and move markets.”
Platforms Scramble for Solutions
In response to these challenges, platforms are implementing new policies. X (formerly Twitter) announced it will suspend creators from its revenue-sharing program for 90 days if they post AI-generated videos of armed conflicts without disclosure. Repeated violations lead to permanent suspension. Meanwhile, Apple Music is adding transparency tags to distinguish AI-generated or AI-assisted content, though this relies on labels and distributors voluntarily flagging their use of AI – a system with obvious limitations.
The Bigger Picture: AI’s Infrastructure Dependencies
Behind all these AI developments lies a critical infrastructure question: Who controls the means of production? Nvidia’s remarkable 75% gross profit margin – closer to software companies than traditional chipmakers – depends heavily on Taiwan Semiconductor Manufacturing Company (TSMC) for manufacturing its most advanced AI chips. As the Financial Times analysis notes, “Nvidia’s valuation rests on two conditions holding simultaneously for years. First, demand for AI infrastructure must remain strong enough for Nvidia to sustain premium pricing. Second, TSMC must not claim a larger share of the profits embedded in each chip.”
At current revenue levels, every one-point move in Nvidia’s gross margin represents about $2 billion in annual gross profit. With advanced chipmaking concentrated in Taiwan, any disruption would strengthen TSMC’s leverage by making advanced capacity even scarcer. As the analysis concludes: “In this industry, power follows production capacity. Today, it is held by TSMC.”
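The “$2 billion per margin point” figure follows directly from the revenue base: one percentage point of gross margin is simply 1% of revenue. The back-of-the-envelope check below assumes annual revenue of roughly $200 billion – our assumption for illustration, implied by but not stated in the article.

```python
# Back-of-the-envelope check of the "$2 billion per margin point" claim.
# ASSUMPTION: annual revenue of ~$200 billion (not a figure from the article).

annual_revenue_usd = 200e9
one_margin_point = 0.01  # a one-percentage-point move in gross margin

gross_profit_per_point = annual_revenue_usd * one_margin_point
print(f"${gross_profit_per_point / 1e9:.0f}B per gross-margin point")  # $2B
```

This is why the TSMC relationship looms so large: even a small shift in who captures the margin on each chip moves billions of dollars a year.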
What’s Next for Evo 2 and Beyond
Back in the biology lab, the Evo 2 team has made everything open – model parameters, training code, inference code, and the OpenGenome2 dataset. This open-source approach could accelerate discovery, but it also raises questions about potential misuse. The researchers excluded viruses that attack eukaryotes from their training data, concerned the system could be misused to create threats to humans.
The most intriguing possibility? Evo 2 might have identified genome features we don’t even know exist yet. As Timmer speculates: “It remains technically possible that there are features in the genome we’re not aware of yet, and Evo 2 has picked them out.” We’ve discovered CRISPR repeats, microRNAs, and other features over past decades through painstaking research. Now, AI might help us find the next big discovery faster.
What does this mean for businesses and professionals? For biotech and pharmaceutical companies, tools like Evo 2 could accelerate drug discovery and genetic research. For tech companies, the privacy and misinformation challenges highlighted by companion sources represent both risks and opportunities for new solutions. And for everyone, the infrastructure dependencies revealed in the Nvidia analysis show how geopolitical factors can affect even the most advanced technologies.
The release of Evo 2 isn’t just another AI announcement – it’s part of a broader pattern where AI capabilities are advancing faster than our ability to manage their implications. From decoding genomes to deanonymizing users, from creating misinformation to depending on fragile supply chains, we’re seeing both the promise and perils of AI playing out simultaneously. The question isn’t whether AI will transform our world, but how we’ll navigate that transformation responsibly.

