An artificial intelligence (AI) network developed by DeepMind, a Google AI offshoot, has taken a giant step forward in solving one of biology’s most challenging problems: determining a protein’s three-dimensional structure from its amino-acid sequence.
DeepMind’s software, dubbed AlphaFold, outperformed more than 100 other teams in a protein-structure prediction challenge called CASP, short for Critical Assessment of Structure Prediction. The results were announced on November 30th, at the start of a virtual conference that takes stock of the exercise.
“This is huge,” says John Moult, a computational biologist at the University of Maryland in College Park, who co-founded CASP in 1994 to improve computational methods for accurately predicting protein structures. “The problem is, in some sense, solved.”
The ability to accurately determine protein structures from their amino-acid sequences would be a major boon to the life sciences and medicine. It would greatly speed up efforts to understand the building blocks of cells and enable faster, more sophisticated drug development.
In 2018, the first year that London-based DeepMind participated in CASP, AlphaFold came out on top. This year, however, the company’s deep-learning network outperformed all other teams by a wide margin and, according to experts, worked so well that it could signal a breakthrough in biology.
Andrei Lupas, an evolutionary biologist at the Max Planck Institute for Developmental Biology in Tübingen, Germany, who assessed the performance of the various teams in CASP, says, “It’s a game-changer.” AlphaFold has already helped him determine the structure of a protein that has perplexed his lab for a decade. He expects it to change the way he works and the questions he tackles. “Medicine will improve as a result of this. It will affect science. It will affect bioengineering. It will change everything,” Lupas says.
AlphaFold’s structure predictions were often indistinguishable from those obtained using “gold standard” experimental methods like X-ray crystallography and, more recently, cryo-electron microscopy (cryo-EM). Scientists believe AlphaFold won’t eliminate the need for these time-consuming and costly approaches quite yet, but it will enable them to research living things in new ways.
THE STRUCTURE PROBLEM
Proteins are the basic building blocks of life, and they are responsible for most of what happens inside cells. A protein’s 3D shape determines how it works and what it does; ‘structure is function’ is an axiom of molecular biology. Proteins tend to adopt their shape without help, guided only by the laws of physics.
For decades, laboratory experiments have been the main way to obtain good protein structures. The first complete structures of proteins were determined, starting in the 1950s, by firing an X-ray beam at crystallized proteins and translating the diffracted light into the protein’s atomic coordinates. X-ray crystallography has produced the majority of protein structures. Over the past decade, however, cryo-EM has become the preferred technique of many structural biology labs.
Scientists have long wondered how a protein’s constituent parts (a string of different amino acids) determine the many twists and folds of its final shape. Researchers say that early attempts to predict protein structures using computers in the 1980s and 1990s performed poorly. Lofty claims made for methods in published papers tended to fall apart when other scientists applied those methods to other proteins.
Moult established CASP to bring more rigor to these efforts. The event challenges teams to predict the structures of proteins that have been solved using experimental methods but have not yet been published. Moult credits the experiment (he doesn’t call it a competition) with significantly improving the field by putting a stop to overblown claims. He adds, “You’re just figuring out what looks promising, what succeeds, and what you can avoid.”
DeepMind’s performance at CASP13 in 2018 took many scientists by surprise in a field that had long been the domain of small academic groups. Still, according to Jinbo Xu, a computational biologist at the University of Chicago in Illinois, the team’s approach was broadly similar to that of other teams applying AI.
The first version of AlphaFold applied the AI technique known as deep learning to structural and genetic data to predict the distance between pairs of amino acids in a protein. According to John Jumper of DeepMind, who is leading the project, AlphaFold then used this information in a second step, which does not involve AI, to develop a “consensus” model of what the protein should look like.
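To make the idea of distances between pairs of amino acids concrete, the short Python sketch below computes such a distance table from known 3D positions. It is only an illustration of the quantity involved, not DeepMind's method: AlphaFold had to predict these distances from sequence data, whereas this sketch simply measures them. The function name `pairwise_distances` and the coordinates in `toy_chain` are hypothetical.

```python
import math


def pairwise_distances(coords):
    """Return an N x N table of straight-line distances between residue positions."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


# Hypothetical positions (in angstroms), one representative atom per amino acid.
# These values are made up for illustration and describe no real protein.
toy_chain = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]

distances = pairwise_distances(toy_chain)

# Print the table: row i, column j holds the distance between residues i and j.
for row in distances:
    print(" ".join(f"{d:5.2f}" for d in row))
```

The table is symmetric with zeros on the diagonal, so a chain of N amino acids is described by N × (N − 1) / 2 distinct distances.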
The team tried to build on that approach but eventually hit a wall. So it changed tack and developed an AI network that incorporated more information about the physical and geometric constraints that determine how a protein folds. It also set the network a harder task: instead of predicting relationships between amino acids, the network predicts the final structure of a target protein sequence. “It’s a much more complicated system,” Jumper notes.
STARTLING ACCURACY
CASP takes place over several months. Target proteins or portions of proteins called domains (approximately 100 in all) are released on a regular basis, and teams have several weeks to submit their structure predictions. A group of independent scientists then evaluates the predictions using metrics that gauge how similar a predicted protein is to the experimentally determined structure. The assessors do not know who is making a prediction.
According to Lupas, AlphaFold’s predictions arrived under the name “group 427,” but the startling accuracy of many of them made them stand out. “I guessed it was AlphaFold. Most people had,” he says.
Some predictions were better than others, but almost two-thirds were comparable in quality to experimental structures. According to Moult, in some cases it was unclear whether a discrepancy between AlphaFold’s prediction and the experimental result was a prediction error or an artifact of the experiment.
AlphaFold’s predictions were poor matches to experimental structures determined by nuclear magnetic resonance spectroscopy, but this may be due to how the raw data are converted into a model, says Moult. Individual structures in protein complexes, or groups, are also difficult to model because interactions with other proteins distort their shapes.
Overall, teams predicted structures more accurately this year than in the previous CASP, but Moult attributes much of the improvement to AlphaFold. According to Moult, other teams’ best results on moderately difficult protein targets typically scored 75 on a 100-point scale of prediction accuracy, whereas AlphaFold scored about 90 on the same targets.
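The article does not name the 100-point scale. Assuming it resembles the global distance test (GDT_TS) commonly reported for CASP, a simplified version can be sketched in Python as follows. The sketch assumes that the predicted and experimental structures are already superimposed and are given as one position per amino acid; a real GDT calculation also searches for the best superposition, which is omitted here. The name `gdt_like_score` and all coordinates are hypothetical.

```python
import math


def gdt_like_score(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-style score on a 0-100 scale.

    For each distance cutoff (in angstroms), compute the fraction of residues
    whose predicted position lies within that cutoff of the experimental
    position, then average the fractions and scale to 100.
    Assumes both structures are already superimposed.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and of equal length")
    deviations = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= cutoff for d in deviations) / len(deviations)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Made-up coordinates: the four residues deviate by 0.5, 1.5, 3.0 and 10.0 angstroms.
experimental = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (4.0, 1.5, 0.0), (8.0, 0.0, 3.0), (12.0, 10.0, 0.0)]

score = gdt_like_score(predicted, experimental)
print(score)  # 56.25
```

A prediction that matches the experimental structure exactly scores 100 under this scheme, and one residue that is far out of place lowers the score without dominating it, which is one reason threshold-based scores are used alongside plain average error.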
According to Moult, about half of the teams mentioned ‘deep learning’ in the abstract summarising their approach, which suggests that AI is having a broad impact on the field. Most of the participants were academic teams, but Microsoft and the Chinese technology company Tencent also entered CASP14.
Mohammed AlQuraishi, a computational biologist at Columbia University in New York City and a CASP participant, is eager to dig into the details of AlphaFold’s performance at the competition and to learn more about how the system works when the DeepMind team presents its approach on December 1st. It is possible, but unlikely, he says, that an easier-than-usual crop of protein targets contributed to the result. AlQuraishi’s strong hunch is that AlphaFold will be transformative.
“I believe it’s safe to conclude that this would have a significant impact on the field of protein structure prediction. As the central problem has arguably been solved, I think many will leave the field,” he says. “It’s a game-changer of epic proportions, unquestionably one of the most important technological breakthroughs of my lifetime.”
FASTER STRUCTURES
An AlphaFold prediction helped to determine the structure of a bacterial protein that Lupas’ lab has been trying to crack for years. Lupas’ team had previously collected raw X-ray diffraction data, but converting these Rorschach-like patterns into a structure requires some information about the protein’s shape. Experimental tricks for obtaining this information, as well as other prediction tools, had failed. “After a decade of trying everything, the model from group 427 gave us our structure in half an hour,” Lupas says.
Demis Hassabis, DeepMind’s co-founder and CEO, says the company plans to make AlphaFold useful so that other scientists can use it. AlphaFold can take days to generate a predicted structure, which includes estimates of how reliable the prediction is in different regions of the protein. “We’re only beginning to realize what biologists will want,” says Hassabis, who sees drug development and protein design as possible applications.
In early 2020, the company released predictions of the structures of a few SARS-CoV-2 proteins that had not yet been determined experimentally. DeepMind’s prediction for a protein named Orf3a ended up being very similar to a structure later determined by cryo-EM, according to Stephen Brohawn, a molecular neurobiologist at the University of California, Berkeley, whose team published the structure in June. “What they’ve accomplished is truly remarkable,” he says.
IMPACT IN THE REAL WORLD
AlphaFold is unlikely to close laboratories, such as Brohawn’s, that use experimental approaches to solve protein structures. However, it could mean that lower-quality, easier-to-collect experimental data are all that is needed to obtain a good structure. Some applications, such as the evolutionary analysis of proteins, are set to flourish because the tsunami of available genomic data can now be reliably translated into structures. “A new wave of molecular biologists will be able to ask more advanced questions as a result of this,” Lupas says. “It would necessitate more thought and less pipetting.”
“This is a challenge that I was starting to believe would not get solved in my lifetime,” says Janet Thornton, a computational biologist at the European Molecular Biology Laboratory-European Bioinformatics Institute in Hinxton, UK, and a former CASP assessor. She hopes the approach will help to illuminate the function of the thousands of unsolved proteins in the human genome and to make sense of disease-causing gene variations that differ between people.
The success of AlphaFold is also a watershed moment for DeepMind. The company is best known for using AI to master games such as Go, but its long-term goal is to develop programs capable of broad, human-like intelligence. Tackling grand scientific challenges, such as protein-structure prediction, is one of the most important applications of AI, according to Hassabis. “I believe it is the most important thing we’ve achieved in terms of real-world impact,” he says.