New scientific database ‘could revolutionise the way diseases are treated’

Protein structures representing the data obtained using AlphaFold (Karen Arnott/EMBL-EBI)

Solving one of biology’s biggest mysteries has led to the creation of a new scientific database that could revolutionise the way diseases are treated, according to scientists.

Researchers from Google-owned artificial intelligence (AI) lab DeepMind and the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) have launched a database featuring 20,000 structures for the complete set of proteins expressed in the human body, known as the human proteome.

They say understanding more about protein shapes could play a key role in the development of novel drugs to treat a wide range of illnesses, from dementia and cancer to infectious diseases such as Covid-19.

The team used DeepMind’s AI program, called AlphaFold, to visualise the structure of proteins. These complex molecules play many critical roles in the human body and are often dubbed the “building blocks of life”.

In addition to the human proteome, the database features 350,000 structures, including those from 20 additional organisms deemed important for biological research, such as E. coli, fruit fly, mouse, zebrafish, the malaria parasite and tuberculosis bacteria.

Ben Perry, who is discovery open innovation leader at Drugs for Neglected Diseases Initiative (DNDi), one of the research partners using AlphaFold, said: “We need to supercharge the discovery of new drugs for the millions of people at risk of neglected diseases around the world.

“It can be a game changer: by quickly and accurately predicting protein structures, AlphaFold opens new research horizons, improving both the scope and efficiency of R&D (research and development) and facilitating our research in endemic countries.”

The launch of the new database comes after DeepMind last year solved one of the biggest scientific problems, one that had stumped researchers for half a century: how proteins fold into 3D shapes.

The team used AlphaFold to predict the outcome of this complicated biological process, making confident predictions of the structural position of 58% of the amino acids in the human proteome.

Researchers say knowing more about this process is fundamental to understanding the biological machinery of life and could lead to development of new drugs to treat diseases.

Before AI, protein shapes were determined using a method known as crystallography, which, according to experts, can take months or even years.

Demis Hassabis, founder and chief executive of London-based DeepMind, said: “We used AlphaFold to generate the most complete and accurate picture of the human proteome.

“We believe this represents the most significant contribution AI has made to advancing scientific knowledge to date, and is a great illustration of the sorts of benefits AI can bring to society.”

At present, the work is being used to look for potential new medicines for neglected tropical diseases, such as Chagas disease, a potentially life-threatening illness caused by the parasite Trypanosoma cruzi, and leishmaniasis, which is also caused by a parasite.

Experts are also using AlphaFold’s predictions to develop enzymes that can break down plastics.

EMBL’s deputy director general, Ewan Birney, said the new database of protein shapes is one of the most important datasets since the mapping of the human genome, a 13-year effort that led to the cataloguing of all the genes of human beings.

He said: “Making AlphaFold predictions accessible to the international scientific community opens up so many new research avenues, from neglected diseases to new enzymes for biotechnology and everything in between.

“This is a great new scientific tool, which complements existing technologies, and will allow us to push the boundaries of our understanding of the world.”

The scientists said their aim is to visualise more than 100 million protein structures, an effort that, according to EMBL director general Edith Heard, will be “a revolution for the life sciences”.

She said: “Sharing AlphaFold predictions openly and freely will empower researchers everywhere to gain new insights and drive discovery.”

The research behind the new database is published in the journal Nature.
