Emerging COVID-19 variants identified with AI

Scientists at the Universities of Manchester and Oxford have developed an AI framework that can track new COVID-19 variants.

The AI framework combines dimension reduction techniques and a new explainable clustering algorithm, called CLASSIX, developed by The University of Manchester. The framework detects new COVID-19 variants, enabling the quick identification of viral genomes that might present a risk in the future.

The study has the potential to support traditional methods of tracking viral evolution, such as phylogenetic analysis, which currently require extensive manual curation.

Enabling a proactive response

Roberto Cahuantzi, a researcher at The University of Manchester and the first and corresponding author of the paper, said: “Since the emergence of COVID-19, we have seen multiple waves of new variants, heightened transmissibility, evasion of immune responses, and increased severity of illness.

“Scientists are now intensifying efforts to pinpoint these worrying new variants, such as alpha, delta and omicron, at the earliest stages of their emergence.

“If we can find a way to do this quickly and efficiently, it will enable us to be more proactive in our response, such as tailored vaccine development and may even enable us to eliminate the variants before they become established.”


Challenges with identifying new COVID-19 variants

Like other RNA viruses, SARS-CoV-2, the virus that causes COVID-19, has a high mutation rate and a short time between generations. This means that it evolves extremely rapidly.

Identifying new strains that are likely to be problematic, therefore, requires considerable effort.

The GISAID database, which was set up to share genomic data of influenza viruses and now also hosts SARS-CoV-2 sequences, currently has almost 16 million sequences available.

Mapping the evolution and history of all COVID-19 genomes from this data requires large amounts of computer and human time.

Human expert time is limited

The new method allows for the automation of these tasks.

The researchers processed 5.7 million high-coverage sequences in only one to two days on a standard modern laptop.

Existing methods could not do this with such modest resources, so the new approach could put the identification of concerning pathogen strains within reach of more researchers.

Thomas House, Professor of Mathematical Sciences at The University of Manchester, said: “The unprecedented amount of genetic data generated during the pandemic demands improvements to our methods to analyse it thoroughly.

“The data is continuing to grow rapidly, but without showing a benefit to curating this data, there is a risk that it will be removed or deleted.

“We know that human expert time is limited, so our approach should not replace human work altogether but work alongside it to enable the job to be done much quicker and free our experts for other vital developments.”

How does the new method work?

The new method identifies new COVID-19 variants by breaking down the virus’s genetic sequences into short “words”, counting how often each word occurs, and representing each sequence by those counts.

It then groups similar sequences together based on their word patterns using machine learning techniques.
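As a rough illustration of the two steps described above, the following Python sketch counts overlapping words in a few made-up sequences and then groups sequences whose word counts are close. It is not the researchers' code: the word length of three letters, the distance threshold of 0.2, the toy sequences and all function names are assumptions made for this example. The simple greedy grouping is only a stand-in for the CLASSIX algorithm, and the dimension reduction step is left out.

```python
from collections import Counter
from itertools import product
from math import sqrt

K = 3  # word length; an assumption, the article only says "smaller words"
ALPHABET = "ACGT"
WORDS = ["".join(p) for p in product(ALPHABET, repeat=K)]  # all 64 possible words


def kmer_profile(sequence, k=K):
    """Return the normalised count of every overlapping k-letter word."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Words containing ambiguous bases (such as N) are not in WORDS and are ignored.
    total = sum(counts[w] for w in WORDS)
    if total == 0:
        raise ValueError("sequence contains no valid words")
    return [counts[w] / total for w in WORDS]


def distance(a, b):
    """Euclidean distance between two word-count profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_sequences(profiles, radius):
    """Greedy grouping (a stand-in, not CLASSIX): a profile joins the first
    group whose starting profile lies within `radius`, else starts a new group."""
    centres = []
    labels = []
    for profile in profiles:
        for idx, centre in enumerate(centres):
            if distance(profile, centre) <= radius:
                labels.append(idx)
                break
        else:
            centres.append(profile)
            labels.append(len(centres) - 1)
    return labels


# Toy, made-up sequences (not real viral genomes).
toy = {
    "a1": "ATGATGATGATGATGATGAT",
    "a2": "ATGATGATGATGATGATGAA",
    "b1": "CCGGCCGGCCGGCCGGCCGG",
    "b2": "CCGGCCGGCCGGCCGGCCGA",
}
profiles = [kmer_profile(seq) for seq in toy.values()]
labels = group_sequences(profiles, radius=0.2)
print(dict(zip(toy, labels)))  # {'a1': 0, 'a2': 0, 'b1': 1, 'b2': 1}
```

In this toy run, the two sequences that differ by a single letter end up in the same group, while sequences built from different repeating patterns are kept apart. A group that appears and grows over time would be the kind of signal an alert tool could flag for expert review.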

Cahuantzi concluded: “Our analysis serves as a proof of concept, demonstrating the potential use of machine learning methods as an alert tool for the early discovery of emerging major variants without relying on the need to generate phylogenies.”
