Genome-wide covariation analysis sheds light on the evolution of SARS-CoV-2

Researchers in the United States have measured genome-wide covariation within severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) to study potentially unique interactions that matter for the prevention, diagnosis and treatment of coronavirus disease 2019 (COVID-19).

When the researchers considered variability both within genomes and across different viral clades, they found that nucleotide variability differed between coding regions of the full genome and between individual genes.

Evan Cresswell-Clay and Vipul Periwal from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) in Bethesda, Maryland, say that expanding this analysis could open up several research opportunities.

As the database of SARS-CoV-2 genomes grows, greater variability in the data will provide more insight into important interactions within the genome.

In addition, as data accumulate over time, genome data from the database could be used to study the temporal evolution of the virus.

The team said the analysis could also be applied to other diseases for which more data are available.

A preprint version of the research paper is available on the bioRxiv* server while the article undergoes peer review.

Study: Genome-Wide Covariation in SARS-CoV-2. Image credit: NIAID

More about the SARS-CoV-2 genome

The genome of SARS-CoV-2, the virus responsible for the COVID-19 pandemic, was first identified in December 2019.

The genome is approximately 30 kilobases long and contains several open reading frames (ORFs), including ORF1ab, ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10. These ORFs encode nonstructural and accessory proteins, while other genomic regions encode four structural proteins, the largest of which is the spike protein.

The spike protein is the surface structure that SARS-CoV-2 uses to bind to and infect host cells. The other three structural proteins are the envelope (E) and membrane (M) proteins, which make up the viral envelope, and the nucleocapsid (N) protein, which is involved in viral assembly.

EpiCoV database

The design of vaccines and therapies depends on the structure and stability of the proteins encoded in the genome's ORFs.

While the reference genome is used for most studies, a growing body of available data can be used to monitor changes in the genome and to analyze the microevolution of the virus as it spreads.

These data, collected by GISAID (Global Initiative on Sharing All Influenza Data), have allowed various SARS-CoV-2 strains to be recorded in a new database called EpiCoV.

Since the first viral strain was entered on 10 January 2020, the database has grown to include 292,000 submissions.

Now, Cresswell-Clay and Periwal have used 137,636 of these recorded sequences to study the evolution of SARS-CoV-2.

“Understanding the genetic structure of the virus is of great medical and biological importance for prevention, diagnosis, and treatment,” the team writes.

Comparative RNA sequence analysis has long been used to infer RNA structure from the covariation of nucleotide mutations. However, separating the direct and indirect interactions that give rise to such covariation has been challenging, say the researchers.
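To make the idea of covariation concrete, here is a minimal sketch that scores it with mutual information, a classic measure in comparative sequence analysis. It is not the method used in the study, and it assumes a small hypothetical three-position alignment rather than SARS-CoV-2 data. The toy data are built so that position 0 influences position 1 and position 1 influences position 2, with no direct link between positions 0 and 2; mutual information nonetheless reports covariation for the (0, 2) pair, which is the indirect effect the researchers describe.

```python
import math
from collections import Counter


def column(alignment, i):
    """Return the nucleotides found at position i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment positions i and j.

    Higher values mean the nucleotides at the two positions tend to change together.
    """
    n = len(alignment)
    pi = Counter(column(alignment, i))
    pj = Counter(column(alignment, j))
    pij = Counter(zip(column(alignment, i), column(alignment, j)))
    mi = 0.0
    for (a, b), count in pij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi


# Hypothetical toy alignment ("A" = reference base, "G" = mutation).
# Position 0 drives position 1 and position 1 drives position 2;
# position 0 has no direct effect on position 2.
counts = {"AAA": 32, "AAG": 8, "AGG": 8, "AGA": 2,
          "GGG": 32, "GGA": 8, "GAA": 8, "GAG": 2}
alignment = [seq for seq, k in counts.items() for _ in range(k)]

for i, j in [(0, 1), (1, 2), (0, 2)]:
    print(i, j, round(mutual_information(alignment, i, j), 3))
```

Because mutual information only looks at one pair at a time, it cannot tell whether the (0, 2) signal is a direct interaction or is merely passed along through position 1.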

What did the current study cover?

The team used an inference method called Expectation Reflection in conjunction with Direct Coupling Analysis (DCA) to quantify genome-wide covariation within SARS-CoV-2 and to detect direct interactions within the viral genome.
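Expectation Reflection itself is not reproduced here. As a simpler stand-in, the sketch below uses the idea behind mean-field DCA: invert the covariance matrix of the alignment and read off partial correlations, which measure how strongly two positions covary once every other position is accounted for. It assumes a hypothetical three-position toy alignment and a hypothetical reference sequence "AAA", with each position encoded as 0 (matches the reference) or 1 (mutated); real DCA works on all four nucleotides plus gaps and adds regularization.

```python
import math


def mutation_indicators(alignment, reference):
    """Encode each sequence as 0/1 per position: 1 where it differs from the reference."""
    return [[1.0 if s != r else 0.0 for s, r in zip(seq, reference)] for seq in alignment]


def covariance_matrix(rows):
    """Population covariance between every pair of positions."""
    n, L = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(L)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / n
             for b in range(L)] for a in range(L)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for small, well-conditioned matrices)."""
    L = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(L)] for i, row in enumerate(matrix)]
    for col in range(L):
        pivot = max(range(col, L), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(L):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[L:] for row in aug]


def direct_scores(alignment, reference):
    """Partial correlations: covariation between two positions after conditioning on all others."""
    prec = invert(covariance_matrix(mutation_indicators(alignment, reference)))
    L = len(prec)
    return [[-prec[i][j] / math.sqrt(prec[i][i] * prec[j][j]) if i != j else 1.0
             for j in range(L)] for i in range(L)]


# Hypothetical toy alignment: position 0 drives 1, position 1 drives 2, no direct 0-2 link.
counts = {"AAA": 32, "AAG": 8, "AGG": 8, "AGA": 2,
          "GGG": 32, "GGA": 8, "GAA": 8, "GAG": 2}
alignment = [seq for seq, k in counts.items() for _ in range(k)]

scores = direct_scores(alignment, reference="AAA")
for i, j in [(0, 1), (1, 2), (0, 2)]:
    print(i, j, round(scores[i][j], 3))
```

On this toy alignment the (0, 2) score comes out essentially zero while the chained pairs keep a clear positive score, which is the separation of direct from indirect interactions that DCA-style methods aim for.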

“These interactions can also provide information about protein-protein interactions,” the team writes. “Furthermore, this analysis could be useful in vaccine development, aiding efforts to reduce ‘escape routes’ for future viral sequences.”

The team identified genome interactions both within individual coding regions and between different coding regions across the genome.

The ORF1ab and spike coding regions showed the greatest variability within the data.
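One common way to put a number on "variability" is per-position Shannon entropy averaged over a region. The sketch below assumes that measure; the study's own variability statistic may differ. The alignment and the region boundaries are hypothetical and are not the real ORF1ab or spike coordinates.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy (bits) of the nucleotides observed at one alignment position."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def region_variability(alignment, regions):
    """Mean per-position entropy for each named region (half-open [start, end) coordinates)."""
    result = {}
    for name, (start, end) in regions.items():
        entropies = [shannon_entropy([seq[i] for seq in alignment]) for i in range(start, end)]
        result[name] = sum(entropies) / len(entropies)
    return result


# Hypothetical 12-nt toy alignment split into two made-up regions.
alignment = [
    "ACGTACGTACGT",
    "ACGTACGTACGA",
    "ACGAACGTTCGA",
    "ACGTACGTTCCT",
]
regions = {"region_1": (0, 6), "region_2": (6, 12)}
print(region_variability(alignment, regions))
```

Here region_2 has three variable positions against one in region_1, so it scores as the more variable region; with real data the same comparison would run over many thousands of sequences and the annotated coding regions.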

Genome-wide interaction maps also pinpointed the locations of interactions across all available clades, while interaction maps of individual clades showed clade-specific covariation of nucleotide positions.

Nucleotide variability differed both between coding regions across the whole genome and between different genes. Regional patterns were not consistent between clades, with different levels of variability expressed in individual regions of different clades.

The analysis could help with future research

Cresswell-Clay and Periwal state that future expansion of this analysis could provide a number of research opportunities.

“First, as the database of SARS-CoV-2 genomes grows, the overall frequency and variability will increase, leading to further insights into genome interactions,” wrote the team.

The growing volume of data over time will also make it possible to compare interaction maps across the temporal evolution of the virus.

“Second, this analysis can be applied to diseases where more data are available, as it is not specific to the SARS-CoV-2 genome,” the researchers said.

* Important notice

bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.
