Please use this identifier to cite or link to this item: http://conacyt.repositorioinstitucional.mx/jspui/handle/1000/7541
Tracking mutational semantics of SARS-CoV-2 genomes
Rohan Singh
Sunil Nagpal
Nishal Kumar Pinna
Sharmila Mande
Open Access
Attribution-NonCommercial-NoDerivatives
https://doi.org/10.1101/2021.12.21.21268187
https://www.medrxiv.org/content/10.1101/2021.12.21.21268187v1
Genomes have an inherent context dictated by the order in which nucleotides and higher-order genomic elements are arranged in the DNA/RNA. Learning this context is a daunting task, governed by the combinatorial complexity of the interactions possible between ordered genomic elements. Can natural language processing be applied to these ordered, complex and evolving data types (genomic sequences) to reveal the latent patterns or context of genomic elements (e.g., mutations)? Here we present an approach to understanding the mutational landscape of COVID-19 by treating the temporally changing (continuously mutating) SARS-CoV-2 genomes as documents. We demonstrate how interpreting evolving genomes as analogous to temporal literature corpora makes it possible to use dynamic topic modeling (DTM) and temporal Word2Vec models to delineate mutation signatures corresponding to different Variants of Concern and to track the semantic drift of Mutations of Concern (MoC). We identified and studied characteristic mutations associated with COVID-19 infection severity and tracked their relationship with MoCs. Our groundwork on the utility of such temporal NLP models in genomics could not only supplement ongoing efforts to understand the COVID-19 pandemic but also provide alternative strategies for studying dynamic phenomena in the biological sciences through data science (especially NLP and AI/ML).
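To make the "genomes as documents" analogy concrete, the following is a minimal, illustrative sketch, not the authors' method. It assumes each genome is reduced to a list of mutation tokens and grouped by collection period, and it swaps in a simple count-based co-occurrence model in place of the temporal Word2Vec and DTM models described above. Drift is measured as 1 minus the cosine similarity of a mutation's context vector between consecutive periods. The data, function names and periods are hypothetical toy values chosen for illustration, not real variant frequencies.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data (NOT real frequencies): each genome is a "document"
# whose "words" are mutation tokens, grouped by collection period.
corpus_by_period = {
    "2021-01": [["D614G", "N501Y", "P681H"], ["D614G", "N501Y"], ["D614G", "L452R"]],
    "2021-06": [["D614G", "L452R", "P681R"], ["D614G", "L452R", "T478K"], ["D614G", "P681R"]],
}


def context_vector(documents, target):
    """Count how often other mutations co-occur with `target` in the same genome."""
    counts = Counter()
    for doc in documents:
        if target in doc:
            counts.update(token for token in doc if token != target)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_drift(corpus_by_period, target):
    """Return 1 - cosine similarity of the target's context between consecutive periods."""
    periods = sorted(corpus_by_period)
    vectors = [context_vector(corpus_by_period[p], target) for p in periods]
    return {
        (periods[i], periods[i + 1]): 1.0 - cosine(vectors[i], vectors[i + 1])
        for i in range(len(periods) - 1)
    }


print(semantic_drift(corpus_by_period, "D614G"))
```

In this toy corpus, D614G keeps appearing but its co-occurring partners change between the two periods, so its drift score is well above zero. A learned embedding model such as temporal Word2Vec would replace the raw co-occurrence counts with dense vectors aligned across time slices; the counting version is used here only because it is transparent and needs no external libraries.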
medRxiv and bioRxiv
30-12-2021
Preprint
www.medrxiv.org
English
COVID-19 epidemic
General public
RESPIRATORY VIRUSES
Published version (publishedVersion)
Appears in collections: Scientific articles