Please use this identifier to cite or link to this item:
http://conacyt.repositorioinstitucional.mx/jspui/handle/1000/7541
Tracking mutational semantics of SARS-CoV-2 genomes | |
Rohan Singh, Sunil Nagpal, Nishal Kumar Pinna, Sharmila Mande | |
Open Access | |
Attribution-NonCommercial-NoDerivatives | |
https://doi.org/10.1101/2021.12.21.21268187 | |
https://www.medrxiv.org/content/10.1101/2021.12.21.21268187v1 | |
Genomes have an inherent context dictated by the order in which the nucleotides and higher-order genomic elements are arranged in the DNA/RNA. Learning this context is a daunting task, governed by the combinatorial complexity of the interactions possible between ordered elements of genomes. Can natural language processing be employed on these orderly, complex, and evolving data types (genomic sequences) to reveal the latent patterns or context of genomic elements (e.g., mutations)? Here we present an approach to understanding the mutational landscape of COVID-19 by treating the temporally changing (continuously mutating) SARS-CoV-2 genomes as documents. We demonstrate how interpreting evolving genomes as analogous to temporal literature corpora provides an opportunity to use dynamic topic modeling (DTM) and temporal Word2Vec models to delineate mutation signatures corresponding to different Variants of Concern and to track the semantic drift of Mutations of Concern (MoCs). We identified and studied characteristic mutations associated with COVID-19 infection severity and tracked their relationship with MoCs. Our groundwork on the utility of such temporal NLP models in genomics could not only supplement ongoing efforts to understand the COVID-19 pandemic but also provide alternative strategies for studying dynamic phenomena in the biological sciences through data science (especially NLP and AI/ML). | |
medRxiv and bioRxiv | |
30-12-2021 | |
Preprint | |
www.medrxiv.org | |
English | |
COVID-19 epidemic | |
General public | |
RESPIRATORY VIRUSES | |
Published version | |
publishedVersion - Published version | |
Appears in collections: | Scientific articles |
Load files:
File | Size | Format | |
---|---|---|---|
Tracking mutational semantics of SARS COV2 genomes.pdf | 7.79 MB | Adobe PDF | View/Open |