Please use this identifier to cite or link to this item:
Understand Research Hotspots Surrounding COVID-19 and Other Coronavirus Infections Using Topic Modeling
Dong Mengying.
Cao Xiaojun.
Liang Mingbiao.
Li Lijuan.
Liu Guangjian.
Liang Huiying.
Acceso Abierto
Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a virus that causes severe respiratory illness in humans, which eventually results in the current outbreak of novel coronavirus disease (COVID-19) around the world. The research community is interested to know what are the hotspots in coronavirus (CoV) research and how much is known about COVID-19. This study aimed to evaluate the characteristics of publications involving coronaviruses as well as COVID-19 by using a topic modeling analysis. Methods: We extracted all abstracts and retained the most informative words from the COVID-19 Open Research Dataset, which contains all the 35,092 pieces of coronavirus related literature published up to March 20, 2020. Using Latent Dirichlet Allocation modeling, we trained an eight-topic model from the corpus. We then analyzed the semantic relationships between topics and compared the topic distribution between COVID-19 and other CoV infections. Results: Eight topics emerged overall: clinical characterization, pathogenesis research, therapeutics research, epidemiological study, virus transmission, vaccines research, virus diagnostics, and viral genomics. It was observed that COVID-19 research puts more emphasis on clinical characterization, epidemiological study, and virus transmission at present. In contrast, topics about diagnostics, therapeutics, vaccines, genomics and pathogenesis only accounted for less than 10% or even 4% of all the COVID-19 publications, much lower than those of other CoV infections. Conclusions: These results identified knowledge gaps in the area of COVID-19 and offered directions for future research. Keywords: COVID-19, coronavirus, topic modeling, hotspots, text mining
Appears in Collections:Artículos científicos

Upload archives

File SizeFormat 
1108537.pdf459.38 kBAdobe PDFView/Open