Preprint
Article

Assessing the Predictive Power of Online Social Media to Analyze COVID-19 Outbreaks in the 50 U.S. States

This version is not peer-reviewed.

Submitted: 02 June 2021

Posted: 03 June 2021

A peer-reviewed article of this preprint also exists.

Abstract
As the coronavirus disease 2019 (COVID-19) continues to rage worldwide, the United States has become the most affected country, with more than 34.1 million total confirmed cases as of June 1, 2021. In this work, we investigate how online social media activity and Internet search activity about the COVID-19 pandemic correlate with outbreaks across the 50 U.S. states. Collecting state-level daily trends from both Twitter and Google Trends, we observe a high lagged correlation with the number of daily confirmed cases, with a lag that differs from state to state. We further find that the predictive accuracy, measured by the correlation coefficient, is positively correlated with a state's demographics, air traffic volume, and GDP development. Most importantly, we show that a state's early infection rate is negatively correlated with the lag to the previous peak in Internet searches and tweets about COVID-19, indicating that earlier collective awareness on Twitter/Google correlates with a lower infection rate. Lastly, we demonstrate that correlations between online social media and search trends are sensitive to time, mainly because public attention shifts.
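The lagged correlation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `pearson` and `best_lag`, the `max_lag` parameter, and the toy series are assumptions made for illustration. The sketch shifts a daily trend series (for example, search or tweet volume) against daily confirmed cases and reports the lag, in days, at which the Pearson correlation coefficient is highest.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no defined correlation; treat as 0
    return cov / sqrt(var_x * var_y)


def best_lag(trend, cases, max_lag):
    """Return (lag, r) maximizing the correlation of trend[t] with cases[t + lag].

    Assumes both series are daily, aligned on the same start date, and of
    equal length. `max_lag` (hypothetical parameter) bounds the search.
    """
    if len(trend) != len(cases):
        raise ValueError("series must have the same length")
    best = (0, pearson(trend, cases))
    for lag in range(1, max_lag + 1):
        x = trend[: len(trend) - lag]  # online activity on day t
        y = cases[lag:]                # confirmed cases on day t + lag
        if len(x) < 2:
            break  # too few overlapping days to correlate
        r = pearson(x, y)
        if r > best[1]:
            best = (lag, r)
    return best


# Toy data (made up): the case curve repeats the trend curve 3 days later.
trend = [0, 1, 4, 9, 4, 1, 0, 0, 0, 0]
cases = [0, 0, 0, 0, 1, 4, 9, 4, 1, 0]
lag, r = best_lag(trend, cases, max_lag=5)
print(lag, round(r, 3))
```

Under these assumptions, running the search separately on each state's series would give one lag per state, which is the kind of state-dependent lag the abstract refers to.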
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
