Research Article
Diffusion of real versus misinformation during a crisis event: A big data-driven approach
Introduction
Online social networks (OSN) have become an efficient means of information dissemination (Bowler, Halbesleben, Stodnick, Seevers, & Little, 2009). For example, statistics show that in 2017, 67% of US adults depended on online social network platforms such as Twitter, Facebook, and Snapchat for news, as compared to 62% in 2016 (Moon, 2017). Close to 9 out of 10 Twitter users rely on the social media outlet primarily for news, and of those who do, 74% use the platform daily (Rosenstiel, Sonderman, Locker, Ivancin, & Kjarval, 2015). As a result, Twitter has come to replace mainstream media as the number one choice for news, especially among millennials and younger users; its reach is expected to grow even further (Stieglitz & Dang-Xuan, 2013). With the advent of social media and portable communication devices, disseminating information during crisis events has become easier and more seamless due to the steady increase in technology adoption (Chaubey and Sahoo, 2021; Dwivedi et al., 2020; Palen et al., 2010; Roy et al., 2020; Shklovski et al., 2010; Shklovski et al., 2008; Sinha et al., 2019). For example, during natural disasters such as hurricanes, information is rapidly propagated through social media platforms using mobile devices and augmented by local eyewitness accounts (Oh, Kwon, & Rao, 2010). Despite its popularity, OSN as an information diffusion channel has an inherent disadvantage. Because users of these platforms act as gatekeepers and are prone to their own individual biases, it is difficult to assess the veracity of the news items being propagated. This issue has been further exacerbated by the sheer amount of data that flows through online social networks. This phenomenon has led to both factual events and misinformation items on social media reaching a larger audience than news from major mainstream news outlets (Allcott & Gentzkow, 2017).
Hence, it is not surprising that headlines of misinformation deceive American adults about 75% of the time, and the most popular misinformation stories usually garner far more shares than authentic news (Silverman & Singer-Vine, 2016).
Extant research has identified salient features that may not only influence the diffusion of information, but also help in assessing its veracity on social media (Boyd, Golder, & Lotan, 2010; Lee, Mahmud, Chen, Zhou, & Nichols, 2014). These features may be broadly grouped into two research themes: information diffusion and information veracity. Information diffusion research has identified three types of features: (a) user-based features that directly relate to the behavior and characteristics of users; (b) time-based features related to the time a tweet is generated and posted; and (c) content-based features related to the contents of text embedded in the messages being propagated (Hoang & Mothe, 2018). Information veracity research, which focuses on more proactive measures like detection, relies on the following features: (a) linguistic cues related to the linguistic properties of the messages; and (b) social network characteristics, which predominantly rely on user profile characteristics that can be extracted from the metadata collected in the network (Conroy et al., 2015; Rubin et al., 2015). Both approaches rely on similar features (content- and user-related) but depend on different methods. Information diffusion studies involve several approaches, including a combination of sentiment and text analysis with feature extraction and engineering methods for both predictive and descriptive analysis. Conversely, information veracity research predominantly involves the use of prescriptive analysis (Hoang and Mothe, 2018; Rubin et al., 2015; Shin et al., 2018; Suh et al., 2010).
Big data, often collated from social media and other repositories, provide a wealth of information for information systems (IS) researchers. However, such data come with their own challenges, primarily due to their high volume, veracity, variety, and velocity. This issue has made it increasingly difficult for researchers to not only effectively access the data and ensure its integrity, but also to analyze it. Given the complexity of analyzing Twitter data and the unavailability of complete datasets due to Twitter’s rate limits (https://developer.twitter.com/en/docs/twitter-api/rate-limits), large-scale studies using both econometric and algorithmic techniques have also been rare. In addition, much of the extant big data-driven research focuses on describing what phenomena are happening rather than on theory building to explain the causality of the observed events (e.g., Kitchens, Dobolyi, Li, & Abbasi, 2018; Zhou et al., 2018). Hence, in alignment with Kar and Dwivedi's (2020) call for more theory building in big data-driven research, we examine the antecedents to tweet virality and how the impacts of these antecedents differ for authentic news vs. misinformation by using data collected from Twitter during a crisis event.
Our research contributes to explaining the inconsistent results from a handful of analyses using Twitter data. For example, a study using textual contents from Twitter showed that both positively and negatively charged tweets were retweeted more often and more quickly than neutral ones (Stieglitz & Dang-Xuan, 2013). The researchers concluded that sentiments inferred from social media contents might be positively associated with information diffusion. Other studies have related the propagation of misinformation to automated entities and claimed that these entities actively spread misinformation. They showed that in the earlier phases, the automated entities mostly target influential users on social media, which eventually leads to more diffusion of misinformation (Shao et al., 2018). This finding was contradicted by another study, which found that, contrary to conventional wisdom, malicious entities were no more responsible for the propagation of misinformation than humans (Vosoughi, Roy, & Aral, 2018). Rather, the study claimed that human behavior contributed more to the differential propagation of misinformation than automated entities did.
The current study employs a unique dataset of tweets collected across a 5-week period during Hurricane Harvey to empirically investigate the virality of both real and misinformation during a crisis event based on features extracted. Specifically, we investigate the following:
- What time-based, content-based, and user-based factors affect the virality of authentic news versus misinformation on social media during crisis events?
- How are the impacts of time-based and content-based factors on virality different for authentic news versus misinformation during such events?
- How does the virality differ for authentic news versus misinformation in different news categories?
Our empirical results show that virality, measured by the retweet count, is higher for misinformation, novel tweets, and tweets with negative sentiments and those with low readability. In addition, the impacts of sentiment are different for misinformation and authentic news. Tweets on the environment have lower retweet counts compared with the baseline religious tweets, and the retweet counts of social tweets are higher for authentic news than misinformation. Despite the burgeoning literature on misinformation diffusion, this study is the first to address the readability of misinformation by examining its lexical component. Our results show that when a tweet contains more lexical words, users share it less. These findings have implications for research and practice and provide guidelines for administrators in online social networks.
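To make the lexical component concrete, the following minimal sketch assumes one common definition of lexical density: the share of content (lexical) words among all words in a text. The authors' exact measure is not given in this excerpt, so the `FUNCTION_WORDS` list and the `lexical_density` function are hypothetical names introduced only for illustration; a real pipeline would more likely identify content words with a part-of-speech tagger.

```python
import re

# Illustrative (assumed) list of English function words. This is NOT the
# authors' word list; it only serves to separate content words from the rest.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "of", "in", "on", "at", "to",
    "for", "with", "by", "from", "is", "are", "was", "were", "be", "been",
    "it", "this", "that", "these", "those", "i", "you", "he", "she", "we",
    "they", "not", "as", "so", "do", "does", "did", "has", "have", "had",
}


def lexical_density(text: str) -> float:
    """Return the share of tokens not in FUNCTION_WORDS (0.0 for empty text)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    lexical = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(lexical) / len(tokens)


# Toy examples (invented text, not taken from the study's dataset).
dense = lexical_density("Flood waters rising fast near downtown Houston")
sparse = lexical_density("It is not as if they were in the water")
```

Under this assumed definition, the first toy sentence consists entirely of content words and the second almost entirely of function words, so `dense` is larger than `sparse`. The reported finding would then correspond to tweets with higher values being shared less.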
The rest of this study is organized as follows: In the next section, we provide a review of the background literature on misinformation, information diffusion and virality, and information veracity. Next, we provide our conceptual model and develop a set of hypotheses in Section 3. We discuss our approach to data extraction and analysis in Section 4, followed by the results in Section 5 and a discussion of the theoretical contributions and practical implications in Section 6. We conclude by discussing the limitations and implications for future research.
Section snippets
Background literature
Our study draws upon three major streams of research: misinformation, information diffusion and virality, and information veracity. Next, we will discuss each stream as it applies to our research.
Conceptual model and hypotheses
Informed by the information diffusion and information veracity research discussed in the previous section (Agrawal et al., 2013; Conroy et al., 2015; Li et al., 2017), we develop a conceptual model, illustrated in Fig. 1, to predict the virality of tweets, measured using the retweet count, during extreme events. The framework depicts a hybrid approach to how time-based, content-based, and user-based features predict the retweet count.
Big data-driven research method
With the rapid accumulation of data on social media, there is an emerging interest in the IS discipline to repeatedly capture, collate, observe, analyze, condense, store, and visualize relevant information from social media and other online repositories. These activities, which rely on structured and unstructured data, have the potential to create value, but they are considered difficult tasks due to the high velocity, variety, volume, veracity, and complexity that characterize big data (Chiang,
Findings
Table 6 summarizes the results of our empirical analyses. All independent variables in our four models had variance inflation factors (VIFs) less than 5. Hence, multicollinearity is not an issue. Our first model depicts the retweet count as a function of the control variables. This is our baseline model. User followers, friends, status, and favorite counts are the zero-inflated variables, as they may influence the probability of users on the network not responding or retweeting the message. We
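The multicollinearity check described above can be illustrated with a short sketch. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The cutoff of 5 comes from the text; the function names and toy data are hypothetical, and this is not the authors' code.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long, non-constant columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        m, s = mean(col), pstdev(col)  # s == 0 (constant column) is not handled
        standardized.append([(v - m) / s for v in col])
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(standardized[i], standardized[j])) / n
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vifs(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Toy data (invented): the second column is nearly a copy of the first.
followers = [1.0, 2.0, 3.0, 4.0]
friends = [1.0, 2.0, 3.0, 5.0]
flagged = [v > 5 for v in vifs([followers, friends])]  # compare with the cutoff of 5
```

In practice, a library routine such as `variance_inflation_factor` in statsmodels would typically be used instead; the hand-rolled version is shown only to keep the example dependency-free and to make the calculation visible.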
Discussion
Using data collected during Hurricane Harvey, this research contributes to big data-driven theory building research by examining the factors that affect the virality of authentic news and misinformation on Twitter. By employing predictive analytics, we corroborated the empirical evidence by revealing several features and their interactions as reliable variables in the prediction of virality. Our results show that people are more likely to retweet misinformation as compared with authentic news.
Conclusions
Leveraging studies from information diffusion and information veracity, we examine various factors that influence the virality of news items on Twitter and how they differ for authentic news versus misinformation. Specifically, we show that misinformation, novel, and negatively toned news as well as those with a lower lexical density diffuse more on social media. We provide essential insights about their interactions through a combination of text mining, machine learning and econometrics
CRediT authorship contribution statement
Kelvin K. King: Conceptualization, Methodology, Analysis, Data curation, Visualization, Software, Writing – original draft. Bin Wang: Supervision, Conceptualization, Theorizing, Methodology, Validation, Investigation, Writing – review & editing.
Declaration of Competing Interest
None.
References (92)
- Stock market response to information diffusion through internet sources: A literature review. International Journal of Information Management (2019)
- Bayesian LDA for mixed-membership clustering analysis: The Rlda package. Knowledge-Based Systems (2019)
- Measuring and profiling the topical influence and sentiment contagion of public event stakeholders. International Journal of Information Management (2021)
- A means to an end: Using political satire to go viral. Public Relations Review (2014)
- Who do you trust? The digital destruction of shared situational awareness and the COVID-19 infodemic. International Journal of Information Management (2020)
- Assimilation of business intelligence: The effect of external pressures and top leaders commitment during pandemic crisis. International Journal of Information Management (2021)
- Uncovering sentiment and retweet patterns of disaster-related tweets from a spatiotemporal perspective – A case study of Hurricane Harvey. Telematics and Informatics (2020)
- Impact of COVID-19 pandemic on information management research and practice: Transforming education, work and life. International Journal of Information Management (2020)
- Crowd or hubs: Information diffusion patterns in online social networks in disasters. International Journal of Disaster Risk Reduction (2020)
- Polarization and acculturation in US election 2016 outcomes – Can Twitter analytics predict changes in voting preferences. Technological Forecasting and Social Change (2019)
- Impact of corporate social responsibility on reputation—Insights from tweets on sustainable development goals by CEOs. International Journal of Information Management
- Predicting information diffusion on Twitter – Analysis of predictive features. Journal of Computational Science
- Predicting the popularity of viral topics based on time series forecasting. Neurocomputing
- Bayesian surprise attracts human attention. Vision Research
- Theory building with big data-driven research – Moving away from the “what” towards the “why.” International Journal of Information Management
- Emergency information diffusion on online social media during Storm Cindy in U.S. International Journal of Information Management
- Measuring novelty seeking in tourism. Annals of Tourism Research
- Information and communication technologies (ICT)-enabled severe moral communities and how the (Covid19) pandemic might bring new ones. International Journal of Information Management
- Retweets of officials' alarming vs reassuring messages during the COVID-19 pandemic: Implications for crisis management. International Journal of Information Management
- Understanding the efficiency of social media based crisis communication during hurricane Sandy. International Journal of Information Management
- The diffusion of misinformation on social media: Temporal pattern, message, and source. Computers in Human Behavior
- Systematic literature review on the spread of health-related misinformation on social media. Social Science & Medicine
- Community intelligence and social media services: A rumor theoretic analysis of tweets during social crises. MIS Quarterly
- Generic frameworks for SVM, ANN, LGBM and LR algorithms. International Journal of Computer Science and Mobile Computing
- Social media and fake news in the 2016 election. The Journal of Economic Perspectives
- An analysis of rumor. Public Opinion Quarterly
- Unpacking novelty: The anatomy of vision advantages
- Experience: Managing misinformation in social media—Insights for policymakers from Twitter analytics. Journal of Data and Information Quality
- Go viral on the Facebook! Interactions between candidates and followers on Facebook during the Hungarian general election campaign of 2014. Information, Communication & Society
- What makes online content viral? Journal of Marketing Research
- Emotion and virality: What makes online content go viral? GfK Marketing Intelligence Review
- Probabilistic topic models. Communications of the ACM
- Latent Dirichlet allocation. The Journal of Machine Learning Research
- Not the bots you are looking for: Patterns and effects of orchestrated interventions in the U.S. and German elections. International Journal of Communication
- The moderating effect of communication network centrality on motive to perform interpersonal citizenship. Journal of Managerial Issues
- An epidemic model of rumor diffusion in online social networks. The European Physical Journal B
- Special issue: Strategic value of big data and business analytics. Journal of Management Information Systems
- A hybrid attribute selection approach for text classification. Journal of the Association for Information Systems
- Rumor, gossip and urban legends. Diogenes
- An arousal regulation explanation of mood effects on consumer choice. Journal of Consumer Research
- Social media sway: Worries over political misinformation on Twitter attract scientists’ attention. Science News
Cited by (30)
Determinants of public emergency information dissemination on social networks: A meta-analysis
2024, Computers in Human Behavior
Changing or unchanging Chinese attitudes toward ride-hailing? A social media analytics perspective from 2018 to 2021
2023, Transportation Research Part A: Policy and Practice
Guest Editorial: Big data-driven theory building: Philosophies, guiding principles, and common traps
2023, International Journal of Information Management
Going beyond fact-checking to fight health misinformation: A multi-level analysis of the Twitter response to health news stories
2023, International Journal of Information Management