Research Article
Diffusion of real versus misinformation during a crisis event: A big data-driven approach

https://doi.org/10.1016/j.ijinfomgt.2021.102390

Highlights

  • Tweets that were novel were more likely to go viral.

  • Tweets with negative sentiment diffused faster than neutral or positive ones.

  • The lower the lexical density, the higher the number of retweets.

  • Association between novelty and retweet count is the same for real vs. misinformation.

Abstract

Misinformation has captured the interest of academia in recent years with several studies looking at the topic broadly with inconsistent results. In this research, we attempt to bridge the gap in the literature by examining the impacts of user-, time-, and content-based characteristics that affect the virality of real versus misinformation during a crisis event. Using a big data-driven approach, we collected over 42 million tweets during Hurricane Harvey and obtained 3589 original verified real or false tweets by cross-checking with fact-checking websites and a relevant federal agency. Our results show that virality is higher for misinformation, novel tweets, and tweets with negative sentiment or lower lexical density. In addition, we reveal the opposite impacts of sentiment on the virality of real news versus misinformation. We also find that tweets on the environment are less likely to go viral than the baseline religious news, while real social news tweets are more likely to go viral than misinformation on social news.

Introduction

Online social networks (OSN) have become an efficient means of information dissemination (Bowler, Halbesleben, Stodnick, Seevers, & Little, 2009). For example, statistics show that in 2017, 67% of US adults depended on online social network platforms such as Twitter, Facebook, and Snapchat for news, as compared to 62% in 2016 (Moon, 2017). Close to 9 out of 10 Twitter users rely on the social media outlet primarily for news, and of those who do, 74% use the platform daily (Rosenstiel, Sonderman, Locker, Ivancin, & Kjarval, 2015). As a result, Twitter has come to replace mainstream media as the number one choice for news, especially among millennials and younger users; its reach is expected to grow even further (Stieglitz & Dang-Xuan, 2013). With the advent of social media and portable communication devices, disseminating information during crisis events has become easier and more seamless due to the steady increase in technology adoption (Chaubey & Sahoo, 2021; Dwivedi et al., 2020; Palen et al., 2010; Roy et al., 2020; Shklovski et al., 2010; Shklovski et al., 2008; Sinha et al., 2019). For example, during natural disasters such as hurricanes, information is rapidly propagated through social media platforms using mobile devices and augmented by local eyewitness accounts (Oh, Kwon, & Rao, 2010). Despite their popularity, OSN have an inherent disadvantage as an information diffusion channel. Because users of these platforms act as gatekeepers and are prone to their own individual biases, it is difficult to assess the veracity of the news items being propagated. This issue has been further exacerbated by the sheer amount of data that flows through online social networks. This phenomenon has led to both factual events and misinformation items on social media reaching a larger audience than news from major mainstream news outlets (Allcott & Gentzkow, 2017). Hence, it is not surprising that headlines of misinformation deceive American adults about 75% of the time, and the most popular misinformation stories usually garner far more shares than authentic news (Silverman & Singer-Vine, 2016).

Extant research has identified salient features that may not only influence the diffusion of information, but also help in assessing its veracity on social media (Boyd, Golder, & Lotan, 2010; Lee, Mahmud, Chen, Zhou, & Nichols, 2014). These features may be broadly grouped into two research themes: information diffusion and information veracity. Information diffusion research has identified three types of features: (a) user-based features that directly relate to the behavior and characteristics of users; (b) time-based features related to the time a tweet is generated and posted; and (c) content-based features related to the contents of text embedded in the messages being propagated (Hoang & Mothe, 2018). Information veracity research, which focuses on more proactive measures like detection, relies on the following features: (a) linguistic cues related to the linguistic properties of the messages; and (b) social network characteristics, which predominantly rely on user profile characteristics that can be extracted from the metadata collected in the network (Conroy et al., 2015; Rubin et al., 2015). Both approaches rely on similar features (content- and user-related) but depend on different methods. Information diffusion studies involve several approaches, including a combination of sentiment and text analysis and feature extraction and engineering methods for both predictive and descriptive analysis. Conversely, information veracity predominantly involves the use of prescriptive analysis (Hoang & Mothe, 2018; Rubin et al., 2015; Shin et al., 2018; Suh et al., 2010).

Big data, often collated from social media and other repositories, provide a wealth of information for information systems (IS) researchers. However, such data come with their own challenges, primarily due to their high volume, veracity, variety, and velocity. This issue has made it increasingly difficult for researchers to not only effectively access the data and ensure its integrity, but also to analyze it. Given the complexity of analyzing Twitter data and the unavailability of complete datasets due to Twitter’s rate limits (https://developer.twitter.com/en/docs/twitter-api/rate-limits), large-scale studies using both econometric and algorithmic techniques have also been rare. In addition, much of the extant big data-driven research focuses on describing what phenomena are happening rather than theory building to explain the causality of the observed events (e.g., Kitchens, Dobolyi, Li, & Abbasi, 2018; Zhou et al., 2018). Hence, in alignment with Kar and Dwivedi's (2020) call for more theory building in big data-driven research, we examine the antecedents to tweet virality and how the impacts of these antecedents differ for authentic news vs. misinformation by using data collected from Twitter during a crisis event.

Our research contributes to explaining the inconsistent results from a handful of analyses using Twitter data. For example, a study using textual contents from Twitter showed that both positively and negatively charged tweets were retweeted more often and quicker than neutral ones (Stieglitz & Dang-Xuan, 2013). The researchers concluded that sentiments inferred from social media contents might be positively associated with information diffusion. Other studies have related the propagation of misinformation to automated entities and claimed that these entities actively spread misinformation. They showed that in the earlier phases, the automated entities mostly target influential users on social media, which eventually leads to more diffusion of misinformation (Shao et al., 2018). This finding was contradicted by another study, which found that, contrary to conventional wisdom, malicious entities were not any more responsible for the propagation of misinformation than humans (Vosoughi, Roy, & Aral, 2018). Rather, the study claimed that human behavior contributed more to the differential propagation of misinformation than automated entities.

The current study employs a unique dataset of tweets collected across a 5-week period during Hurricane Harvey to empirically investigate the virality of both real information and misinformation during a crisis event based on extracted features. Specifically, we investigate the following:

  • What time-based, content-based, and user-based factors affect the virality of authentic news versus misinformation on social media during crisis events?

  • How are the impacts of time-based and content-based factors on virality different for authentic news versus misinformation during such events?

  • How does the virality differ for authentic news versus misinformation in different news categories?

Our empirical results show that virality, measured by the retweet count, is higher for misinformation, novel tweets, tweets with negative sentiment, and tweets with lower lexical density. In addition, the impacts of sentiment are different for misinformation and authentic news. Tweets on the environment have lower retweet counts compared with the baseline religious tweets, and the retweet counts of social tweets are higher for authentic news than misinformation. Despite the burgeoning literature on misinformation diffusion, this study is the first to address the readability of misinformation by examining its lexical component. Our results show that when a tweet contains more lexical words, users share it less. These findings have implications for research and practice and provide guidelines for administrators in online social networks.
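
To make the lexical component concrete, the following is a minimal sketch of one common way to compute lexical density: the share of a text's tokens that are lexical (content) words rather than function words. It is an illustration under stated assumptions, not the measure used in this study, whose exact operationalization is not shown in this section. The function name lexical_density and the short FUNCTION_WORDS list are hypothetical; a fuller analysis would likely identify content words with a part-of-speech tagger.

```python
import re

# Hypothetical, deliberately small list of English function words (an assumption
# for illustration). A real analysis would use a part-of-speech tagger instead.
FUNCTION_WORDS = frozenset("""
a an the and or but if of to in on at by for with from as is are was were be been
it its this that these those i you he she we they my your his her our their not no
do does did have has had will would can could should may might
""".split())


def lexical_density(text: str) -> float:
    """Return the share of tokens that are lexical (content) words, in [0, 1]."""
    # Lowercase the text and keep alphabetic tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    # Treat every token outside the function-word list as a lexical word.
    lexical = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(lexical) / len(tokens)


if __name__ == "__main__":
    print(lexical_density("the flood is in the city"))
    print(lexical_density("Shark swimming highway Houston"))
```

In this sketch, "the flood is in the city" has two lexical words out of six tokens (a density of about 0.33), whereas a tweet made up only of content words scores 1.0. Under the finding reported above, the first kind of text would be expected to attract more retweets than the second.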

The rest of this study is organized as follows: In the next section, we provide a review of the background literature on misinformation, information diffusion and virality, and information veracity. Next, we provide our conceptual model and develop a set of hypotheses in Section 3. We discuss our approach to data extraction and analysis in Section 4, followed by the results in Section 5 and a discussion of the theoretical contributions and practical implications in Section 6. We conclude by discussing the limitations and implications for future research.

Background literature

Our study draws upon three major streams of research: misinformation, information diffusion and virality, and information veracity. Next, we will discuss each stream as it applies to our research.

Conceptual model and hypotheses

Informed by the information diffusion and information veracity research discussed in the previous section (Agrawal et al., 2013; Conroy et al., 2015; Li et al., 2017), we develop a conceptual model illustrated in Fig. 1 to predict the virality of tweets measured using the retweet count during extreme events. The framework depicts a hybrid approach on how time-based, content-based, and user-based features predict the retweet count.

Big data-driven research method

With the rapid accumulation of data on social media, there is an emerging interest in the IS discipline to repeatedly capture, collate, observe, analyze, condense, store, and visualize relevant information from social media and other online repositories. These activities, which rely on structured and unstructured data, have the potential to create value; they are considered difficult tasks due to the high velocity, variety, volume, veracity, and complexity characterized by big data (Chiang,

Findings

Table 6 summarizes the results of our empirical analyses. All independent variables in our four models had variance inflation factors (VIFs) less than 5. Hence, multicollinearity is not an issue. Our first model depicts the retweet count as a function of the control variables. This is our baseline model. User followers, friends, status, and favorite counts are the zero-inflated variables, as they may influence the probability of users on the network not responding or retweeting the message. We
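
The multicollinearity check mentioned above can be sketched as follows. For predictor j, the variance inflation factor is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors; equivalently, the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix. This is a minimal pure-Python illustration, not the authors' code: the function names are hypothetical, the input is assumed to be a list of equal-length, non-constant numeric columns, and the example data are made up.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length, non-constant numeric columns."""
    n = len(columns[0])
    k = len(columns)
    means = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    corr = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            cov = sum((a - means[i]) * (b - means[j])
                      for a, b in zip(columns[i], columns[j])) / n
            corr[i][j] = cov / (sds[i] * sds[j])
    return corr


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """Return one VIF per column: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


if __name__ == "__main__":
    # Made-up predictor columns with a pairwise correlation of 0.8.
    x1 = [1, 2, 3, 4, 5]
    x2 = [2, 1, 4, 3, 5]
    print(variance_inflation_factors([x1, x2]))
```

With the two made-up columns above, whose correlation is 0.8, each VIF is 1 / (1 - 0.64), or about 2.78, which falls below the cutoff of 5 cited in the text.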

Discussion

Using data collected during Hurricane Harvey, this research contributes to big data-driven theory building research by examining the factors that affect the virality of authentic news and misinformation on Twitter. By employing predictive analytics, we corroborated the empirical evidence by revealing several features and their interactions as reliable variables in the prediction of virality. Our results show that people are more likely to retweet misinformation as compared with authentic news.

Conclusions

Leveraging studies from information diffusion and information veracity, we examine various factors that influence the virality of news items on Twitter and how they differ for authentic news versus misinformation. Specifically, we show that misinformation, novel, and negatively toned news as well as those with a lower lexical density diffuse more on social media. We provide essential insights about their interactions through a combination of text mining, machine learning and econometrics

CRediT authorship contribution statement

Kelvin K. King: Conceptualization, Methodology, Analysis, Data curation, Visualization, Software, Writing – original draft. Bin Wang: Supervision, Conceptualization, Theorizing, Methodology, Validation, Investigation, Writing – review & editing.

Declaration of Competing Interest

None.

References (92)

  • P. Grover et al.

    Impact of corporate social responsibility on reputation—Insights from tweets on sustainable development goals by CEOs

    International Journal of Information Management

    (2019)
  • T.B.N. Hoang et al.

    Predicting information diffusion on Twitter – Analysis of predictive features

    Journal of Computational Science

    (2018)
  • Y. Hu et al.

    Predicting the popularity of viral topics based on time series forecasting

    Neurocomputing

    (2016)
  • L. Itti et al.

    Bayesian surprise attracts human attention

    Vision Research

    (2009)
  • A.K. Kar et al.

    Theory building with big data-driven research – Moving away from the “what” towards the “why.”

    International Journal of Information Management

    (2020)
  • J. Kim et al.

    Emergency information diffusion on online social media during Storm Cindy in U.S.

    International Journal of Information Management

    (2018)
  • T.-H. Lee et al.

    Measuring novelty seeking in tourism

    Annals of Tourism Research

    (1992)
  • C.M. Parra et al.

    Information and communication technologies (ICT)-enabled severe moral communities and how the (Covid19) pandemic might bring new ones

    International Journal of Information Management

    (2021)
  • H.R. Rao et al.

    Retweets of officials' alarming vs reassuring messages during the COVID-19 pandemic: Implications for crisis management

    International Journal of Information Management

    (2020)
  • K.C. Roy et al.

    Understanding the efficiency of social media based crisis communication during hurricane Sandy

    International Journal of Information Management

    (2020)
  • J. Shin et al.

    The diffusion of misinformation on social media: Temporal pattern, message, and source

    Computers in Human Behavior

    (2018)
  • Y. Wang et al.

    Systematic literature review on the spread of health-related misinformation on social media

    Social Science & Medicine

    (2019)
  • M. Agrawal et al.

    Community intelligence and social media services: A rumor theoretic analysis of tweets during social crises

    MIS Quarterly

    (2013)
  • N.I. Alghurair et al.

    Generic frameworks for SVM, ANN, LGBM and LR algorithms

    International Journal of Computer Science and Mobile Computing

    (2020)
  • H. Allcott et al.

    Social media and fake news in the 2016 election

    The Journal of Economic Perspectives

    (2017)
  • G.W. Allport et al.

    An analysis of rumor

    Public Opinion Quarterly

    (1946)
  • S. Aral et al.

    Unpacking novelty: The anatomy of vision advantages

    (2016)
  • R. Aswani et al.

    Experience: Managing misinformation in social media—Insights for policymakers from Twitter analytics

    Journal of Data and Information Quality

    (2020)
  • M. Bene

    Go viral on the Facebook! Interactions between candidates and followers on Facebook during the Hungarian general election campaign of 2014

    Information, Communication & Society

    (2017)
  • J. Berger et al.

    What makes online content viral?

    Journal of Marketing Research

    (2012)
  • J. Berger et al.

    Emotion and virality: What makes online content go viral?

    GfK Marketing Intelligence Review

    (2013)
  • D.M. Blei

    Probabilistic topic models

    Communications of the ACM

    (2012)
  • D.M. Blei et al.

    Latent Dirichlet allocation

    The Journal of Machine Learning Research

    (2003)
  • O. Boichak et al.

    Not the bots you are looking for: Patterns and effects of orchestrated interventions in the U.S. and German elections

    International Journal of Communication

    (2021)
  • M. Bowler et al.

    The moderating effect of communication network centrality on motive to perform interpersonal citizenship

    Journal of Managerial Issues

    (2009)
  • Boyd, D., Golder, S., & Lotan, G. (2010). Tweet, tweet, retweet: Conversational aspects of retweeting on Twitter. In:...
  • Chen, X. (2016). The influences of personality and motivation on the sharing of misinformation on social media. In:...
  • J.-J. Cheng et al.

    An epidemic model of rumor diffusion in online social networks

    The European Physical Journal B

    (2013)
  • R.H.L. Chiang et al.

    Special issue: Strategic value of big data and business analytics

    Journal of Management Information Systems

    (2018)
  • C.-H. Chou et al.

    A hybrid attribute selection approach for text classification

    Journal of the Association for Information Systems

    (2010)
  • Chu, Z., Gianvecchio, S., Wang, H., & Jajodia, S. (2010). Who is tweeting on Twitter: Human, bot, or cyborg? In:...
  • Conroy, N. J., Rubin, V. L., & Chen, Y. (2015). Automatic deception detection: Methods for finding fake news. In:...
  • N. DiFonzo et al.

    Rumor, gossip and urban legends

    Diogenes

    (2007)
  • F. Di Muro et al.

    An arousal regulation explanation of mood effects on consumer choice

    Journal of Consumer Research

    (2012)
  • Dunn, H. B., & Allen, C.A. (2005). Rumors, urban legends and internet hoaxes. In: Proceedings of the annual meeting of...
  • R. Ehrenberg

    Social media sway: Worries over political misinformation on Twitter attract scientists’ attention

    Science News

    (2012)