1 Introduction

In recent years, supply chain networks have evolved from simple sequential and linear networks to dynamic processes that require information sharing and visibility (Orenstein 2020). Current trends point to a rebrand of the traditional “product-oriented” supply chains whose primary focus is on productivity and asset utilization to “service-oriented” supply chains which are more intangible in nature and focus on management of expertise, capacity and knowledge. Arguably, the purpose of supply chain management is to better serve upstream and downstream entities while simultaneously increasing service levels (Chang et al. 2013). This requires seamless integration amongst entities to reduce data transmission time thereby reducing supply lead time and improving service performance (Chang et al. 2013).

The shift of focus from traditional product-oriented supply chains to service-oriented supply chains (Lo 2016) has prompted a focus on the utilization of big data applications for greater information dissemination. Characterized by seamless connectivity among members of the value chain, digital connectivity has given rise to service dominant logic that allows for the rapid customization of both goods and services (Lee and Lee 2019). However, stakeholder reaction to big data applications appears to be mixed, impacting not only the use of these applications but also the economic value added to the supply chain network. While big data applications in service-oriented supply chains have shown improvements in tracking and tracing shipments, security concerns and a lack of skills in analytics and data management continue to foster mixed sentiment (Neuman 2014). Understanding ways to control this sentiment is paramount to the application of big data and its subsequent benefit for the management of service and reduction of variability in supply chain management. One unexplored avenue through which to control for sentiment is corporate media (CM).

In recent years, a proliferation of CM that reference big data in supply chain management provides evidence of both the benefits and drawbacks. Traditionally, data were captured through business transactions involving various stakeholders and processes. In the new era of big data digital transformation, the combination of size, complexity, and advanced analytics has improved applications for national security, marketing, credit risk, medical research, and urban planning (Martin 2015). The use of big data analytics in corporations has led to higher financial, market and customer satisfaction performance (Vitari and Raguseo 2019). From a supply chain perspective, products are tagged with sensing devices for tracking and tracing purposes. This includes leveraging large volumes of data to enhance coordination within the supply chain network, leading to more efficient communication, agile response to change, and improved operations (Kuo and Kusiak 2019). Digital transformation has also allowed for customer input to delivery and customization, thereby enhancing customers’ role: they are no longer only recipients of services but also contributors toward improved service quality (Ahn and Rho 2016). The application of both rapid and large-scale data collection and inter-organizational dissemination of data promises to provide supply chain partners with competitive business intelligence for strategic decision-making and to enhance consumer service experience through greater responsiveness (Lau et al. 2017).

The potential to harness the power of big data for supply chain service value is enhancing the “hype” surrounding big data in the media. The continuous growth of data and the organizational need to leverage it have caused stress on traditional information systems and techniques, thereby leading to insufficient utilization (Gupta et al. 2019). Thus, the reaction toward big data appears to be “mixed”. Big data have been criticized as a breach of privacy. Both Amazon and Orbitz have been flagged for price discrimination. In addition, the tracking of movements and shopping habits of stadium spectators utilizing Verizon’s Precision Marketing Insight program has been assessed as a “questionable use” of big data (Martin 2015). While collaboration and inter-company communication have never been easier, the platforms that give rise to the processing of information, documents, and key data continue to perpetuate security concerns, especially in pharmaceutical, healthcare, and financial supply chains where stored data can be sensitive. Any perceived unethical use of big data continues to impact sentiment through a lack of understanding of how data will be obtained and shared among supply chain partners. CM on hackers seeking insider trading information from insurance companies, like Anthem Inc., and on the unethical use of data from Facebook partners, continue to impact “stakeholder sentiment” (i.e., the cumulative sentiment of stakeholders responding to a particular document).

Calls for research underpin the importance of behavioral research in examining incentive structures, change mechanisms, and social expectations to motivate the use of big data applications (Motiwalla et al. 2019). The organizational change affects organizations internally (and externally) through increasing uncertainty, anxiety, stress, and ultimately resistance, which are critical to the success or failure of change programs (Shah et al. 2017). Organizations have the capability to adjust for changes through altering culture which impacts attitudinal characteristics of internal and external stakeholders, thus impacting readiness or resistance (Shah et al. 2017). One method of impacting culture is through the use of media. Specific mentions of message framing have been suggested as potential impactors of individual attitude (Kohli and Tan 2016). Understanding sentiment surrounding big data in supply chain management, and how to control for sentiment through CM, will aid in achieving supply chain value from such applications. Social media have been discussed as a strategic approach for the further enhancement of knowledge between individuals and organizations, not only impacting organizational learning and dialogue, but in return yielding competitive advantages for the firm utilizing big data to share information throughout the supply chain (Lam et al. 2016). Yet organizations still lack a general understanding of how to craft content such that they can effectively connect and enhance stakeholder sentiment to reduce concerns, thereby mobilizing effective use of big data (Williams et al. 2018).

Thus, this study focuses on CM documents and their respective impact on stakeholder sentiment regarding big data in service-oriented supply chains. Currently, there exists minimal insight into stakeholder sentiment in this area. This study utilizes the collection of textual data from two sources of media (video and practitioner media) which reference corporate communications through a company representative. Through the utilization of a web-scraper, social network post data are collected specifically to analyze stakeholder sentiment of CM on big data in supply chain management. Using sentiment analysis and controlling for various demographic factors, this study determines the associated impact of CM documents on stakeholder sentiment. Unlike previous studies that utilize social network posts to examine cumulative sentiment from media using keyword searches, this study provides causal inference by first conducting a search of CM including news and video media and then a social network post web scrape to collect social network posts that reference the CM document while controlling for time.

This study finds that CM documents that mention big data are positively associated with stakeholder sentiment while CM documents that do not mention big data are not significantly associated with stakeholder sentiment. Additionally, CM documents that discuss future applications of big data are negatively associated with stakeholder sentiment.

The paper is organized as follows: Sect. 2 will present a literature review of big data in supply chain management and how it is projected through media and reciprocal stakeholder sentiments. Section 3 will discuss how the roles of technology adoption theory coupled with agenda setting and a review of psychological bias address how CM documents impact stakeholder sentiment of big data. Section 4 presents the methodology and results of our findings from a regression analysis utilizing demographic variables, topic discussions, and other control variables to analyze the research question as well as post-hoc analyses which discuss actual results from a review of social network posts. Finally, Sect. 5 provides a discussion of the findings as they relate to managerial implications on CM and theoretical implications to inform corporate discussions of big data. Following this are concluding remarks and future directions of this research.

2 Literature review

In this study, we analyze the “stakeholder” perception of big data in CM documents. Borrowing from Freeman (1984), stakeholders can be defined and categorized into three broad groups: those with equity stakes, those with economic (or market) stakes, and those with influencer stakes. This model takes into account the majority of individuals who will respond to CM documents in media. They might be members of the economic or market stakes group, in that they have an economic interest but no ownership interest in the organizations (i.e., employees, customers, suppliers, and competitors). They might also be influencers, who are individuals without an ownership or economic interest but with interests as consumer advocates, environmental groups, etc. They might also belong to the group that comprises individuals with direct ownership (i.e., shareholders, stockholders, minority interest owners).

2.1 Big data

Distinctive characteristics of the service focus (i.e. intangibility, simultaneity, heterogeneity and perishability) in supply chain management suggest a vulnerability toward greater variability (Lo 2016), requiring organizations to restructure accordingly. Big data mark the rise of exploration and utilization of large data sets through the application of advanced statistics to stored electronic communication (Gupta et al. 2012; McAfee et al. 2012; Kache and Seuring 2017). Big data are typically characterized by the four V’s: volume, velocity, variety and veracity (Goes 2014). Recent research has added a fifth “V”, value, which incorporates not only the integration of different large data sources and the validity of such data but also the understanding and appreciation of the dual nature of prediction and causality in big data (Baesens et al. 2016). Velocity, as defined by a firm’s ability to process data at high speeds, bears the greatest potential for performance improvements (Hofmann 2017) and is particularly useful for service productivity (Ahn and Rho 2016).

Recent events including a global pandemic (COVID-19) have prompted organizations to rethink current strategies focused on developing robust supply chain networks. With a new need to develop and understand the optimal restocking of critical products, organizations seek to utilize advanced information communication technologies and digital systems. Previous research elucidates this through applications and performance implications of big data adoption from a global perspective (Gawankar et al. 2020). But the adoption of big data begins with the individual and/or user perceptions of the technology.

2.2 Application of big data in firms

Organizations have faced both opportunities and challenges when seeking to enhance individual value through big data in supply chain management (Setia and Patel 2013). Big data analytics are increasingly used to improve business outcomes, as the collaboration resulting from technology has been found to be positively related to financial performance (Hwang and Kim 2019). Several new sources of data over the past two decades have resulted in the availability of large amounts of user data in the digitization of commerce (Motiwalla et al. 2019). Big data sources have helped to enhance supplier insight into user behavior. For example, big data have been used to personalize product offerings, prices, and display assortments through the utilization of downstream attributes including geo-localization, demographic information, and the time of day (Cohen 2017; Zhan and Tan 2018). Other benefits involve customization in electronic retailing (Thirumalai and Sinha 2009), the use of metadata for enterprise document searches (Schymik et al. 2015), targeted promotions and campaigns which reduce the costs involved in procurement, the use of service robots in the hotel industry impacting assurance and reliability (Chiang and Trimi 2020), as well as predictive modeling and analytics aiding product and process modeling (Cohen 2017; Duan et al. 2018). These capabilities have prompted service inseparability, allowing downstream entities to actively participate in service production and delivery, which can lead to increased customer satisfaction and service productivity (Ahn and Rho 2016).

From a logistics service perspective, various research has incorporated and emphasized the role of technology and analytical support as a core competency providing performance outcomes (Hwang and Kim 2019). Logistics applications include vehicle equipment with on-board diagnostic devices for monitoring driver behaviors in real time (Stankovic 2014; Baesens et al. 2016), which allows for vehicle managing companies and insurance companies to make decisions through predictive modeling. Utilizing a Delphi technique, Kache and Seuring (2017) identify other various opportunities and challenges for big data including information management, operations efficiency and maintenance, supply chain visibility and transparency, product and market strategy, product traceability, supply chain inventory optimization, risk evaluation, etc. all to enhance the consumer service level, trust in the organizations through transparent, ethical practices, and experience.

2.3 Implications of big data

With a focus on service to satisfy downstream members’ needs for financial, strategic and marketing benefits, firms that proactively design service systems to counter variability can open up market opportunities and decrease costs incurred with transactions (Lo 2016). However, these improvements can come at a cost. Information privacy has been referred to as one of the most important issues facing individuals since the advent of e-commerce (Smith et al. 1996) and is now becoming more important as digital technologies give rise to big data applications. Challenges including a lack of governance to control big data efforts across supply chain networks have led to organizational reluctance from many (Kache and Seuring 2017). More prevalent in recent literature are cybersecurity issues. Rapid growth of concerns about revealing proprietary information has, in some cases, led to a reluctance by retailers to share information across the supply chain despite the advantages big data bring to the networked economy (Menon and Sarkar 2016).

Understandably, both positive and negative aspects to big data applications exist. However, literature suggests that it is likely no area of business activity will remain unaltered by this paradigm (Kache and Seuring 2017). Ways in which corporations can project and communicate with stakeholders on big data to reduce concerns and optimize sentiment are still elusive in literature. Recent literature focuses on applications of big data including social media use in supply chain setups (Chae 2015), allocation of resource decisions (Saboo et al. 2016), and early warning systems or anomaly detection (Breuker et al. 2016). Other literature discusses the implications of big data. For example, articles dating back several years discuss leveraging analytics to optimize a company’s supply chain network (Davenport 2006). Trkman et al. (2010) and De Oliveira et al. (2012) examine analytics on supply chain performance. Hazen et al. (2014) look at the impact of data complexity on the importance of monitoring and controlling for data quality. Other implication studies focus on optimizing freight in the transport supply chain (Harris et al. 2015), value portfolio optimization in manufacturing (Opresnik and Taisch 2015), supply chain innovation competencies (Tan et al. 2015), and logistics management in the automotive industry (Zhong et al. 2015).

While the research on applications and implications of big data provides various avenues for progression in this domain, there is little insight regarding stakeholder sentiment of big data in supply chain management. Data leakage both within and outside a supply network is sometimes unforeseeable. Corporate transparency has been shown to impact investor sentiment (Firth et al. 2015). Additionally, media publicity has been shown to have an impact on individual perceptions of the importance of corporate social responsibility behavior (Elias 2004). This study explores whether CM can be utilized as tools to enhance stakeholder sentiment for promoting big data applications in supply chain management.

2.4 Agenda-setting theory

Before technology adoption takes place, individuals need to first understand the function as well as develop an attitude toward the technology (Sun 2018). Communication through channels is fundamental for technology perception which will lead to either the decision to adopt or not to adopt. This paper deals with CM as one fundamental channel among members of a social system. Agenda setting was popularized in the early 1970s when McCombs and Shaw (1972) found that issue “salience” (defined as the amount of attention granted to a particular issue) in media is strongly associated with stakeholder perceptual importance of specific topics. These findings reiterated previous scholarly undertones that media might not necessarily tell stakeholders how to think about a topic but rather what topics to think about. That is, the basic premise of agenda setting is that media transfer the salience of topics from the media agenda to the public agenda. Media lead the public to assign importance to public issues (Zhu and Blood 1997), but do not accomplish this by telling the public about the importance of one issue over another. Instead, media give certain issues more prominence and preferential treatment through frequency of coverage and prominent positions.

The connections between technology adoption and agenda setting can be situation specific. Both have been tied to organizational behavior theory. Specifically, previous research in technology adoption identified five determinants of rate of adoption: relative advantage, compatibility, complexity, trialability and observability (Rogers 2003). Agenda setting encompasses concepts from individual perception of “obtrusiveness” (i.e. personal experience with an issue), “need for orientation” (relating to a person attending to an agenda) and “salience” (Chernov et al. 2011). Both media and technology can generate individual perceptions as they relate to subjects and objects. The ties between agenda setting and technology adoption theory can be expanded upon utilizing psychological concepts, thus building value for organizational communication and subsequent adoption and use of technology. Chew et al. (2012) discuss “familiarity bias”, in which individuals tend to focus on adverse scenarios in evaluating any defection from the status quo. The fear that propagates from the unfamiliar tends to impact how we react during times of decision including our basic stances in regard to CM. At times the primary emotions can manifest into secondary emotions, including anger from powerlessness in the face of uncertainty (Seltzer and Mahmoudi 2013). Individuals tend to compensate for this by utilizing their own communication mechanisms to rebuild self-esteem that is hindered due to the primary emotions perpetuated by the unknown. Related to the concept of “complexity” is agenda setting’s “obtrusiveness”. The difference between the two concepts lies in the experience of the individual. Complexity is purely based upon perception regardless of experience, whereas obtrusiveness incorporates actual experience. Research as it relates to complexity clearly demonstrates the negative impact on technology adoption (Sun et al. 2018).
However, mixed results appear regarding the impact of obtrusiveness on agenda setting effects. Some research has found that media agenda setting effects are stronger for unobtrusive issues since audiences rely on the media for information on these issues versus other sources for those issues that are obtrusive (Zucker 1978). Alternative literature reports otherwise. For example, Yagade and Dozier (1990) found that the media agenda setting is enhanced by “concreteness” and reduced by “abstraction”. While the connection between obtrusiveness and agenda setting has been and is currently being discussed, limited knowledge exists on the reason behind these findings and specifically understanding the psychological confounding factors that may play into the findings.

3 Hypotheses and research model

Technology adoption is defined as the process where a technology is communicated through channels among members of a social system (Rogers 2003). Before adoption is to take place, individuals need to first understand the function and develop an attitude toward the technology (Sun 2018). Communication through channels is fundamental for technology perception, which will lead to either the decision to adopt or not to adopt. “Salience” in media is strongly associated with stakeholder perceptual importance of specific topics relevant to technology. These findings reiterated previous scholarly undertones that media might not necessarily tell stakeholders how to think about a topic but rather what topics to think about. Media transfer the salience of topics from the media agenda to the public agenda. This leads the public to assign importance to public issues (Zhu and Blood 1997), but does not accomplish this by telling the public about the importance of one issue over another. Instead, media give certain issues more prominence and preferential treatment through the frequency of coverage and prominent positions.

Individuals develop opinions, beliefs, and views on technology utility based on messages received externally. Considered as one of the most consistently critical predictors of information technology adoption (Sun et al. 2018), the relative advantage is a perceptual measure that captures an individual’s perceptions of the benefit in terms of value and competitiveness of the technology. This can be with or without experience but must be based upon a conceptualization of the technology in comparison with its predecessors.

Corporate communications in the media can enhance the salience of relative advantage, thereby impacting stakeholder sentiment brought about by corporate communication on a topic. Cumulative stakeholder sentiment can thus be impacted and is defined here as the mean sentiment score of all stakeholders responding to a CM document. As such, the salience of a technology within a CM document is purported to impact sentiment as outlined in hypothesis 1.

H1. Salience of “big data” topic within the CM document will have a significant positive association with cumulative stakeholder sentiment.
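The definition of cumulative stakeholder sentiment given above can be written compactly. The expression below is a notational sketch only; the symbols $\bar{S}_d$, $N_d$, and $s_{i,d}$ are introduced here for illustration and are not taken from the study.

```latex
\bar{S}_d = \frac{1}{N_d} \sum_{i=1}^{N_d} s_{i,d}
```

Here $s_{i,d}$ denotes the sentiment score of the $i$-th stakeholder response to CM document $d$, and $N_d$ denotes the number of such responses.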

The content of CM can be characterized either by its topic or by the language it uses. Specifically, an article can relay sentiment that indicates neutral, positive, or negative opinions on a topic. Previous research has assessed the effect of social and conventional media sentiment on firm performance. Specifically, Yu et al. (2013) identify significant positive impacts of blog and Twitter sentiments on firm performance. Ecommerce research also suggests that the way information is presented impacts e-customer satisfaction (Periera et al. 2017). Taken as indicative of shareholder sentiment, the positive impacts of media sentiment suggest that document sentiment (i.e., the sentiment associated with the CM document) will likely impact stakeholder sentiment. Clapman et al. (2019) indicate that the sentiment of highly popular messages and news can achieve higher investor attention. As such, we hypothesize:

H2. Document sentiment will have a significant positive association with cumulative stakeholder sentiment.

Yu et al. (2013) indicate that while social media have a stronger impact on financial performance than conventional media, both will have a joint effect on the market. That is, the more a particular topic is discussed, the more significant its influence is likely to be. This indicates that the volume of influence lies more with salience than with sentiment (Yu et al. 2013). Additionally, Waters (2013) found that when organizations were able to place spokespeople in the newscast to talk about efforts, donations increased substantially. This implies that when corporations make themselves more salient in media, they can enhance their performance through agenda setting. Bridging this with technology adoption theory, making a technology topic or organization more concrete suggests a potential impact on the stakeholder’s sentiment towards that technology.

Of course, big data are far different from humanitarian relief donation efforts. Intuitively, stakeholder attitudes will likely differ between perceptions of relief efforts over corporate efforts to enhance perceptions of relative advantage with big data. Yet seeking an understanding of what impacts stakeholder sentiment, this study relies on the empirical findings of Waters (2013) as well as the basic tenets of technology adoption theory and agenda-setting to formulate the following:

H3. The salience of “big data” topic within the CM document will have a stronger positive association with cumulative stakeholder sentiment than the document sentiment.

In the age of digital media, where individuals are being overrun by vast amounts of information from organizations, current studies suggest high levels of skepticism (Elving 2013) despite the service benefits. While salience may enhance the likelihood of a stakeholder thinking about a specific topic, the sentiment toward that topic is likely to be controlled by the stakeholder. Ultimately, this drives the question of how stakeholder psychology may impact the sentiment of big data through corporate communications in media.

Chew et al. (2012) discuss “familiarity bias”, in which individuals tend to focus on adverse scenarios in evaluating any defection from the status quo. The fear that propagates from the unfamiliar tends to impact how we react during times of decision including our basic stances in regard to CM. At times the primary emotions can manifest into secondary emotions, including anger from powerlessness in the face of uncertainty (Seltzer and Mahmoudi 2013). Individuals tend to compensate for this by utilizing their own communication mechanisms to rebuild self-esteem that is hindered due to the primary emotions perpetuated by the unknown.

But how do organizations control for the psychological bias against the unknown? An emergence from agenda setting is the topic of “framing” in which a topic can be framed in such a way that issue attributes are emphasized in order to control for the sentiment (Neuman et al. 2014). In basic terms, framing is the selection of a restricted number of related attributions when a particular topic is discussed (McCombs et al. 2014). A topic such as big data in service-oriented supply chains needs to be framed in such a way that it reduces the unknown and resulting fear and anger associated with the topic due to a general lack of understanding of its implications. Utilization of concepts from technology adoption theory including the clarifying stage of diffusion (i.e., Miles 2012) involves helping individuals understand the complexity through correcting any misunderstanding and clarifying the purpose, meaning and function of the technology. Specifically, instead of focusing on potential avenues for the technology and how the organization can fit the purpose of the technology (i.e. a focus on potential Future Applications), this stage seeks to suggest “current applications” for how a technology can fit the needs of the organization (Miles 2012). Based on this reasoning, this study hypothesizes the following.

H4. Discussion of “Future Applications of Big Data” in a CM document will be significantly negatively associated with cumulative stakeholder sentiment.

H5. Discussion of “Current Applications of Big Data” in a CM document will be significantly positively associated with cumulative stakeholder sentiment.
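Taken together, H1 through H5 can be read as sign and magnitude restrictions on the coefficients of a single regression. The specification below is a hedged sketch under that assumption; the variable names, the linear form, and the control vector $\mathbf{X}_d$ are illustrative and may differ from the model actually estimated.

```latex
\bar{S}_d = \beta_0
          + \beta_1\,\mathrm{Salience}_d
          + \beta_2\,\mathrm{DocSentiment}_d
          + \beta_3\,\mathrm{FutureApp}_d
          + \beta_4\,\mathrm{CurrentApp}_d
          + \boldsymbol{\gamma}^{\top}\mathbf{X}_d
          + \varepsilon_d
```

Under this reading, and assuming comparably scaled regressors, H1 corresponds to $\beta_1 > 0$, H2 to $\beta_2 > 0$, H3 to $\beta_1 > \beta_2$, H4 to $\beta_3 < 0$, and H5 to $\beta_4 > 0$, where $\bar{S}_d$ is the cumulative stakeholder sentiment for CM document $d$.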

4 Research methodology

Researchers have typically used surveys and experiments to understand the role of CM in influencing stakeholder reactions to communication dynamics. Unfortunately, these often confront distortion from self-reported data (Neuman et al. 2014). In an effort to obtain current mainstream data and to gather from a variety of sources, this study uses news sources (i.e., practitioner magazines including Supply Chain Quarterly, Supply Chain Management Review, DC Velocity, Inbound Logistics, Logistics Management, Supply Chain Dive, Modern Materials Handling) as well as video sources (i.e., YouTube and Google Videos including company respondent interviews, company discussions on big data applications, panel discussions, and company representative speeches).

4.1 CM document collection and data cleansing

This paper refers to the practitioner and video media as “CM documents”, specifically referencing corporate communications through media on two platforms (i.e., practitioner publications and video media). While not all media are reflective of “corporate communications” in this study, any CM document that did not have a company spokesperson interviewed or quoted was removed.

Data acquisition and subsequent cleansing of data occurred over a four-month period from January 2018 through April 2018. Data were collected and cleansed by one professor and two graduate student assistants whose expertise lies in social media and supply chain management.

The research question in this paper addresses whether organizations can enhance the salience of big data in supply chain management through discussions in CM documents and what organizations can discuss in these documents to enhance stakeholders’ sentiment on big data in supply chain management. This question required obtaining CM from differing sources. Practitioner magazines as well as video sources (i.e., YouTube and Google Video) were utilized to obtain the CM documents. Article textual data were initially collected through a cloud web scraper through Google Chrome. This included accessing the practitioner magazine websites and signing in through a personal or university subscription and subsequently running the web scraper code to collect data including: “title”, “date”, “author”, “affiliation”, “main text”, and “abstract text”. A different code was used for each publication based on the layout.
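The fields collected for each article can be pictured as one record per CM document. The following Python sketch is illustrative only: the `CMDocument` class, the `FIELD_MAP` dictionary, and the `to_document` helper are hypothetical names, and the whitespace cleanup is an assumed step rather than the cleansing procedure used in the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CMDocument:
    """One scraped CM document (hypothetical container, not the study's schema)."""
    title: str
    date: Optional[str]          # kept as scraped; formats vary by publication
    author: Optional[str]
    affiliation: Optional[str]
    main_text: str
    abstract_text: Optional[str]


# Field labels named in the text, mapped to attribute names (assumed mapping).
FIELD_MAP = {
    "title": "title",
    "date": "date",
    "author": "author",
    "affiliation": "affiliation",
    "main text": "main_text",
    "abstract text": "abstract_text",
}


def _clean(value):
    """Collapse runs of whitespace; treat missing or empty values as None."""
    if value is None:
        return None
    text = " ".join(str(value).split())
    return text or None


def to_document(row):
    """Build a CMDocument from a dict of scraped fields keyed by label."""
    cleaned = {attr: _clean(row.get(label)) for label, attr in FIELD_MAP.items()}
    if not cleaned["title"] or not cleaned["main_text"]:
        raise ValueError("a CM document needs at least a title and main text")
    return CMDocument(**cleaned)
```

Because a different scraper was used for each publication, a shared record shape of this kind would give every scraper the same output target; rows missing a title or main text are rejected rather than silently kept.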

Video data were manually collected by a co-author and two hired graduate assistants. This involved manually collecting “title”, “date”, “author”, “affiliation”, “text” and “summary” data when available through the YouTube and Google Video search. Textual data were collected through closed captioning of the data provided via YouTube or Google Video. This involved collecting closed captions from the website as well as having two graduate assistants review and, when necessary, adjust the closed captioning to ensure accuracy. Some of the videos obtained required graduate assistants to re-write the closed captions due to a lack of accuracy.

Table 1 presents the search string as well as the specific periodicals referenced and exclusion criteria. The search string was created from our research question based on current research. Our definition of “big data” can be operationalized in various ways and is a direct result of digital technologies. Thus, instead of focusing on CM that just reference “big data”, this study utilizes a broader term to account for the technologies that produce “big data”.

Table 1 Textual sources and exclusion criteria

Following the work of Durach et al. (2015), the search string was initially developed by the co-authors of the paper and then revised by a subject librarian at one of the co-authors’ institutions who had knowledge of digital technologies and was able to adjust the string based on the needs of the database. An initial period of January 1 2011 to January 1 2018 was chosen based on the emergence of the term “big data” (Gartner 2011). However, since the authors were also collecting social network post data referencing the CM documents, a period of two years prior to the social network collection was utilized to help control for relevancy of the topic as well as current CM document coverage. That is, CM documents from the two years preceding the social network collection were utilized for our final analysis. Following this, social network data referencing the specific CM documents were collected. Relevancy was ensured through a title search as well as an overview of the summary or abstract textual data. After the initial collection, graduate assistants manually went through each title and summary to ensure its relevance to the research question. Where summary or abstract textual data were not made available, two co-authors read through the main textual data to ensure its relevance to the research question.

Through this process, articles were coded based on the technology topic discussed. This was done through a search of the title, summary, and main text for key terms such as “big data”, “big-data”, “data analytics”, “data analysis (and process*ing)”, “data processing”, “data mining”, “big analytics”, “data science”, “extreme info*rmation”. These keywords were developed based on an initial literature search of academic articles and news articles. Other categories of terminology specific to digital technologies were also coded but not used in this analysis. The initial results as well as the frequency of results after applying our exclusion criteria are depicted in Table 1. These numbers reflect the initial search conducted in January 2018.
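The keyword coding described above could be sketched as follows. This is a minimal illustration in Python, not the procedure used in the study: the function name `code_big_data`, the regular expressions, and the loose reading of the wildcard terms (e.g. “info*rmation”) are all assumptions made for the example.

```python
import re

# Hypothetical patterns approximating the key terms listed in the text.
# The wildcard terms are interpreted loosely; this is an assumption.
BIG_DATA_PATTERNS = [
    r"big[\s-]data",
    r"data\s+analytics",
    r"data\s+analysis",
    r"data\s+processing",
    r"data\s+mining",
    r"big\s+analytics",
    r"data\s+science",
    r"extreme\s+info\w*",
]
BIG_DATA_RE = re.compile("|".join(BIG_DATA_PATTERNS), re.IGNORECASE)


def code_big_data(title: str, summary: str, main_text: str) -> int:
    """Return 1 if any of the three fields mentions a big data term, else 0."""
    combined = " ".join([title, summary, main_text])
    return 1 if BIG_DATA_RE.search(combined) else 0
```

Applied to each CM document, such a function would yield a 0/1 indicator of whether the document discusses big data; in the study itself the coding was additionally checked by human coders.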

4.2 Social network collection and data cleansing

Social network data collection occurred from January 2018 through March 2018. As mentioned previously, to control for the time effect on sentiment, we utilized only social network data from January 2018. Similarly, we only used CM published just prior (i.e. January 2016 through December 2017). Controlling for the number of days between the publication and the post helped to control for the impact that time has on big data application sentiment.

RapidMiner, a machine learning, text mining, and predictive analytics software, was used to collect social network data. Utilizing the Twitter connector, Twitter data were directly collected through RapidMiner Studio. Phrases, tweets, and user profile information were collected through an authentication mechanism (OAuth 2.0). OAuth 2.0 enables applications to obtain access to user accounts through an HTTP service. This is done through a streaming API that fulfills both the resource and authorization server roles. To receive more complete coverage of the social network data associated with each CM document, several queries were utilized per CM document. These included queries pertaining to the “title”, “author”, and “affiliation” and, when necessary, elements from the “abstract text” or “summary”. After the initial collection of the social network data, the co-authors manually went through and deleted social network posts that did not specifically reference the article. Additionally, social network posts that included URLs not related to the CM document were removed. The deletions were later reviewed by graduate assistants to ensure accuracy.

In total, the sixty-seven CM documents (as defined by the inclusion of a firm representative) had 1408 social network posts. Data cleansing involved the elimination of non-English social network posts, non-letter characters, URLs, mentions and re-tweet identifiers (Pournarakis et al. 2017). Overall, a total of 508 social network posts were utilized for further analysis. Following this, demographic data were also collected from each of the social network stakeholders. This included the utilization of RapidMiner’s “Get Twitter User Details” operator through a Twitter connection to specify the Twitter account for the API access. Additionally, other demographic characteristics were captured manually through a social network account by utilizing username and/or ID searches. Initial characteristics included: Twitter username, ID number, screen name, description, URL, location, followers, friends, favorites, tweets, language, time zone, and profile image. All Twitter data and CM document data were stored in multiple Excel spreadsheets.
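The character-level cleansing steps named above (removal of re-tweet identifiers, URLs, mentions, and non-letter characters) might look like the following sketch. It is an illustration only: the function name `clean_post` and the specific regular expressions are assumptions, and the removal of non-English posts is not covered because it would require a language-detection step.

```python
import re

# Illustrative patterns; the study does not report the exact rules it used.
RETWEET_RE = re.compile(r"^\s*RT\b:?\s*", re.IGNORECASE)  # leading "RT" marker
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^A-Za-z\s]")


def clean_post(text: str) -> str:
    """Strip re-tweet markers, URLs, mentions, and non-letter characters."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub(" ", text)      # remove URLs before punctuation is stripped
    text = MENTION_RE.sub(" ", text)  # remove @user mentions
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())     # collapse repeated whitespace
```

URLs and mentions are removed before the non-letter filter so that fragments such as “https” or a user handle do not survive as ordinary words.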

4.3 Sentiment analysis

Sentiment analysis has been used to effectively determine customer responses by extracting sentiment from text data to derive valuable information (Kim and Hong 2020). Data preparation for subsequent sentiment analysis for both the CM documents and the social network posts was based on the work of Sul et al. (2017), who developed a model that combines topic and sentiment classification to elicit influential subjects from consumer perceptions in social media. Additionally, this study references Pournarakis et al. (2017), who empirically apply a model to conduct an analysis of over 280,000 tweets related to a specific topic (i.e., Uber transportation network) with the goal of providing insights into awareness and meaning in brands. The authors introduce a model that evokes subjects from consumer perceptions in social media. As this study examines a similar context (stakeholder perceptions of big data) without creating topic classifications, it utilizes the data preparation and subsequent sentiment classification steps that Pournarakis et al. (2017) define.

The first step involved data cleansing and pre-processing. This step aims to prepare data for various data mining tasks. First, text tokenization (i.e. text segmentation and removal of stop words) and stemming (i.e. obtaining root forms of derived words) (Pournarakis et al. 2017; Swain and Cao 2019) were applied. Both the raw CM documents and the social network data were separated into databases, and the final version of each was formed as a set of purified documents (i.e. with blanks and duplicated records deleted, and stored as a refined corpus in a computable format) (Swain and Cao 2017). Each document (i.e. either a complete CM document or a social network response) is represented through a collection of words. After this first stage of data cleansing and pre-processing, sixty-seven CM documents and 508 social network posts were used for the classification stage. This was conducted through a lexicon (WordNet-SentiWordNet) (Miller 1995), a sentiment lexicon derived from WordNet in which words are clustered into groups of synsets. Each CM document and social network post was represented by a collection of words and then utilized for sentiment classification.
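As a rough illustration of the tokenization, stop-word removal, and stemming steps, consider the sketch below. It is deliberately simplified and is not the pipeline used in the study: the stop-word list is a tiny hypothetical sample, and `naive_stem` is a crude suffix stripper standing in for a proper stemming algorithm.

```python
import re
from typing import List

# Tiny illustrative stop-word list (an assumption, not the study's list).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "are", "to", "for", "on"}
# Suffixes tried in order by the toy stemmer below.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token: str) -> str:
    """Strip one common suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    """Lowercase, tokenize on letters, drop stop words, and stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("The sensors are tracking shipments in the network")` yields `["sensor", "track", "shipment", "network"]`, which is the kind of bag-of-words representation the classification stage operates on.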

The second step involved sentiment classification through the utilization of support vector machines (SVM) which are non-linear classifiers operating in higher dimensional vector spaces than the original feature space of a given data set. SVM’s training process involves a quadratic minimization problem in the context of a binary classification. Through the utilization of the Gaussian kernel function, performance evaluation of the SVM classifier was measured through a tenfold cross-validation process on an equally balanced set of 50 social network posts and 20 documents. Each fold split the pre-labeled samples into a 95% training-data/5% testing data ratio. The study utilized this ratio based on the size of the data set. This split allows for greater accuracy and reliability (Pournarakis et al. 2017). The first subset of data instances was utilized to build a classifier and the latter was used to assess its ability to suggest sentiment polarity. Mean accuracy per fold was 0.7092 (± 0.179), mean recall per fold was 0.7083 (± 0.194).
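Two ingredients of this step, the Gaussian kernel and the partition of the 70 pre-labeled samples into ten folds, can be sketched in a few lines. The code is illustrative only: the parameter `gamma` and both function names are assumptions, and a plain tenfold partition holds out one tenth of the samples per fold, so it does not reproduce the 95%/5% ratio reported above.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Partition indices 0..n_samples-1 into k (train, test) index pairs."""
    # Assign every k-th index to the same fold so fold sizes differ by at most one.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits
```

In a full implementation, an SVM would be trained on each `train` index set using the kernel and scored on the corresponding `test` set, with accuracy and recall averaged across folds.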

The sentiment of the rest of the data was then classified by exploiting the complete set of pre-labeled data instances. Each CM document (Document Sentiment), \(d\), and each social network post (Individual Sentiment), \({Y}_{d}\), was given a soft decision value, \(d \in \{-1\le d \le 1\}\) and \({Y}_{d} \in \{-1\le {Y}_{d} \le 1\}\), where

$$\left\{\begin{array}{ll}d=1, & \text{for positive sentiment classification,}\\ d=-1, & \text{for negative sentiment classification.}\end{array}\right.$$

4.4 Variables

Since analysis occurred at the CM document level, and one document could have numerous social network posts, further data preparation for “Stakeholder Sentiment”, \({Y}_{d}\), was necessary. Initial values of “STAKEHOLDER_SENTIMENT”, \({Y}_{d}\), were computed by averaging the sentiment scores of the social network posts for each document. If a stakeholder posted more than once about a CM document, the average was used as that stakeholder’s cumulative score. No stakeholder repeats were noted for different CM documents, so this was not factored into the research model. In the case of repeated social network posts by the same stakeholder for one CM document, \({i}_{1,1}\) represents stakeholder 1’s first social network post for \({sd}_{n}\), \({i}_{1,2}\) represents stakeholder 1’s second social network post for \({sd}_{n}\), and so on. This computation of \({Y}_{d}\) was only utilized for the three CM documents in which a stakeholder posted more than once. Table 2 presents a summary of the variables utilized in this study. The independent variable, “Big Data”, was represented as \({x}_{1d} \in \{0,1\}\) for every CM document \(d\). The terms utilized for a classification of {1} included: “big data”, “big-data”, “data analytics”, “data analysis (and process*ing)”, “data processing”, “data mining”, “big analytics”, “data science”, “extreme info*rmation”.
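The two-stage averaging described in this paragraph (first within a stakeholder, then across stakeholders for a document) can be made concrete with a short sketch. The record layout and the function name `stakeholder_sentiment` are hypothetical; only the averaging logic follows the text.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (document id, stakeholder id, post sentiment in [-1, 1]).
Post = Tuple[str, str, float]


def stakeholder_sentiment(posts: Iterable[Post]) -> Dict[str, float]:
    """Average repeated posts per stakeholder, then average stakeholders per document."""
    per_stakeholder = defaultdict(list)
    for doc_id, stakeholder_id, score in posts:
        per_stakeholder[(doc_id, stakeholder_id)].append(score)

    per_document = defaultdict(list)
    for (doc_id, _), scores in per_stakeholder.items():
        # A stakeholder with several posts contributes a single averaged score.
        per_document[doc_id].append(mean(scores))

    return {doc_id: mean(scores) for doc_id, scores in per_document.items()}
```

Averaging within a stakeholder first prevents a single frequent poster from dominating a document’s cumulative score.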

Table 2 Summary of variables and definitions

In order to determine the strength in relation to “Big Data”, “Document Sentiment” is another variable that we input into the model, which is represented as

$$x_{2d} \in \left\{ { - 1 \le x_{2d} \le 1} \right\}$$
(1)

where \({x}_{2d}\) is the associated sentiment of the CM document \(d\). While decision values close to {0} can be indicative of a neutral sentiment, this is not explicitly presented in the SVM classifier during training. The grouping variable, “Current Applications”, is represented as \({x}_{3d} \in \{0,1\}\) for every Big Data CM document. The other grouping variable, “Future Applications”, is represented as \({x}_{4d} \in \{0,1\}\) for every Big Data CM document. Variables \({x}_{1d}, {x}_{3d}, {x}_{4d}\) were coded by two independent coders (one co-author and a graduate student assistant). All dummy-coded variables were mean-centered for interpretation in the interaction model. Several control variables were utilized based on previous agenda-setting literature and adjusted to fit the document-level analysis in this study. Specifically, agenda-setting theory purports that the type of media used can impact stakeholder perception and attitude (Singer 2018). Since this study utilizes both video and print media, dummy variables were created for the categorical variable “Type of Media” (\({C}_{1dj}\)), distinguishing video and print documents. Another variable this study controls for is location, specifically the country of the stakeholder. Demographic data were collected from stakeholders in seven different countries, all of which differed in their economic development and national conflicts. Thus, these analyses called for dummy coding each country into a variable labeled “Location” (\({C}_{2dj}\)) and regressing the dependent variable on it.

This study also controlled for experience within operations, supply chain, or information systems fields. This was done by assessing demographic data collected through a public API. In some cases, the career title was not made clear by the initial data collection. In these cases, we conducted an external search, primarily through LinkedIn but also through the affiliated social network. The timing of the social network post was accounted for by comparing the time of the social network post with the dates of the job provided on the social network platform. Dummy codes were created for “Experience” if the stakeholder indicated a career in Operations, Supply Chain or Information Systems fields. Approximately 1.2% of stakeholders did not have any employment information accessible to the researchers. In this situation, the data point was removed. The ratio of experience to no-experience was then utilized as the cumulative score for “Experience” (\({C}_{3dj}\)) for the document \(d\). Further, this study controlled for coverage. To control for this and timing effects, this study assessed the popularity of the digital technology topic covered in that month. Specifically, we counted the number of big data CM documents \(\eta\) occurring in month \(i\) of the social network post. For each CM document, the average of the total number of social network posts was used as the variable “Amount of Coverage” (\({C}_{4dj}\)).

This study also controlled for the number of friends a social network poster has. Agenda setting is particularly popular among stakeholders with a large number of “followers” or “friends”. A continuous variable, “Friends” (\({C}_{5dj}\)), was created and defined as the average number of friends utilized for the cumulative score for the document \(d\).

Finally, “Specificity” (\({C}_{6dj}\)) of the CM document on a particular technology was controlled for. This included utilizing a keyword search of the CM document based on synonyms of digital technologies utilized in the initial literature review. Technologies, including cloud computing, sensors, autonomous vehicles, 3D-printing, etc., were counted and reflected in a continuous variable indicating the specificity of the topic to the article.

4.5 Bayesian regression

To assess the significance of the hypotheses presented in the research model, this study employs Bayesian regression with Gibbs sampling. The Bayesian regression for the hypotheses testing helps handle more complex models and facilitates decision making when additional information is gathered. Additionally, Bayesian analyses can be useful in smaller samples (Hahn et al. 2020). Utilizing the Bayesian regression, we form a likelihood function based on an asymmetric Laplace distribution. Given the uniqueness of the dependent variable’s distribution, we are able to model within the specified distribution and estimate a set of percentiles indicating the direction of the effect.
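For readers unfamiliar with the asymmetric Laplace likelihood, the sketch below writes out its log-density in the parameterization commonly used for Bayesian quantile regression, \(f(y\mid \mu ,\sigma ,p)=\frac{p(1-p)}{\sigma }\mathrm{exp}\{-{\rho }_{p}((y-\mu )/\sigma )\}\) with check loss \({\rho }_{p}(u)=u(p-I(u<0))\). The paper does not state which parameterization was used, so the symbols \(\mu\), \(\sigma\), and \(p\) and the function names are assumptions for illustration.

```python
import math
from typing import Sequence


def check_loss(u: float, p: float) -> float:
    """Quantile check loss: rho_p(u) = u * (p - 1[u < 0])."""
    return u * (p - (1.0 if u < 0 else 0.0))


def ald_log_density(y: float, mu: float, sigma: float, p: float) -> float:
    """Log-density of the asymmetric Laplace distribution at y."""
    if sigma <= 0 or not 0 < p < 1:
        raise ValueError("sigma must be positive and p must lie in (0, 1)")
    return math.log(p * (1 - p) / sigma) - check_loss((y - mu) / sigma, p)


def ald_log_likelihood(ys: Sequence[float], mus: Sequence[float], sigma: float, p: float) -> float:
    """Sum of log-densities, where mus holds the fitted location for each observation."""
    return sum(ald_log_density(y, m, sigma, p) for y, m in zip(ys, mus))
```

Here `mus` would hold the linear predictor for each document, and `p` selects the percentile being modeled; maximizing this likelihood in the location is equivalent to minimizing the check loss, which is why the approach yields percentile estimates.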

This study first examines the direct relationship between “Document Sentiment” and “Big Data” on “Stakeholder Sentiment” after controlling for six variables as outlined in Table 4. That is, this study tests the following equation:

$$Y_{dj} = \alpha + \beta_{0} X_{1dj} + \beta_{1} X_{2dj} + \beta_{2} X_{1dj} * F_{1dj} + \beta_{3} X_{1dj} * A_{1dj} + \gamma C_{dj} + \varepsilon_{dj}$$
(2)

where \({Y}_{dj}\) refers to the cumulative stakeholder sentiment for document \(d\) at level \(j\). The variable \({X}_{1dj}\) refers to the dummy-coded “BIG_DATA” variable for document \(d\), and \({X}_{2dj}\) refers to the sentiment of document \(d\). The variable \({F}_{1dj}\) refers to the future applications for document \(d\) at level \(j\), and \({A}_{1dj}\) refers to the current applications for document \(d\) at level \(j\). \({C}_{dj}\) represents the six control variables for the document \(d\). Tables 3 and 4 show the summary statistics and the correlations, respectively.

Table 3 Summary statistics
Table 4 Correlations

Table 5 shows the posterior summaries of the estimated parameters \({\beta }_{0}, {\beta }_{1}\) and \(\gamma\). The sample mean of the output chain for the Big Data coefficient \({\beta }_{0}\) is 0.0514, with the 95% credible interval not containing 0 and lying entirely above it, indicating a high probability that the direction of the effect of Big Data is positive. This indicates support for hypothesis 1. The results do not support hypothesis 2, as the coefficient of the variable CM Document Sentiment is 0.0190 with the 95% credible interval containing 0, indicating the direction of the effect of Document Sentiment is ambiguous. The analysis conducted in this study thus found more support for an association of Stakeholder Sentiment with the inclusion of a Big Data topic in the document than with the Document Sentiment. However, since hypothesis 2 was not supported, the analysis does not provide further support for hypothesis 3.

Table 5 Posterior summaries and intervals

Bayesian analyses were conducted to determine the impacts of both Future Application and Current Application communications in Big Data documents. Results indicated support for hypothesis 4: the coefficient of Big Data with Future Applications is −0.0835 (95% credible interval: −0.1422, −0.0128), indicating the direction of the effect is negative. Results do not show support for hypothesis 5: the coefficient of Big Data with Current Applications is 0.000886 (95% credible interval: −0.2674, 0.2225), indicating the direction of the effect is ambiguous.
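The decision rule applied to the posterior summaries (an effect has a clear direction only when its 95% credible interval excludes zero) can be expressed as a small sketch. The function names are hypothetical, and the interval in `summarize_chain` uses a simple nearest-rank rule that rounds outward, which is an assumption rather than the method reported in the paper.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def summarize_chain(chain: Sequence[float], level: float = 0.95) -> Tuple[float, Tuple[float, float]]:
    """Return the posterior mean and an equal-tailed interval from sampler draws."""
    draws = sorted(chain)
    tail = (1.0 - level) / 2.0
    # Nearest-rank bounds, rounded outward so the interval is slightly conservative.
    lower = draws[math.floor(tail * (len(draws) - 1))]
    upper = draws[math.ceil((1.0 - tail) * (len(draws) - 1))]
    return mean(draws), (lower, upper)


def effect_direction(lower: float, upper: float) -> str:
    """Classify the direction of an effect from its credible interval."""
    if lower > 0:
        return "positive"
    if upper < 0:
        return "negative"
    return "ambiguous"
```

With the intervals reported above, `effect_direction(-0.1422, -0.0128)` returns `"negative"` for Future Applications and `effect_direction(-0.2674, 0.2225)` returns `"ambiguous"` for Current Applications.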

5 Conclusion

As supply chains move from product-orientation to service-orientation, there will inevitably be a greater need for organizations to understand not only asset utilization, but also the management of knowledge, talent and resources. This requires a network perspective focused on seamless integration amongst various entities and stakeholders. Previous research particularly focuses on the importance of the downstream entities in service-oriented firms (Ahn and Rho 2016). Customers drive service networks to adopt both diverse and flexible process designs; additionally, internal members are empowered to consistently deal with changing needs in service production and delivery processes. Yet, service-oriented supply chains require seamless integration amongst all members in a network. Big data applications allow for ease of data transfer, but the sentiment toward data transfer can vary amongst entities, leading to complexities in use from a network perspective.

Very little work has been done on stakeholder sentiment toward big data use in service-oriented supply chains. Methodologically, this paper employs textual analysis techniques while also controlling for common problems present in textual analysis by utilizing stakeholders to verify sentiment analyses of textual data. Further, previous studies utilizing social media to examine cumulative sentiment from media typically utilize only keyword searches to assess both news media and social network posts and thus do not control for whether the stakeholders have read the media. This study first conducted a CM document search and then a social network web scrape via Google to collect social network posts that reference the CM documents found through that search, thus establishing a causal inference while controlling for time.

Secondly, from a theoretical standpoint, this study contributes by examining what an organization can control to impact stakeholder sentiment, thereby eliciting insight on how corporate communications through media can alter stakeholder sentiment. The findings indicate that the salience of a topic has a more significant impact on stakeholder sentiment than CM Document Sentiment. Specifically, this study finds the use of CM Documents discussing Big Data in service-oriented supply chains is more positively associated with stakeholder sentiment than the actual sentiment of the CM Document. This provides support that what is discussed in CM documents is more important in influencing Stakeholder Sentiment than how it is discussed. Further, this finding suggests a positive outlook of big data in the supply chain.

Third, we find the discussion of certain subtopics can elicit stakeholder sentiment through a focus on familiarity bias in technology adoption. Specifically, the sub-topic of Future Applications is negatively associated with Stakeholder Sentiment. This elicits insights regarding stakeholder psychological bias toward proven applications of big data in supply chain management versus potential Future Applications of big data in supply chain management. Possible reasons for this might be derived from a technology adoption theory perspective where immediate reactions to Future Application discussions may trigger stakeholder skepticism thereby reducing sentiment of stakeholders. While the emotional element was not studied in this analysis, providing proof of applications is vital when eliciting enhanced stakeholder sentiment.

The use of big data in supply chain management is progressively impacting how supply chains store, disseminate, and utilize data to turn knowledge into value creation, both internally and for the consumer. In today’s environment, organizations are becoming more accommodating in their business operations in order to deliver services according to downstream preferences (Ahn and Rho 2016). Big data applications allow for the quick dissemination of information throughout the network to reduce variability, as required for greater service performance. However, stakeholder sentiment toward big data varies, due in part to perceptions relating to its innate connection with privacy risk and vulnerability (Lowry et al. 2017). The use of big data is, ultimately, required for service-oriented supply chains to keep pace with the rapid enhancements caused by digital technologies. Corporate communications through media have been utilized to help inform stakeholders of the benefits and to calm the fear imposed by this transition in supply chain management. Nevertheless, understanding of how corporate communications can be utilized to reduce concerns while also enhancing stakeholder sentiment is still in its infancy.

There are, however, several opportunities for expanding on this research that will not only elucidate the various complexities of big data perception from a supply chain perspective, but also enhance organizational knowledge of how corporate communications can enhance the perspectives of stakeholders on big data. For example, this research focuses on one-way communication from the organization to the stakeholder. In the age of digital media, communications are happening rapidly between organizations and stakeholders as well as from stakeholder to stakeholder. Previous research has even found a potential link between community concerns about environmental issues and organizational responses to disclosures of environmental information (Brown and Deegan 1998). Future research may entail enhancing insight into stakeholder sentiment on corporate communications through a time-series analysis of social network data. Additionally, technology maturity and regulation can have various effects on manufacturing companies’ servitization paths (Finne et al. 2013). A time-series analysis and a larger sample of the various stakeholders of an organization can provide an understanding of how big data analytics can impact service-oriented procedures and perceptions of those procedures from an employee perspective.

Future research might also address topics discussed on big data in more depth. Specifically, while future applications and current applications were assessed in this study, other topics might also be relevant. Some topics might include feelings expressed, subjectivity of the text, and/or actual applications of big data analyses. Although the latter would require a substantial amount of data, the results would provide further insights into ways in which big data application might yield more positive sentiment.

Aside from social media data, reader comments posted on the CM itself might offer another way to analyze the impact of CM on reader sentiment. Such comments could also be used to check the results in this paper against another set of data. However, the user data would need to contain demographics of the individual poster. This information is sometimes provided by the website, but is often difficult to find. Future research is encouraged to collect a larger sample and utilize differing forms of user/reader data.

One limitation of this paper was the determination of the cumulative sentiment of responses to a CM document. An average was utilized to determine this cumulative sentiment. The use of an averaging approach can be problematic when data do not follow a normal distribution. Future research may find other methods to determine the cumulative sentiment of responses to a CM document, including but not limited to a quantile approach.

Another limitation of this research is its focus on stakeholder reaction through social networks. It is problematic to equate online comments to “public opinion” as users are not necessarily demographically representative of local cultures (Neuman et al. 2014). Of course, this is not adjusted for only by the use of in-person, telephone, and online surveys. Collecting a larger, more representative sample of stakeholders through a multi-method analysis may help reduce this bias in future research.