Enhancing the government accounting information systems using social media information: An application of text mining and machine learning

https://doi.org/10.1016/j.accinf.2022.100600

Highlights

  • Exogenous variables integrated into service processes can be used within modern accounting and assurance operational services.

  • This study brings an innovative data source, social media information, to government accounting information systems as part of the service evaluation and assessment factor.

  • Social media information is analyzed using text mining techniques and machine learning algorithms.

  • The study presents an analytical approach to classify the tweets and uses VADER to derive public opinion about street cleanliness.

Abstract

This study demonstrates a way of bringing an innovative data source, social media information, into government accounting information systems to support accountability to stakeholders and managerial decision-making. Future accounting and auditing processes will rely heavily on multiple forms of exogenous data. As an example of the techniques that could be used to generate this information, the study applies text mining techniques and machine learning algorithms to Twitter data to develop an alternative performance measure for NYC street cleanliness. It uses Naïve Bayes, Random Forest, and XGBoost to classify the tweets, illustrates how sampling methods can address the imbalanced class distribution issue, and uses VADER sentiment analysis to derive public opinion about street cleanliness. The study also extends the research to another social media platform, Facebook, and finds that the incremental value differs between the two platforms. These data can then be linked to government accounting information systems to evaluate costs and provide a better understanding of the efficiency and effectiveness of operations.

Introduction

Future accounting systems will utilize large amounts of exogenous data (Brown-Liburd et al., 2019) in conjunction with traditional accounting data. Government accounting systems will become a conglomerate of three main components: 1) traditional financial, 2) infrastructure maintenance, and 3) quality of services (Bora et al., 2021). This study illustrates how exogenous variables eventually integrated into service processes can be used within modern accounting and assurance operational services. It explores an alternative performance measure by analyzing social media information to enhance government managerial decision-making and bring innovation to governmental operations. The progressive development of information and communication technologies (ICTs) and the digital transformation of operations have fundamentally changed every aspect of people’s lives, their social needs, and their communication strategies with the government. Modern government reporting demands reform toward a “data-driven, analytics-based, real-time, and proactive reporting paradigm” (Bora et al., 2021). A dynamic and interconnected communication channel with citizens would generate an exogenous data source to improve the performance and delivery of public services. It would also be part of the three-dimensional reporting system measuring and reporting the quality of services. Outdated measurements and old-fashioned ways of operating cannot provide efficient public services that meet citizens’ current needs and expectations. For example, the New York City (NYC) Mayor’s Office of Operations implements a Scorecard inspection program to assess the cleanliness of its streets and sidewalks by relying on inspectors’ subjective judgment during a drive-by visual inspection of sampled locations. This method was established in 1973 and has not changed for nearly fifty years (Office of the New York State Comptroller, 2020). The ratings are adjusted for street miles but not for the population, housing density, or the nature of activity in the inspected area, such as residential or commercial areas. Based on the current rating, the majority of the streets are rated as acceptably clean (see Appendix A). However, the Office of the New York State Comptroller issued an audit report in 2020 that identified several weaknesses in the methodology used by the Mayor’s office, specifically in the inspection process and the rating calculation, which raise concerns over the reliability of the ratings.

The auditors also pointed out that “without analyzing and acting on all available data, including complaints, to identify and mitigate the underlying problem, there is material risk that the same sanitation problems will continue to surface and negatively impact the quality of life for residents and visitors in those areas” (Office of the New York State Comptroller, 2020). The state auditors encouraged the Department of Sanitation to consider all the available data sources to develop and implement additional performance measures for street cleanliness (Office of the New York State Comptroller, 2020). The current service reporting system is what technology of the last century could provide. As accounting information systems are rigid and backward-looking, the public would be much better served with close-to-real-time service reporting integrated with a system of public accountability.

Additionally, NYC residents increasingly contact the Department of Sanitation via NYC311 about missed trash pickups, overflowing litter baskets, and other insalubrious conditions. The examination of the NYC311 service request data from May 22, 2014, to May 22, 2019, reveals an increasing trend of complaints or requests for services by NYC residents to the Department of Sanitation and the Department of Health and Mental Hygiene (as shown in Fig. 1).

To better embrace innovation in government, many plans and proposals are being considered and implemented, including big data analytics, smart cities, machine learning, and drone usage. Governments are increasingly adopting innovative data sources and data analytics, such as mobile device sensor-based app data, crowdsourcing data, and Twitter sentiment and postings, to better support the decision-making process (Kitchin, 2014, O’Leary, 2013, OECD, 2017, Zeemering, 2021). Several cities have been exploring this area, using different management information systems to gather exogenous data and monitor public services and functions. Examples include monitoring traffic based on transportation network data, the data analytic center of the Centro De Operacoes Prefeitura Do Rio in Brazil, London’s Dashboard and LoveCleanStreets App, and Boston’s infrastructure monitoring system (Kitchin, 2014, Li et al., 2018, O’Leary, 2019a, O’Leary, 2013). Incorporating big data into government information systems as part of service evaluation and assessment factors improves the effectiveness of public services by allowing government officials to make data-driven decisions, address issues promptly, and deploy resources better.

To demonstrate the possibility of using exogenous data to support government managerial decision-making, this study proposes an alternative performance measure. This measure uses social media information to assess street cleanliness in NYC in response to the New York State auditors’ recommendations stated in the 2020 audit report. It utilizes text mining techniques and machine learning algorithms to examine social media information, applies an analytical approach to identify temporal trends and patterns of street cleanliness, provides a perspective on street cleanliness that differs from the official cleanliness ratings, and assesses the tweets’ sentiment to measure the performance of municipal services. The study finds that the overall sentiment trend over the examined period is negative, which is inconsistent with the official Scorecard ratings. This study proposes that the government incorporate social media information into municipal performance evaluation and assessment factors. A continuous monitoring dashboard for street cleanliness that integrates various data sources, including social media information, can be built to support public services decision-making.
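The temporal-trend idea can be illustrated with a minimal sketch that averages per-tweet sentiment scores by calendar month. This is an assumption-laden illustration, not the study's implementation: the function name `monthly_mean_sentiment`, the (date, score) input shape, and the toy values are all hypothetical, and the scores are taken as already computed by some sentiment tool.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_mean_sentiment(scored_tweets):
    """Average sentiment per (year, month) from (date, score) pairs."""
    buckets = defaultdict(list)
    for day, score in scored_tweets:
        buckets[(day.year, day.month)].append(score)
    # Sort by (year, month) so the result reads as a time series.
    return {month: mean(scores) for month, scores in sorted(buckets.items())}


# Hypothetical toy input: dates and scores are made up for illustration only.
scored_tweets = [
    (date(2019, 1, 5), -0.4),
    (date(2019, 1, 20), -0.2),
    (date(2019, 2, 3), 0.1),
    (date(2019, 2, 14), -0.5),
]

trend = monthly_mean_sentiment(scored_tweets)
for (year, month), avg in trend.items():
    print(f"{year}-{month:02d}: {avg:+.2f}")
```

A series like this could feed the kind of continuous monitoring dashboard the paragraph describes, with months whose average falls below zero flagged for review.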

Public accountability is an essential factor for a sustainable and stable government. Many government institutions demonstrate their accountability by disclosing the amount of tax revenue and illustrating how they spend taxpayers’ money efficiently and effectively, as well as how that expenditure benefits citizens’ lives (Callahan and Holzer, 1999). Involving citizens in government fiscal budgeting and decision-making, particularly in resource allocation and performance measurement, is critical to meeting citizens’ expectations and increasing the government’s accountability (Berner and Smith, 2004, Ebdon and Franklin, 2004, Justice et al., 2006, Robbins et al., 2008, Woolum, 2011). The majority of governments’ performance measures concentrate on information used to make internal management decisions, such as inputs, outputs, staffing patterns, and resource allocations (Ho and Ni, 2005, Woolum, 2011). Incorporating exogenous data, such as social media information, into government accounting information systems is a way of considering citizens’ preferences and their views on public issues. This helps government decision-makers provide better public services that matter to citizens and determine how public services should be managed, measured, and reported.

The contributions of this study fall into three areas. First, this study demonstrates the possibility of incorporating social media information into government information systems to support decision-making. Collecting and analyzing social media information is a direct and efficient way to obtain timely feedback from citizens and proactively interact with the public. Government accounting information systems can incorporate these measures and link them to cost figures, allowing a better understanding of the efficiency and effectiveness of operations. Second, this study presents a data analytical approach that enhances decision-making by using data that are closer to real time rather than only the historical data provided by accounting systems. Users can retrieve valuable information from tweets by utilizing text mining techniques and machine learning algorithms, and can handle a dataset with an imbalanced class distribution. Among the total number of tweets collected, only a small portion is relevant to the subject; thus, the distribution of the dataset is skewed. The sampling methods used in the study can resolve the imbalanced class distribution issue, and the methodology can be generalized to other areas, such as predicting financial fraud and assessing the likelihood of bankruptcy. Third, this study provides an example of using social media information as an alternative performance measure. It applies emerging technologies and an analytical approach to examine social media information and offers the general public’s perspective on tackling a public problem.
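The imbalanced class distribution described above can be illustrated with a minimal sketch. This excerpt does not say which sampling methods the study uses, so random oversampling of the minority class is swapped in here as one common option. The function name `random_oversample` and the toy tweets and labels are hypothetical.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples at random until every class
    has as many examples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == cls]
        for _ in range(target - n):
            out_texts.append(rng.choice(pool))
            out_labels.append(cls)
    return out_texts, out_labels


# Hypothetical toy data: 1 = relevant to street cleanliness, 0 = not relevant.
tweets = [
    "trash piling up on my street",
    "great pizza in brooklyn tonight",
    "subway delays again this morning",
    "beautiful sunset over the bridge",
    "tickets for the game on friday",
    "coffee before the morning meeting",
]
labels = [1, 0, 0, 0, 0, 0]

balanced_texts, balanced_labels = random_oversample(tweets, labels)
print(Counter(labels), "->", Counter(balanced_labels))
```

In practice, resampling of this kind is usually applied only to the training split so that evaluation still reflects the skewed distribution of real tweets.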

The remainder of this study is organized as follows: the second section reviews existing literature on the study of social media information. The third section provides the methodology of this study. The fourth section shows the results, and the fifth section focuses on extending the analysis to another social media platform. Finally, the last section discusses the conclusions and limitations of the study and provides future avenues for research.

Literature review

Research on social media has grown exponentially in recent years. As a form of exogenous data, social media has significant added value and impact given the volume, velocity, variety, and veracity of the available information (Buhl et al., 2013, Vasarhelyi et al., 2015, Yoon et al., 2015, Zhang et al., 2015). The Twitter platform facilitates network interconnections and illustrates social network theory well. The interconnected network among users generates a

Methodology

The general workflow for this study is illustrated in Appendix B. The following subsections describe each step in detail.

Results

The approach for obtaining results can be divided into two steps. The first step is relevancy determination, which uses a supervised machine learning method to retrieve relevant tweets related to this study. The second step is sentiment analysis, which applies VADER to the relevant tweets identified during the first step.
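The two steps can be sketched end to end using only the Python standard library. This is an illustration under stated assumptions, not the study's code. Step 1 uses a small hand-written multinomial Naïve Bayes classifier, one of the three classifiers the study names. Step 2 uses a toy word-score lexicon as a stand-in for VADER, which in addition handles negation, intensifiers, and punctuation and reports a normalized compound score between -1 and 1. The names `TinyNaiveBayes`, `toy_sentiment`, and `TOY_LEXICON`, and all training tweets, are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[label][w]
                    score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy stand-in for a sentiment lexicon. This is NOT the VADER lexicon.
TOY_LEXICON = {"filthy": -2.0, "dirty": -1.5, "disgusting": -2.5,
               "clean": 1.5, "spotless": 2.0, "thanks": 1.0}


def toy_sentiment(text):
    """Mean lexicon score of the matched words, or 0.0 if none match."""
    scores = [TOY_LEXICON[w] for w in tokenize(text) if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labeled tweets: 1 = relevant to street cleanliness, 0 = not.
train_texts = [
    "trash piling up on my street",
    "overflowing litter basket on the corner again",
    "the sidewalk is covered in garbage",
    "great pizza in brooklyn tonight",
    "subway delays again this morning",
    "beautiful sunset over the bridge",
]
train_labels = [1, 1, 1, 0, 0, 0]
model = TinyNaiveBayes().fit(train_texts, train_labels)

incoming = [
    "garbage and trash all over the sidewalk, disgusting",
    "pizza tonight in brooklyn",
]
# Step 1: keep only tweets classified as relevant.
relevant = [t for t in incoming if model.predict(t) == 1]
# Step 2: score sentiment on the relevant tweets only.
sentiments = {t: toy_sentiment(t) for t in relevant}
print(sentiments)
```

Filtering before scoring means that sentiment from off-topic tweets cannot distort the cleanliness measure, which is the reason for running relevancy determination first.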

Framework extension

The tweets were collected based on NYC’s longitude and latitude rather than at a more granular level (e.g., street level) because of a limitation of the Twitter API used. To compensate for this limitation and to evaluate the approach to analyzing social media information, another social media platform, Facebook, is selected for testing. A further purpose of this extension is to explore the potential use of Facebook data in evaluating NYC street cleanliness. Due to Facebook’s privacy restriction on personal

Summary

This study demonstrates how to bring an innovative data source into the government information system and use social media information to support government managerial decision-making. Social media information is analyzed with text mining techniques and machine learning algorithms, and the resulting data can be used to develop an alternative performance measure for NYC street cleanliness. Specifically, this paper applies text mining techniques and supervised machine learning algorithms to analyze

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We are thankful for the helpful comments received from Daniel O’Leary, Helen Brown-Liburd, Aleksandr Kogan, Deniz Appelbaum, Lawrence Gordon, and everyone from Rutgers, The State University of New Jersey – Continuous Auditing & Reporting Lab (CAR Lab). Special thanks to the editors and two anonymous reviewers from the journal; thank you for your valuable comments on the publication of this paper.

This paper was presented at the 2019 American Accounting Association (AAA) Annual Research Workshop

References (100)

  • D.E. O’Leary

    On the relationship between number of votes and sentiment in crowdsourcing ideas and comments for innovation: a case study of Canada’s digital compass

    Decis. Support Syst.

    (2016)
  • J.L. Reck

    The usefulness of financial and nonfinancial performance information in resource allocation decisions

    J. Account. Public Policy

    (2001)
  • S.A. Reed

    The impact of nonmonetary performance measures upon budgetary decision making in the public sector

    J. Account. Public Policy

    (1986)
  • R.P. Schumaker et al.

    Evaluating sentiment in financial news articles

    Decis. Support Syst.

    (2012)
  • E.S. Zeemering

    Functional Fragmentation in City Hall and Twitter Communication During the COVID-19 Pandemic: evidence from Atlanta, San Francisco, and Washington, DC

    Government Information Quarterly

    (2021)
  • Z. Alom et al.

    Detecting Spam Accounts on Twitter

  • E. Alpaydin

    Introduction to Machine Learning

    (2014)
  • Arschin, D., 2022. Trash is Piling Up on NYC Streets, Lawmakers Say [WWW Document]. FOX 5 New York. URL...
  • Asur, S., Huberman, B.A., 2010. Predicting the Future with Social Media, in: 2010 IEEE/WIC/ACM International Conference...
  • J. Awwalu et al.

    Hybrid N-gram model using Naïve Bayes for classification of political sentiments on Twitter

    Neural Comput. Appl.

    (2019)
  • S. Bazzaz Abkenar et al.

    A hybrid classification method for twitter spam detection based on differential evolution and random forest

    Concurrency Comput.: Pract. Experience

    (2021)
  • M. Berner et al.

    The state of the states: a review of state requirements for citizen participation in the local government budget process

    State Local Govern. Rev.

    (2004)
  • D.M. Blei et al.

    Latent Dirichlet Allocation

    J. Mach. Learn. Res.

    (2003)
  • E. Bonsón et al.

    A set of metrics to assess stakeholder engagement and social legitimacy on a corporate facebook page

    Online Inform. Rev.

    (2013)
  • M. Bonzanini

    Mastering Social Media Mining with Python

    (2016)
  • I. Bora et al.

    The transformation of government accountability and reporting

    J. Emerg. Technol. Account.

    (2021)
  • H. Brown-Liburd et al.

    Measuring with Exogenous Data (MED), and Government Economic Monitoring (GEM)

    J. Emerg. Technol. Account.

    (2019)
  • H.U. Buhl et al.

    Big data

    Bus. Inform. Syst. Eng.

    (2013)
  • J. Burgoon et al.

    Which spoken language markers identify deception in high-stakes settings? Evidence from earnings conference calls

    J. Lang. Social Psychol.

    (2016)
  • S. Burton et al.

    Interactive or reactive? Marketing with Twitter

    J. Consumer Market.

    (2011)
  • K. Callahan et al.

    Results-Oriented Government: Citizen Involvement in Performance Measurement. Performance & Quality Measurement in Government

    (1999)
  • M.P. Cameron et al.

    Can social media predict election results? Evidence from New Zealand

    J. Polit. Market.

    (2016)
  • V. Chakraborty et al.

    A hybrid method for taxonomy creation

    Int. J. Digital Account. Res.

    (2017)
  • Y.-C.-L. Chan

    Performance measurement and adoption of balanced scorecards: a survey of municipal governments in the USA and Canada

    Int. J. Public Sector Manage.

    (2004)
  • K. Coulter et al.

    “Like It Or Not”: consumer responses to word-of-mouth communication in on-line social networks

    Manage. Res. Rev.

    (2012)
  • A. Culotta

    Lightweight methods to estimate influenza rates and alcohol sales volume from twitter messages

    Lang. Resour. Eval.

    (2013)
  • Culotta, A., 2010. Towards Detecting Influenza Epidemics by Analyzing Twitter Messages, in: Proceedings of the First...
  • C. Dhaoui et al.

    Social media sentiment analysis: lexicon versus machine learning

    J. Consum. Market.

    (2017)
  • P. Dutil

    Crowdsourcing as a new instrument in the Government’s Arsenal: explorations and considerations

    Canad. Public Admin.

    (2015)
  • A.C. Dzuranin et al.

    The current state and future direction of IT audit: challenges and opportunities

    J. Inform. Syst.

    (2016)
  • C. Ebdon et al.

    Searching for a role for citizens in the budget process

    Public Budget. Finance

    (2004)
  • Elbagir, S., Yang, J., 2019. Twitter Sentiment Analysis Using Natural Language Toolkit and VADER Sentiment, in:...
  • I.G.A. Erawan

    Implementation of balanced scorecard in Indonesian government institutions: a systematic literature review

    J. Public Admin. Stud.

    (2020)
  • F. Farneti

    Balanced scorecard implementation in an Italian Local Government Organization

    Public Money Manage.

    (2009)
  • J. Griffiths

    Balanced scorecard use in New Zealand Government Departments and Crown Entities

    Aust. J. Public Admin.

    (2003)
  • J.-W. Guo et al.

    Mining Twitter to explore the emergence of COVID-19 symptoms

    Public Health Nurs.

    (2020)
  • A.-T.-K. Ho et al.

    Have cities shifted to outcome-oriented performance reporting?—A content analysis of city budgets

    Public Budget. Finance

    (2005)
  • Z. Hoque et al.

    The rise and use of balanced scorecard measures in Australian government departments

    Finan. Account. Manage.

    (2011)
  • A.L. Hughes et al.

    Twitter adoption and use in mass convergence and emergency events

    Int. J. Emergency Manage.

    (2009)
  • Hutto, C.J., Gilbert, E., 2014. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text, in:...