Co-occurrence networks of Twitter content after manual or automatic processing. A case-study on “gluten-free”

doi:10.1016/j.foodqual.2020.103993

Food Quality and Preference

Volume 86, December 2020, 103993

https://doi.org/10.1016/j.foodqual.2020.103993

Highlights

• Co-occurrence networks built from manually coded tweets and from merely pre-cleaned tweets were compared.
• Pre-cleaned and coded text yielded networks with similar structure and term relevance.
• Most tweets on gluten-free mention products: bread, cake, cookie, beer, and pizza.
• Users share how to get gluten-free products (buying or preparing) and the situations in which they eat them.

Abstract

Gathering information from social networks such as Twitter has emerged as a way to obtain spontaneous and direct opinions of users about a topic. This study focuses on using co-occurrence networks to analyse Twitter information. The objectives were to study the impact of text pre-treatment (coding based on qualitative analysis or just pre-cleaning) and to apply co-occurrence networks to analyse what is said on Twitter about a specific topic, “gluten-free”. To this end, 16,386 tweets in Spanish containing the terms “sin-gluten” or “gluten-free” were collected. A subset of 3000 tweets was used to build co-occurrence networks in two ways: i) from the manually coded text and ii) from pre-cleaned text. Results indicate that the co-occurrence network from pre-cleaned text provides meaningful information, showing a structure and term relevance similar to those of the network from coded text. The whole set of tweets was then used to explore Twitter information on gluten-free, showing that users share information about products, occasions, social situations, and places, but also about product characteristics, sensations, and diet or health issues related to the products. Five product categories for which the lack of gluten is critical (bread, cake, cookie, beer, and pizza) occupied most tweets and, according to the related terms, these tweets were intended to recommend how to get these gluten-free products (buying or cooking) and to show what users prepare and eat (and how, when, and where). These aspects differed among products, and separate co-occurrence networks for each product allowed them to be identified more clearly.

Introduction

In recent years, an increase in consumer demand for gluten-free products has been observed (Christoph et al., 2018, Missbach et al., 2015, Molina-Rosell, 2013). Research on gluten-free products has focused on strategies for dealing with the negative impact that a lack of gluten has on the quality of these products. Manufacturing gluten-free cereal products is a challenging task for the food industry (Capriles et al., 2016, Houben et al., 2012). According to Naqash, Gani, Gani, and Masoodi (2017), most approaches include the addition of functional ingredients to the formulation (gluten-free flours, starches, hydrocolloids, proteins, fats, and fibres) or the adoption of alternative processing methods (high pressure, extrusion, and sourdough fermentation) to produce gluten-free products with good sensory quality, especially a texture comparable to that of products containing gluten (Marston et al., 2016, Matos and Rosell, 2012, O’Shea et al., 2013, Penjumras et al., 2019).

However, according to do Nascimento, Fiates, and Teixeira (2017), consumers’ concerns about gluten-free products include both the sensory quality of the products and the difficulties they experience in trying to have a “normal life”, especially in a social context. Still, information on the relevance of extrinsic properties of products, context aspects, and individual attitudes and opinions of gluten-free consumers is scarce. This is due to the difficulty of finding coeliac participants for consumer studies, as they account for an estimated 1–2% of the population (Sapone et al., 2012). Therefore, we believe that what is said on social media networks may be a way to obtain the opinions of this target group of consumers, allowing us to understand their motivations and interests when consuming gluten-free products.

Among social media platforms, Twitter is one of the most popular and dynamic microblogging services, with 500 million text-based messages, called “tweets”, generated by active users per day (Chae, 2015, Da Silva et al., 2014, Mention, 2018, Vidal et al., 2015). The informal and colloquial nature of tweets, together with the ease of use and instant access of the platform, makes its use widespread, giving rise to a huge volume of rapidly generated data (Fried et al., 2015, Moe and Schweidel, 2017). In contrast to other methods of gathering consumer opinions (e.g., surveys), social media users spontaneously post what they want, when they want, avoiding the biases that arise when people are prompted to express an opinion.

Food represents one of the key themes discussed on Twitter (Platania & Spadoni, 2018) and consequently, tweets are potentially valuable data sources for gaining insight on food-related consumer studies. To date, the exploration of user-generated content on Twitter has been useful to study food-related topics (food in general, influence of food choices, language of food, food chains, health food, different eating situations, and emotional responses to food and beverages) (Chen and Yang, 2014, Fried et al., 2015, He et al., 2013, Platania and Spadoni, 2018, Samoggia et al., 2019, Vidal et al., 2015, Vidal et al., 2016). However, no study has addressed the exploration and interpretation of a topic like gluten-free.

Different approaches have been used to analyse tweets; automatic word counting is the simplest method of gathering information from users. Calculating the frequency of occurrence of individual words is a simple and rapid way of summarising a text according to the terms most frequently mentioned. Nevertheless, the frequency of occurrence of individual words has several important limitations. An isolated word may not represent its meaning in the dataset, and the loss of the word’s context can lead to misleading conclusions (Hsieh and Shannon, 2005, Vidal et al., 2015, Zhao et al., 2013). Therefore, a prior qualitative analysis of tweet content, with individual reading, was proposed to analyse tweets taking into account the context in which the words are mentioned. In this approach, the content is classified into themes and sub-themes related to the specific topic (Nguyen et al., 2019, Platania and Spadoni, 2018, Samoggia et al., 2019, Vidal et al., 2015). Although manual content analysis can be tedious and time consuming because of the large amounts of text to be read, it has proved successful at providing a better interpretation of Twitter content (He et al., 2013, He et al., 2017, Vidal et al., 2015). As an automatic alternative, text analysis based on machine learning algorithms has been used to extract meaningful information from textual data, recording themes already established or commonly studied (Constantinides and Holleschovsky, 2016, Sengupta and Ghosh, 2020, van Zoonen and van Der Meer, 2016). However, to perform correctly, machine learning algorithms usually require a large external coded dataset to analyse the text units (Vidal, Ares, & Jaeger, 2018). Thus, developing and adjusting the algorithms for new topics is complex or requires additional information.

Co-occurrence networks have been proposed as an approach to facilitate the understanding and visualisation of the structure of different text items and their content. Co-occurrence networks graphically represent the relevance of terms and the relatedness among them, identifying and displaying patterns of co-occurrence within the text (Ruiz and Barnett, 2015, Su and Lee, 2010). Although broadly applied in studies of bibliometric analysis to identify and visualise the existing connections among data (Skaf et al., 2020, van Eck and Waltman, 2018, Wen et al., 2017), co-occurrence networks can also be used for exploring connections of terms in different text documents.

Co-occurrence networks can be obtained with specific software such as VOSviewer and Gephi, or by using the Python programming language. In the VOSviewer software used in this study, the construction of a map comprises three steps: i) A similarity matrix (with association strength as the measure of similarity) is obtained from a co-occurrence matrix (van Eck and Waltman, 2007, van Eck et al., 2006). The similarity between two terms i and j is calculated as the number of co-occurrences of i and j divided by the product of the total number of co-occurrences of i and the total number of co-occurrences of j. ii) The visualisation of similarities (VOS) mapping technique constructs a two-dimensional map in which the items are located in such a way that the distance between any pair of items reflects their similarity. It does so by minimising a weighted sum of the squared Euclidean distances between all pairs of items. The higher the similarity between two items, the higher the weight of their squared distance in the sum. iii) The obtained map is translated, rotated, and reflected to obtain consistent results (always the same map) regardless of the different solutions that can be reached in the optimisation process.
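Step i) can be illustrated with a minimal Python sketch. It follows the ratio exactly as worded above (co-occurrences of i and j divided by the product of their total co-occurrences); VOSviewer’s own implementation may differ in constant factors or other details. The function name `association_strength`, the three terms, and the counts are hypothetical and are not data from the study.

```python
def association_strength(cooc):
    """Return s[i][j] = c_ij / (w_i * w_j) for a symmetric co-occurrence matrix.

    cooc[i][j] is the number of text units in which terms i and j co-occur
    (the diagonal is ignored); w_i is the total number of co-occurrences of term i.
    """
    n = len(cooc)
    # Total co-occurrences of each term with all other terms.
    totals = [sum(cooc[i][j] for j in range(n) if j != i) for i in range(n)]
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and totals[i] > 0 and totals[j] > 0:
                sim[i][j] = cooc[i][j] / (totals[i] * totals[j])
    return sim


# Hypothetical counts for three terms (illustration only, not study data).
terms = ["bread", "flour", "beer"]
cooc = [
    [0, 8, 2],
    [8, 0, 1],
    [2, 1, 0],
]
sim = association_strength(cooc)
for i in range(len(terms)):
    for j in range(i + 1, len(terms)):
        print(terms[i], terms[j], round(sim[i][j], 4))
```

Dividing by the totals corrects for the fact that a frequently mentioned term co-occurs with almost everything: in this toy matrix, "bread" and "beer" co-occur only twice, yet their similarity is not far below that of "bread" and "flour" because "beer" has few co-occurrences overall.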

In the obtained network, the size of the label representing a term is proportional to its frequency of appearance in the text (occurrence). The thickness of the line connecting two terms indicates how often they co-occur within the same text unit. The distance between two terms offers an approximate indication of the relatedness of the terms (Cunillera and Guilera, 2018, Marinho et al., 2017, Sharma et al., 2018, van Eck et al., 2010). A dataset of 70 text documents describing flowers has been created to illustrate the explanation (Fig. 1). The table in the figure includes the occurrences and co-occurrences of the seven terms. Each term is represented in the network as a circle of size proportional to the number of occurrences; for example, the labels and circles of terms red colour and pink colour are the largest and the smallest because they are the most and least mentioned terms, respectively. Distribution of terms on the map responds to the relationships between items. Spring was a general term, co-mentioned with many terms (red, rose, poppy, and jasmine) and thus, appears located in the centre. The links with the four terms have the same thickness because the number of co-occurrences is the same (five). Terms that do not show co-occurrences among them are separated in the extremes, and close to the terms with higher co-occurrence. In the top-left appears the term poppy related to red, while in the bottom right, the term jasmine relates to fragrance. Rose shows a strong link with red and fragrance, and appears in the bottom-left. The term pink links to rose but does not show co-occurrence with any other term, appearing separated on the bottom-left extreme of the network.
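The two quantities that drive the drawing (occurrences for label size, co-occurrences for line thickness) can be counted with a short Python sketch. The five toy sentences and the vocabulary below are hypothetical and only echo the flower example; they are not the 70-document dataset of Fig. 1. The sketch also assumes that a term is counted at most once per text unit and that terms are single whitespace-separated words.

```python
from collections import Counter
from itertools import combinations


def count_terms(documents, vocabulary):
    """Count, per text unit, which vocabulary terms occur and which pairs co-occur."""
    occurrences = Counter()
    cooccurrences = Counter()
    for text in documents:
        # A term counts once per text unit, however often it is repeated.
        present = sorted(set(text.lower().split()) & vocabulary)
        occurrences.update(present)
        # Pairs are stored as alphabetically ordered tuples, e.g. ("red", "rose").
        cooccurrences.update(combinations(present, 2))
    return occurrences, cooccurrences


# Hypothetical text units (illustration only).
docs = [
    "red rose in spring",
    "red poppy in spring",
    "pink rose",
    "jasmine fragrance in spring",
    "red rose fragrance",
]
vocab = {"red", "pink", "rose", "poppy", "jasmine", "fragrance", "spring"}

occ, cooc = count_terms(docs, vocab)
print(occ.most_common())   # would set the label and circle sizes
print(cooc.most_common())  # would set the thickness of the links
```

In a network drawn from these counts, "pink" would be linked only to "rose", since it never shares a text unit with any other term.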

In this study, we propose using co-occurrence networks as a tool for analysing terms in tweets, to give more structured information than word counting. Using raw text directly from tweets would make the analysis almost automatic; however, a prior qualitative analysis of the tweets and the corresponding coding of the text could be necessary to capture the relevance of ideas that are expressed with many different terms and to avoid misinterpreting the text.

Therefore, the first aim was to study how pre-processing of tweet text (coding through qualitative analysis or just pre-cleaning) influences co-occurrence networks to determine if the process can be automated without losing relevant information. The second aim was to analyse tweets about “gluten-free” to gather information about the aspects that are relevant for this specific group of consumers, in general and in relation to specific products.

Section snippets

Retrieval of tweets

A total of 16,386 tweets containing the terms “sin-gluten” or “gluten-free”, posted by users writing in Spanish between September 2017 and January 2018, were retrieved with the rtweet package (Kearney, 2017) for R (R Core Team, 2016) via Twitter’s Application Programming Interface (API). Re-tweets and repeated tweets were removed. Each retrieved tweet included an ID number, the username of the person posting, and the date and time when the tweet was published, among other information.
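The removal of re-tweets and repeated tweets can be sketched as follows. This is an illustration in Python rather than the R workflow used in the study, and it rests on assumptions the text does not state: each tweet is a dictionary with hypothetical `text` and `is_retweet` fields, and two tweets count as repeated when their text matches after lower-casing and collapsing whitespace.

```python
def drop_retweets_and_repeats(tweets):
    """Keep the first copy of each distinct tweet text and skip re-tweets."""
    seen = set()
    kept = []
    for tweet in tweets:
        if tweet.get("is_retweet"):
            continue
        # Assumed notion of "repeated": same text ignoring case and spacing.
        key = " ".join(tweet["text"].lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(tweet)
    return kept


# Hypothetical records (illustration only).
sample = [
    {"id": 1, "text": "Pan sin gluten recién hecho", "is_retweet": False},
    {"id": 2, "text": "RT @user: Pan sin gluten recién hecho", "is_retweet": True},
    {"id": 3, "text": "pan sin gluten  recién hecho", "is_retweet": False},
    {"id": 4, "text": "Cerveza gluten-free en el bar", "is_retweet": False},
]
kept = drop_retweets_and_repeats(sample)
print([t["id"] for t in kept])
```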

Themes and sub-themes in tweets on gluten-free

Table 1 shows the content of the subset of 3000 tweets summarised into nine main themes: products, places, culinary preparations, product-related characteristics, ingredients, occasions, social context, diet/health, and sensory characteristics/sensations. Tweets relating to the themes product, places, and culinary preparation were the most frequent (>30%). Although with lower frequency (<10%), other themes related to diet or health issues and sensory characteristics/sensations were also found.

Treatment of the information obtained from Twitter

In this study, the analysis of tweet information was conducted using co-occurrence networks. The tweets were pre-processed in two ways: either by manually coding the content of each tweet or by using the raw tweet text directly (after only a pre-cleaning step). The networks built from the manually coded text and from the pre-cleaned text of the subset of 3000 tweets revealed similar main ideas about the topic “gluten-free”, but these ideas were represented differently. When coding the information after reading, concepts and ideas were

Conclusions

Co-occurrence networks help to understand the information on Twitter by showing the relevance of terms and how they are structured through co-occurrence connections.

This study shows that co-occurrence networks can be built, almost directly, from pre-cleaned data without losing relevant information. Furthermore, the study highlighted the importance of the number of tweets for obtaining relevant and dependable information.

This approach almost automatic based on co-occurrence networks from

CRediT authorship contribution statement

Patricia Puerta: Investigation, Formal analysis, Writing - original draft. Laura Laguna: Investigation, Writing - review & editing, Writing - original draft. Leticia Vidal: Methodology, Writing - review & editing, Writing - original draft. Gastón Ares: Conceptualization, Methodology, Writing - review & editing, Writing - original draft. Susana Fiszman: Conceptualization, Writing - review & editing, Writing - original draft. Amparo Tárrega: Conceptualization, Formal analysis, Methodology,

Acknowledgements

The authors are grateful to the Spanish Ministry of the Economy and Competitiveness for financial support (project AGL-2016-75403-R) and for the Juan de la Cierva contract for author Laura Laguna (IJCI-2016-27427), and to Generalitat Valenciana (Project Prometeo 2017/189). The authors are also grateful to Dr. Waltman for his valuable advice on using the VOSviewer technique.

References (61)

A. Arellano-Covarrubias et al.
Connecting flavors in social media: A cross cultural study with beer pairing
Food Research International
(2019)
K.G. Blackburn et al.
Food for thought: Exploring how people think and talk about food online
Appetite
(2018)
V.D. Capriles et al.
Gluten-free breadmaking: Improving nutritional and bioactive compounds
Journal of Cereal Science
(2016)
J. Carr et al.
Social media in product development
Food Quality and Preference
(2015)
B. Chae
Insights from hashtag #supplychain and Twitter analytics: Considering Twitter and Twitter data for supply chain practice and research
International Journal of Production Economics
(2015)
X. Chen et al.
Does food environment influence food choices? A geographical analysis through “tweets”
Applied Geography
(2014)
M.J. Christoph et al.
Who values gluten-free? Dietary intake, behaviors, and sociodemographic characteristics of young adults who value gluten-free food
Journal of the Academy of Nutrition and Dietetics
(2018)
N.F. Da Silva et al.
Tweet sentiment analysis with classifier ensembles
Decision Support Systems
(2014)
A.B. do Nascimento et al.
We want to be normal! Perceptions of a group of Brazilian consumers with coeliac disease on gluten-free bread buns
International Journal of Gastronomy and Food Science
(2017)
W. He et al.
Social media competitive analysis and text mining: A case study in the pizza industry
International Journal of Information Management
(2013)

G.J. Kang et al.
Semantic network analysis of vaccine sentiment in online social media
Vaccine
(2017)
K. Marston et al.
Effect of heat treatment of sorghum flour on the functional properties of gluten-free bread and cake
LWT - Food Science and Technology
(2016)
F. Naqash et al.
Gluten-free baking: Combating the challenges – A review
Trends in Food Science and Technology
(2017)
B. Piqueras-Fiszman et al.
Emotions associated to mealtimes: Memorable meals and typical evening meals
Food Research International
(2015)
J.B. Ruiz et al.
Exploring the presentation of HPV information online: A semantic network analysis of websites
Vaccine
(2015)
L. Skaf et al.
Applying network analysis to explore the global scientific literature on food security
Ecological Informatics
(2020)
S. Spinelli et al.
Investigating preferred coffee consumption contexts using open-ended questions
Food Quality and Preference
(2017)
W. van Zoonen et al.
Social media research: The application of supervised machine learning in organizational communication research
Computers in Human Behavior
(2016)
L. Vidal et al.
Use of emoticon and emoji in tweets for food-related emotional expression
Food Quality and Preference
(2016)
L. Vidal et al.
Using Twitter data for food-related consumer research: A case study on “what people say when tweeting about different eating situations”
Food Quality and Preference
(2015)
B. Zhao et al.
Identification of collective viewpoints on microblogs
Data and Knowledge Engineering
(2013)
F.M. Begen et al.
Consumer preferences for written and oral information about allergens when eating out
PLoS One
(2016)
Constantinides, E., & Holleschovsky, N. I. (2016). Impact of online product reviews on purchasing decisions. WEBIST...
T. Cunillera et al.
Twenty years of statistical learning: From language, back to machine learning
Scientometrics
(2018)
K. Eriksson-Backa et al.
Communicating diabetes and diets on Twitter – a semantic content analysis
International Journal of Networking and Virtual Organisations
(2016)
Feinerer, I., & Hornik, K. (2017). Package “tm”: Text Mining Package. Version 0.7-1. CRAN.R-Project....
D. Fried et al.
Analyzing the language of food on social media
Gómez-Corona, C., Ares, G., Spinelli, S., Veflen, N., & Stathopoulou, N. (2019). Social media in sensory and consumer...
R.J.T. Hamshaw et al.
Tweeting and eating: The effect of links and likes on food-hypersensitive consumers’ perceptions of Tweets
Frontiers in Public Health
(2018)
W. He et al.
Application of social media analytics: A case of analyzing online hotel reviews
Online Information Review
(2017)

