Using a Machine Learning Methodology to Analyze Reddit Posts regarding Child Feeding Information

Donelson, Curtis; Sutter, Carolyn; Pham, Giang V.; Narang, Kanika; Wang, Chen; Yun, Joseph T.

doi:10.1007/s10826-021-01923-5


S.I. : Artificial Intelligence and Machine Learning
Published: 27 February 2021

Volume 30, pages 1290–1298, (2021)

Journal of Child and Family Studies

Curtis Donelson (ORCID: orcid.org/0000-0002-4731-1880)¹, Carolyn Sutter¹, Giang V. Pham¹, Kanika Narang¹, Chen Wang¹ & Joseph T. Yun¹


Abstract

The current research used human-coded Reddit posts, categorized by established food parenting concepts (coercive control, structure, autonomy support, recipes), as the basis for machine learning models. The objectives were to provide insight into the child-feeding topics discussed on social media and to give future research a way to use our trained machine-learned model. Reddit posts from specific parenting-related subreddits were collected and labeled according to the aspects of child-feeding behavior they addressed. The posts were then pre-processed, converted into TF-IDF vectors, and used to train binary and multiclass support vector machine (SVM) classifiers; other classifiers and text pre-processing steps were also tested. After training, the binary model classified posts as being about child feeding or not with 86.1% accuracy, up from a baseline accuracy of 57.6%. The multiclass model achieved 79.1% accuracy in classifying posts into four categories of child-feeding concepts (coercive control, autonomy support, structure, recipes), up from a baseline of 51.9%. The comparison models performed less well. The best-performing binary model is publicly available through the Social Media Macroscope, and we provide details on how to use it. The information is presented so that other researchers and professionals interested in child-feeding issues discussed on social media can effectively apply the same approach.
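The abstract states that posts were pre-processed and converted into TF-IDF vectors before classifier training, but it does not list the pre-processing steps. The following is a minimal standard-library sketch of that vectorization step only, under stated assumptions: lowercasing, alphabetic-only tokenization, a small hypothetical stop-word list, term frequency as count divided by post length, and the textbook unsmoothed idf = log(N / df), with each row L2-normalized. Libraries such as scikit-learn's `TfidfVectorizer` default to a smoothed idf instead, so exact weights will differ. The resulting vectors would then be passed to a classifier such as a linear SVM, which is not shown here.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; the paper's actual
# pre-processing choices are not described in the abstract.
STOP_WORDS = {"a", "an", "the", "and", "or", "to", "of", "my", "is", "for", "with"}


def preprocess(text: str) -> list[str]:
    """Lowercase the post, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return (vocabulary, TF-IDF matrix), one L2-normalized row per post.

    tf  = term count / number of tokens in the post
    idf = log(N / df), where df is the number of posts containing the term
    """
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per post.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}

    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # guard against posts left empty by pre-processing
        row = [counts[t] / length * idf[t] for t in vocab]
        # Normalize so long and short posts are on a comparable scale.
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        matrix.append([v / norm for v in row])
    return vocab, matrix
```

A term that appears in every post (for example "toddler" in a corpus made entirely of toddler posts) receives an idf of log(1) = 0 and therefore contributes nothing to distinguishing posts, which is the intended effect of the idf factor.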

Highlights

Machine learning models based on human-coded Reddit posts were developed.
The binary model can classify posts as being about child feeding with 86.1% accuracy.
The multiclass model can classify child feeding concepts in posts with 79.1% accuracy.
The best-performing binary model is publicly available, with instructions for its use.
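The accuracies above are reported against baselines (57.6% for the binary task, 51.9% for the multiclass task), but the abstract does not say how those baselines were computed. A common choice, assumed here rather than confirmed by the source, is the majority-class baseline: the accuracy obtained by always predicting the most frequent human-coded label. A minimal sketch of both quantities:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the human-coded labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy of a trivial classifier that always predicts the most common label.

    This is an assumed definition of 'baseline'; the abstract does not specify one.
    """
    if not y_true:
        raise ValueError("need at least one label")
    _, top_count = Counter(y_true).most_common(1)[0]
    return top_count / len(y_true)


# Hypothetical toy labels, not data from the study.
labels = ["feeding", "other", "feeding", "feeding", "other"]
preds = ["feeding", "other", "feeding", "other", "other"]
print(majority_baseline(labels), accuracy(labels, preds))
```

Comparing a model against this kind of baseline matters when classes are imbalanced: a classifier that simply guessed the majority label would already score the baseline value without learning anything from the text.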



Author information

Authors and Affiliations

University of Illinois at Urbana-Champaign, 601 E John St, Champaign, IL, 61820, USA
Curtis Donelson, Carolyn Sutter, Giang V. Pham, Kanika Narang, Chen Wang & Joseph T. Yun


Corresponding author

Correspondence to Curtis Donelson.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

Table 1 Mean performance metrics of classification models
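The contents of Table 1 are not reproduced here, so neither the specific metrics nor what they were averaged over is stated. Purely as a hypothetical illustration, if "mean" refers to averaging across cross-validation folds and the metrics include macro-averaged precision, recall, and F1 (common choices for multiclass text classification), the computation could be sketched as follows. The convention of scoring undefined 0/0 ratios as 0.0 is likewise an assumption.

```python
from statistics import mean


def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall and F1: compute each metric per label,
    then take the unweighted mean across labels."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Undefined ratios (0/0) are scored as 0.0, an assumed convention.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return mean(precisions), mean(recalls), mean(f1s)


def mean_over_folds(fold_results):
    """Average (precision, recall, F1) across folds.

    fold_results: list of (y_true, y_pred) pairs, one per hypothetical fold.
    """
    per_fold = [macro_prf(t, p) for t, p in fold_results]
    return tuple(mean(metric) for metric in zip(*per_fold))
```

Macro averaging weights each category equally, so a rare category (for example, one of the four child-feeding concepts with few coded posts) influences the score as much as a common one; accuracy, by contrast, is dominated by the frequent categories.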

Appendix B

Figures 1, 2, 3


About this article

Cite this article

Donelson, C., Sutter, C., Pham, G.V. et al. Using a Machine Learning Methodology to Analyze Reddit Posts regarding Child Feeding Information. J Child Fam Stud 30, 1290–1298 (2021). https://doi.org/10.1007/s10826-021-01923-5


Accepted: 07 February 2021
Published: 27 February 2021
Issue Date: May 2021
DOI: https://doi.org/10.1007/s10826-021-01923-5
