Using a Machine Learning Methodology to Analyze Reddit Posts regarding Child Feeding Information

  • S.I.: Artificial Intelligence and Machine Learning
Journal of Child and Family Studies

Abstract

The current research used human-coded Reddit posts, categorized by established food parenting concepts (coercive control, structure, autonomy support, recipes), as the basis for machine learning models. The objectives were to provide insight into child-feeding topics discussed on social media and to give future research a way to use our trained model. Reddit posts from specific parenting-related subreddits were collected and labeled according to the aspects of child-feeding behavior they addressed. Posts were then pre-processed, converted into TF-IDF vectors, and used to train support vector machine binary and multiclass classification models. Other classifiers and text pre-processing steps were also tested. After training, the binary model classified posts as being about child feeding or not with 86.1% accuracy, up from a baseline accuracy of 57.6%. The multiclass model classified posts into four categories of child-feeding concepts (coercive control, autonomy support, structure, recipes) with 79.1% accuracy, up from a baseline of 51.9%. The comparison models performed less favorably. The best-performing binary model is publicly available via the Social Media Macroscope, and we provide details on how to use it. Information is presented so that other researchers and professionals interested in examining child-feeding issues posted on social media can apply the same approach.
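The binary workflow described in the abstract (pre-processing, TF-IDF vectors, a support vector machine, comparison against a baseline) can be sketched with scikit-learn as follows. This is a minimal illustration and not the authors' code: the toy posts and labels are invented, the choice of `LinearSVC` and of English stop-word removal are assumptions, and the baseline is assumed here to be a majority-class guess, which the abstract does not specify.

```python
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: 1 = about child feeding, 0 = not about child feeding.
posts = [
    "my toddler refuses to eat vegetables at dinner",
    "how do I get my son to try new foods",
    "we let our daughter choose her own snacks",
    "easy lunch recipes for picky eaters",
    "should I make my kid finish his plate",
    "what stroller do you recommend for travel",
    "our baby will not sleep through the night",
    "looking for advice on potty training",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0]

# Lowercasing and stop-word removal stand in for the pre-processing step
# (an assumption); TF-IDF features then feed a linear SVM.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LinearSVC(),
)
model.fit(posts, labels)

# Assumed baseline: always predict the most frequent label.
baseline_accuracy = Counter(labels).most_common(1)[0][1] / len(labels)

# Classify unseen posts; each prediction is 0 or 1.
new_posts = ["my kid only eats pasta", "which car seat is safest"]
predictions = model.predict(new_posts)
```

The same pipeline would handle the four-category case if `labels` held category names instead of 0/1, since `LinearSVC` supports multiclass targets. With real data, accuracy should be estimated on held-out posts rather than on the training set.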

Highlights

  • Machine learning models based on human-coded Reddit posts were developed.

  • The binary model can classify posts as being about child feeding with 86.1% accuracy.

  • The multiclass model can classify child feeding concepts in posts with 79.1% accuracy.

  • The best performing binary model is made available for use with instructions provided.

Author information

Corresponding author

Correspondence to Curtis Donelson.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

Table 1 Mean performance metrics of classification models

Appendix B

Figures 1, 2, 3

Fig. 1 Multi-tier coding schema for Reddit posts

Fig. 2 TF-IDF vectorization of Reddit posts. *Sentence and TF-IDF values are for illustrative purposes only
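As a small worked illustration of the TF-IDF vectorization this figure depicts, the sketch below computes term weights by hand for three invented sentences. It assumes one common textbook variant (term frequency as count divided by document length, inverse document frequency as the natural log of N over document frequency); library implementations such as scikit-learn use smoothed variants, so their values differ. The function name `tfidf` and the example sentences are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical example sentences, for illustration only.
docs = [
    "my toddler refuses vegetables",
    "my toddler loves pasta",
    "my car needs new tires",
]
vectors = tfidf(docs)
```

A term that appears in every document, such as "my" here, receives a weight of zero, while a term unique to one document, such as "vegetables", is weighted more heavily than one shared by two documents, such as "toddler".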

Fig. 3 Binary classification output from the Social Media Macroscope. *Class 0 represents the “Not relevant” label. Class 1 represents the “Relevant” label

Cite this article

Donelson, C., Sutter, C., Pham, G.V. et al. Using a Machine Learning Methodology to Analyze Reddit Posts regarding Child Feeding Information. J Child Fam Stud 30, 1290–1298 (2021). https://doi.org/10.1007/s10826-021-01923-5

  • DOI: https://doi.org/10.1007/s10826-021-01923-5
