Abstract
The current research used human-coded Reddit posts, categorized by established food parenting concepts (coercive control, structure, autonomy support, recipes), as a basis for machine learning models. The objective was to provide insight into child-feeding topics discussed on social media and to give future research a way to use our trained machine learning model. Reddit posts from specific, parenting-related subreddits were collected and labeled according to how they related to aspects of child-feeding behavior. Posts were then put through text pre-processing, converted into TF-IDF vectors, and used to train support vector machine binary and multiclass classification models. Other classifiers and text pre-processing steps were also tested. After training, the binary model classified posts as being about child feeding or not with 86.1% accuracy, up from a baseline accuracy of 57.6%. The multiclass model classified posts into four categories of child feeding concepts (coercive control, autonomy support, structure, recipes) with 79.1% accuracy, up from a baseline of 51.9%. The comparison models performed less favorably. The best performing binary model is publicly available for use via the Social Media Macroscope, and we provide details on how to use this model. Information is presented such that other researchers and professionals interested in examining issues related to feeding children posted on social media can use the same approach.
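The pipeline described above (pre-processing, TF-IDF vectors, support vector machine) could be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the example posts and labels are invented, and the linear kernel (`LinearSVC`), lowercasing, and English stop-word removal are assumptions, since the abstract does not specify the kernel or the exact pre-processing steps.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy posts standing in for the human-coded Reddit data (hypothetical).
posts = [
    "My toddler refuses vegetables, should I insist he finish his plate?",
    "We let our daughter choose between two healthy snacks after school",
    "Picky eater will only eat pasta for lunch and dinner",
    "Any tips for sleep training a six month old?",
    "Which stroller fits in a small car trunk?",
    "Looking for a daycare recommendation near downtown",
]
# 1 = about child feeding, 0 = not about child feeding (hypothetical coding).
labels = [1, 1, 1, 0, 0, 0]


def build_classifier():
    """TF-IDF features followed by a linear SVM (assumed configuration)."""
    return make_pipeline(
        # Lowercasing and stop-word removal stand in for "text pre-processing".
        TfidfVectorizer(lowercase=True, stop_words="english"),
        LinearSVC(),
    )


# Binary model: is a post about child feeding or not?
binary_model = build_classifier().fit(posts, labels)
predictions = binary_model.predict(["How do I get my son to eat dinner?"])
print(predictions)
```

The same `build_classifier()` pipeline can be fit on string labels such as the four concept categories to obtain a multiclass model, because `LinearSVC` handles more than two classes with a one-vs-rest scheme.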
Highlights
- Machine learning models based on human-coded Reddit posts were developed.
- The binary model can classify posts as being about child feeding with 86.1% accuracy.
- The multiclass model can classify child feeding concepts in posts with 79.1% accuracy.
- The best performing binary model is made available for use with instructions provided.
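To make the accuracy figures easier to interpret, the sketch below compares a model's accuracy with a baseline. The abstract does not say how the baseline was computed; a majority-class baseline (always predicting the most frequent label) is assumed here as one common choice, and the labels and predictions are invented for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline_accuracy(y_true):
    """Accuracy of always predicting the most frequent label (assumed baseline)."""
    if not y_true:
        raise ValueError("at least one label is required")
    _, count = Counter(y_true).most_common(1)[0]
    return count / len(y_true)


# Invented example: 6 feeding posts and 4 other posts (hypothetical data).
true_labels = ["feeding"] * 6 + ["other"] * 4
# Hypothetical model output with one mistake in each class.
model_output = ["feeding"] * 5 + ["other"] + ["other"] * 3 + ["feeding"]

baseline = majority_baseline_accuracy(true_labels)
model_acc = accuracy(true_labels, model_output)
print(f"baseline={baseline:.1%} model={model_acc:.1%}")
```

A model is only informative to the extent that its accuracy exceeds this kind of baseline, which is why the reported accuracies are stated alongside their baselines.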
Ethics declarations
Conflict of Interest
The authors declare no competing interests.
Cite this article
Donelson, C., Sutter, C., Pham, G.V. et al. Using a Machine Learning Methodology to Analyze Reddit Posts regarding Child Feeding Information. J Child Fam Stud 30, 1290–1298 (2021). https://doi.org/10.1007/s10826-021-01923-5
DOI: https://doi.org/10.1007/s10826-021-01923-5