Using a Machine Learning Methodology to Analyze Reddit Posts regarding Child Feeding Information, Journal of Child and Family Studies

Journal of Child and Family Studies (IF 1.6), Pub Date: 2021-02-27, DOI: 10.1007/s10826-021-01923-5
Curtis Donelson, Carolyn Sutter, Giang V. Pham, Kanika Narang, Chen Wang, Joseph T. Yun

This study used human-coded Reddit posts, categorized by established food parenting concepts (coercive control, structure, autonomy support, recipes), to train machine learning models. The objectives were to provide insight into the child-feeding topics discussed on social media and to give future research a trained model to reuse. Posts were collected from specific parenting-related subreddits and labeled according to the aspects of child-feeding behavior they addressed. The posts were then pre-processed, converted into TF-IDF vectors, and used to train binary and multiclass support vector machine (SVM) classifiers. Other classifiers and text pre-processing steps were also tested. After training, the binary model classified posts as being about child feeding or not with 86.1% accuracy, up from a baseline accuracy of 57.6%. The multiclass model classified posts into the four categories of child-feeding concepts (coercive control, autonomy support, structure, recipes) with 79.1% accuracy, up from a baseline of 51.9%. The comparison models performed less favorably. The best-performing binary model is publicly available via the Social Media Macroscope, and we provide details on how to use it. The information is presented so that other researchers and professionals interested in examining child-feeding issues posted on social media can apply the same approach.
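The binary pipeline described in the abstract (pre-processing, TF-IDF vectors, an SVM classifier, comparison against a baseline) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the six posts are invented, the tokenizer is a bare whitespace split, the SVM is a linear one fitted by subgradient descent with hand-picked hyperparameters, and the baseline is assumed to be a majority-class guess, which the abstract does not specify. All function names are hypothetical.

```python
import math
from collections import Counter

# Toy posts invented for illustration; they are NOT from the study's data set.
# Label +1 = about child feeding, -1 = not about child feeding.
POSTS = [
    ("toddler refuses vegetables at dinner", 1),
    ("picky eater spits out lunch", 1),
    ("toddler only eats snacks before dinner", 1),
    ("baby wakes up every night", -1),
    ("choosing a stroller for travel", -1),
    ("newborn sleep schedule advice", -1),
]


def tokenize(text):
    """Minimal pre-processing: lowercase and split on whitespace."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every training token."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; tokens unseen in training are dropped."""
    tf = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def score(x, w, b):
    """Signed distance-like score of sparse vector x under weights w and bias b."""
    return sum(w.get(tok, 0.0) * v for tok, v in x.items()) + b


def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01):
    """Linear SVM fitted by subgradient descent on the L2-regularised hinge loss."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            violated = label * score(x, w, b) < 1.0
            for tok in w:  # step along the gradient of the L2 penalty
                w[tok] *= 1.0 - lr * lam
            if violated:  # step along the subgradient of the hinge term
                for tok, v in x.items():
                    w[tok] = w.get(tok, 0.0) + lr * label * v
                b += lr * label
    return w, b


def predict(text, idf, w, b):
    """Return +1 (child feeding) or -1 (other) for a raw post."""
    return 1 if score(tfidf(text, idf), w, b) >= 0 else -1


texts = [t for t, _ in POSTS]
labels = [l for _, l in POSTS]
idf = fit_idf(texts)
w, b = train_linear_svm([tfidf(t, idf) for t in texts], labels)

# Assumed baseline: always predict the most frequent label.
majority = Counter(labels).most_common(1)[0][0]
baseline_acc = sum(l == majority for l in labels) / len(labels)
train_acc = sum(predict(t, idf, w, b) == l for t, l in POSTS) / len(POSTS)
print(f"baseline accuracy: {baseline_acc:.2f}, training accuracy: {train_acc:.2f}")
print("new post label:", predict("toddler refuses dinner", idf, w, b))
```

The accuracy printed here is measured on the same toy posts used for training, so it says nothing about the figures reported in the abstract. A multiclass variant could train one such binary classifier per category (one-vs-rest) and pick the highest score, though the abstract does not state which multiclass scheme was used.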


Updated: 2021-04-13
