P-NUT: Predicting NUTrient Content from Short Text Descriptions



Mathematics (IF 2.3), Pub Date: 2020-10-16, DOI: 10.3390/math8101811
Gordana Ispirova, Tome Eftimov, Barbara Koroušić Seljak

Assessing nutritional content is highly relevant for patients with various diseases and for professional athletes, and, for health reasons, it is becoming part of everyday life for many people. However, it is a challenging task because it requires complete and reliable data sources. We introduce a machine learning pipeline that predicts the macronutrient values of foods from learned vector representations of short text descriptions of food products. On a dataset used by health specialists, containing short descriptions of foods and their macronutrient values, we generate paragraph embeddings, cluster the foods into food groups using graph-based vector representations that incorporate food-domain knowledge, and train regression models for each cluster. Predictions are made for four macronutrients: carbohydrates, fat, protein, and water. The highest accuracy, 86%, was obtained for carbohydrate predictions, compared with 27% and 36% for the baseline. The protein predictions yielded the best results across all clusters, with 53%–77% of the values falling within the tolerance-level range. These results were obtained using short descriptions; the embeddings can be improved if they are learned on longer descriptions, which would lead to better prediction results. Since calculating macronutrients normally requires exact quantities of ingredients, obtaining these results from short descriptions alone is a major step forward.
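As a rough illustration of the evaluation the abstract describes (the share of predicted values that fall within a tolerance range, reported per cluster), the following Python sketch scores a handful of predictions. It is not the authors' implementation: the function names, the cluster labels, the ±2 g tolerance, and all numbers are hypothetical placeholders chosen only to make the example runnable.

```python
from collections import defaultdict


def tolerance_accuracy(y_true, y_pred, tolerance):
    """Return the share of predictions within +/- tolerance of the true value."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one value to score")
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def accuracy_per_cluster(cluster_ids, y_true, y_pred, tolerance):
    """Group values by cluster label and score each cluster separately."""
    true_by_cluster = defaultdict(list)
    pred_by_cluster = defaultdict(list)
    for cluster, t, p in zip(cluster_ids, y_true, y_pred):
        true_by_cluster[cluster].append(t)
        pred_by_cluster[cluster].append(p)
    return {
        cluster: tolerance_accuracy(
            true_by_cluster[cluster], pred_by_cluster[cluster], tolerance
        )
        for cluster in true_by_cluster
    }


# Toy data (hypothetical): protein in g per 100 g for five foods in two clusters.
clusters = ["dairy", "dairy", "dairy", "bread", "bread"]
protein_true = [3.4, 3.1, 10.0, 8.5, 9.0]
protein_pred = [3.0, 5.5, 9.2, 8.9, 12.5]

# The +/- 2 g tolerance is an assumption for illustration, not the paper's value.
scores = accuracy_per_cluster(clusters, protein_true, protein_pred, tolerance=2.0)
for cluster, score in scores.items():
    print(f"{cluster}: {score:.0%} of predictions within tolerance")
```

Reporting the score per cluster rather than over the whole dataset mirrors the abstract's per-cluster range of results; in the pipeline itself, the predictions would come from the regression model trained for each cluster.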


Updated: 2020-10-17
