A framework to extract biomedical knowledge from gluten-related tweets: The case of dietary concerns in digital era
Artificial Intelligence in Medicine (IF 6.1), Pub Date: 2021-06-25, DOI: 10.1016/j.artmed.2021.102131
Martín Pérez-Pérez 1 , Gilberto Igrejas 2 , Florentino Fdez-Riverola 1 , Anália Lourenço 3

The importance and potential of big data are increasingly evident, driven by the explosive growth in the volume of information generated on the Internet in recent years. Many experts agree that social media networks are among the fastest-growing areas of the Internet and are expected to keep growing significantly in the coming years. Social media sites are also quickly becoming one of the most popular platforms for discussing health issues and exchanging social support. In this context, this work presents a new methodology to process, classify, visualise and analyse the big data knowledge produced by the sociome on social media platforms. The methodology combines natural language processing techniques, ontology-based named entity recognition methods, machine learning algorithms and graph mining techniques to: (i) reduce irrelevant messages by focusing the analysis on individuals and patient experiences within the public discussion; (ii) use domain ontologies to reduce the lexical noise produced by the different ways in which users express themselves; (iii) infer the demographic data of individuals through the combined analysis of textual, geographical and visual profile information; (iv) perform community detection and evaluate the health topic under study by combining semantic processing of the public discourse with knowledge graph representation techniques; and (v) gain information about shared resources by combining social media statistics with semantic analysis of web contents. The practical relevance of the proposed methodology was demonstrated in a study of 1.1 million unique messages from >400,000 distinct users related to one of the most popular dietary fads, which has evolved into a multibillion-dollar industry: gluten-free food. In addition, this work analysed one of the least-studied areas of public health research on Twitter (i.e., allergies and immunological diseases such as celiac disease), reaching a wide range of health-related conclusions.
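As a rough illustration of steps (ii) and (iv), the sketch below maps different surface forms to a shared concept and counts which concepts co-occur in the same message, producing a small weighted graph. It is a minimal sketch under stated assumptions, not the authors' implementation: the `LEXICON` dictionary is a hypothetical stand-in for real domain ontologies, the function names are invented for this example, and plain dictionary matching replaces the named entity recognition methods used in the paper.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical mini-lexicon (surface form -> canonical concept).
# A real system would draw these mappings from domain ontologies.
LEXICON = {
    "celiac": "celiac_disease",
    "coeliac": "celiac_disease",
    "celiac disease": "celiac_disease",
    "gluten free": "gluten_free_diet",
    "gluten-free": "gluten_free_diet",
    "gf": "gluten_free_diet",
    "bloating": "bloating",
    "bloated": "bloating",
}

# Try longer surface forms first so "celiac disease" wins over "celiac".
_TERMS = sorted(LEXICON, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in _TERMS) + r")\b",
    re.IGNORECASE,
)


def extract_concepts(text):
    """Return the set of canonical concepts mentioned in one message."""
    return {LEXICON[match.group(1).lower()] for match in _PATTERN.finditer(text)}


def cooccurrence_graph(messages):
    """Count how often each pair of concepts appears in the same message.

    The result maps an alphabetically ordered concept pair to its weight,
    which can be read as a weighted edge list.
    """
    weights = defaultdict(int)
    for text in messages:
        for pair in combinations(sorted(extract_concepts(text)), 2):
            weights[pair] += 1
    return dict(weights)


if __name__ == "__main__":
    sample = [
        "Diagnosed with Coeliac last year, bloating is finally gone",
        "Going GF helped my celiac disease",
        "gluten-free bread is expensive",
    ]
    for edge, weight in cooccurrence_graph(sample).items():
        print(edge, weight)
```

The resulting edge list could be passed to a graph library for community detection; which algorithm the authors applied is not stated in the abstract, so none is assumed here.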




Updated: 2021-07-02