Building High Performance Explainable Machine Learning Models for Social Media-based Substance Use Prediction
International Journal on Artificial Intelligence Tools (IF 1.0), Pub Date: 2020-06-17, DOI: 10.1142/s021821302060009x
Tao Ding 1 , Fatema Hasan 1 , Warren K. Bickel 2 , Shimei Pan 1

Social media contain rich information that can help us understand the human mind and behavior. Social media data, however, are mostly unstructured (e.g., text and images), and a large number of features may be needed to represent them (e.g., millions of unigrams may be needed to represent social media texts). Moreover, accurately assessing human behavior is often difficult (e.g., assessing addiction may require a medical diagnosis). As a result, the ground truth data needed to train a supervised human behavior model are often difficult to obtain at a large scale. To avoid overfitting, many state-of-the-art behavior models employ sophisticated unsupervised or self-supervised machine learning methods to leverage a large amount of unlabeled data for both feature learning and dimension reduction. Unfortunately, despite their high performance, these advanced machine learning models often rely on latent features that are hard to explain. Since understanding the knowledge captured in these models is important to behavior scientists and public health providers, we explore new methods to build machine learning models that are not only accurate but also interpretable. We evaluate the effectiveness of the proposed methods in predicting Substance Use Disorders (SUD). We believe the proposed methods are general and applicable to a wide range of data-driven human trait and behavior analysis applications.

Updated: 2020-06-17