Social Media Mining Toolkit (SMMT)
arXiv - CS - Information Retrieval, Pub Date: 2020-03-31, DOI: arxiv-2003.13894
Ramya Tekumalla and Juan M. Banda

The use of social media data for research has grown dramatically in popularity within the biomedical community. In PubMed alone, nearly 2,500 publications since 2014 deal with analyzing social media data from Twitter and Reddit. However, the vast majority of these works do not share the code or data needed to replicate them. With few exceptions, those that do leave it to the researcher to work out how to fetch the data, how best to format it, and how to create automatic and manual annotations on it. To address this pressing issue, we introduce the Social Media Mining Toolkit (SMMT), a suite of tools that encapsulates the cumbersome details of acquiring, preprocessing, annotating, and standardizing social media data. The toolkit lets researchers focus on answering research questions rather than on the technical aspects of handling social media data. By using a standard toolkit, researchers can acquire, use, and release data in a consistent way that is transparent to everyone using it, making research in the social media domain easier to reproduce and more accessible.
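
The abstract names automatic annotation as one of the steps SMMT encapsulates but does not describe how it is performed. Purely as an illustration, here is a minimal sketch of one common approach, case-insensitive dictionary (lexicon) lookup over post text. The function `annotate`, the example dictionary, and its labels are hypothetical assumptions for this sketch and are not SMMT's actual API.

```python
import re
from typing import Dict, List, Tuple

def annotate(text: str, dictionary: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Hypothetical dictionary-based annotator (not SMMT's API).

    Returns (start, end, matched_text, label) spans for every dictionary term
    found in `text`, matching whole words case-insensitively.
    """
    # Normalize keys so lookups after a case-insensitive match succeed.
    lookup = {term.lower(): label for term, label in dictionary.items()}
    if not lookup:
        return []
    # Longest terms first, so "heart attack" is preferred over "heart"
    # when both could match at the same position.
    terms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    spans = []
    for m in pattern.finditer(text):
        spans.append((m.start(), m.end(), m.group(1), lookup[m.group(1).lower()]))
    return spans

if __name__ == "__main__":
    terms = {"ibuprofen": "DRUG", "heart attack": "CONDITION",
             "heart": "ANATOMY", "headache": "SYMPTOM"}
    post = "Started ibuprofen after a Heart Attack last week, no headache since."
    for span in annotate(post, terms):
        print(span)
```

Ordering the alternation longest-first is a simple way to favor multi-word terms. A production annotator would also need tokenization that handles hashtags, emoji, and misspellings common in social media text.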

Updated: 2020-07-16