Visual Exploration and Knowledge Discovery from Biomedical Dark Data
arXiv - CS - Digital Libraries Pub Date : 2020-09-28 , DOI: arxiv-2009.13059
Shashwat Aggarwal, Ramesh Singh

Data visualization techniques offer an efficient means of organizing and presenting data in graphically appealing formats, which not only speeds up decision making and pattern recognition but also helps decision-makers understand the insights in the data and make informed decisions. Over time, with the growth of technological and computational resources, the world's scientific knowledge has increased exponentially. However, most of it lacks structure and cannot be easily categorized and imported into regular databases. This type of data is often termed dark data. Data visualization techniques provide a promising way to explore such data by allowing quick comprehension of information, discovery of emerging trends, and identification of relationships and patterns. In this empirical study, we use the PubMed corpus, which comprises more than 30 million citations from the biomedical literature, to visually explore and understand its underlying key insights using various information visualization techniques. We employ a natural language processing based pipeline to discover knowledge from this biomedical dark data. The pipeline comprises different lexical analysis techniques, such as topic modeling to extract inherent topics and major focus areas, and network graphs to study the relationships between entities such as scientific documents and journals, researchers, and keywords and terms. With this research, we aim to offer a potential solution to the problem of analyzing overwhelming amounts of information and to reduce the limits that human cognition and perception place on handling and examining such large volumes of data.
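
As a rough illustration of the network-graph step mentioned in the abstract, the sketch below builds a keyword co-occurrence network from citation records using only the Python standard library. It is not the authors' pipeline: the `records` list, its `keywords` field, and the helper names `cooccurrence_edges` and `weighted_degree` are hypothetical, and the toy records stand in for PubMed citations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records standing in for PubMed citations (not real data).
records = [
    {"keywords": ["cancer", "genomics", "imaging"]},
    {"keywords": ["cancer", "genomics"]},
    {"keywords": ["imaging", "deep learning"]},
]


def cooccurrence_edges(records):
    """Count how often each unordered keyword pair shares a record."""
    edges = Counter()
    for record in records:
        # Normalize case and drop duplicates so each pair counts once per record.
        terms = sorted({keyword.lower() for keyword in record["keywords"]})
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum the edge weights attached to each keyword node."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


edges = cooccurrence_edges(records)
degree = weighted_degree(edges)

# List the strongest keyword relationships first.
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

The weighted edge list produced here could be handed to a graph-drawing library for the visual exploration the abstract describes; which library to use, and how keywords would be extracted from real citations, are left open.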

Updated: 2020-09-29