Knowledge Graph-Enabled Cancer Data Analytics.
IEEE Journal of Biomedical and Health Informatics (IF 6.7), Pub Date: 2020-05-04, DOI: 10.1109/jbhi.2020.2990797
S M Shamimul Hasan, Donna Rivera, Xiao-Cheng Wu, Eric B Durbin, J Blair Christian, Georgia Tourassi

Cancer registries collect unstructured and structured cancer data for surveillance purposes, and these data provide important insights regarding cancer characteristics, treatments, and outcomes. Cancer registry data typically (1) categorize each reportable cancer case or tumor at the time of diagnosis, (2) contain demographic information about the patient such as age, gender, and location at time of diagnosis, (3) include planned and completed primary treatment information, and (4) may contain survival outcomes. As structured data are extracted from various unstructured sources, such as pathology reports, radiology reports, and medical records, and stored for reporting and other needs, the information representing a reportable cancer is constantly expanding and evolving. While some popular analytic approaches exist, including SEER*Stat and SAS, we provide a knowledge graph approach to organizing cancer registry data. Our approach offers unique advantages for timely data analysis and for the presentation and visualization of valuable information. This knowledge graph approach semantically enriches the data and easily enables linking with third-party data, which can help explain variation in cancer incidence patterns, disparities, and outcomes. We developed a prototype knowledge graph based on the Louisiana Tumor Registry dataset. We present the advantages of the knowledge graph approach by examining: i) scenario-specific queries, ii) links with openly available external datasets, iii) schema evolution for iterative analysis, and iv) data visualization. Our results demonstrate that this graph-based solution can perform complex queries, improve query run-time performance by up to 76%, and more easily conduct iterative analyses to enhance researchers’ understanding of cancer registry data.
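
The abstract does not say how the prototype graph is stored or queried, so the following is only a minimal sketch of the general idea, not the authors' implementation. It models registry facts as subject-predicate-object triples in plain Python, links them with a made-up external dataset, answers one scenario-style query, and adds a new predicate without any migration. All identifiers, predicates, and values (`case:1`, `hasSite`, `ruralUrban`, and so on) are hypothetical and invented for illustration.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Toy registry triples. Every identifier and value here is invented.
REGISTRY: Set[Triple] = {
    ("case:1", "hasSite", "site:lung"),
    ("case:1", "diagnosedIn", "region:A"),
    ("case:2", "hasSite", "site:breast"),
    ("case:2", "diagnosedIn", "region:B"),
    ("case:3", "hasSite", "site:lung"),
    ("case:3", "diagnosedIn", "region:B"),
}

# Toy "external" dataset keyed on the same region identifiers (hypothetical).
EXTERNAL: Set[Triple] = {
    ("region:A", "ruralUrban", "rural"),
    ("region:B", "ruralUrban", "urban"),
}


def match(graph: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in graph:
        if (s is None or t[0] == s) and \
           (p is None or t[1] == p) and \
           (o is None or t[2] == o):
            yield t


def cases_by_site_and_setting(graph: Set[Triple], site: str, setting: str):
    """Scenario query: cases of a given site diagnosed in a given setting."""
    regions = {s for s, _, _ in match(graph, p="ruralUrban", o=setting)}
    cases = {s for s, _, _ in match(graph, p="hasSite", o=site)}
    return sorted(
        c for c in cases
        if any(o in regions for _, _, o in match(graph, s=c, p="diagnosedIn"))
    )


# Linking with external data is a set union, since both share identifiers.
graph = REGISTRY | EXTERNAL

# Schema evolution: a new kind of fact is just another triple.
graph.add(("case:3", "hasTreatment", "treatment:surgery"))

urban_lung_cases = cases_by_site_and_setting(graph, "site:lung", "urban")
print(urban_lung_cases)
```

A dedicated graph database or RDF store would replace the linear scan in `match` with indexed lookups; the sketch only illustrates why linking and schema changes are cheap in a triple-based model.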

Updated: 2020-07-03