Redundancy in two major compound databases
Drug Discovery Today (IF 6.5), Pub Date: 2018-03-17, DOI: 10.1016/j.drudis.2018.03.005
Dimitar Yonchev, Dilyana Dimova, Dagmar Stumpfe, Martin Vogt, Jürgen Bajorath

Public repositories of compounds and activity data are of prime importance for pharmaceutical research in academic and industrial settings. Major databases have evolved over the years. Their growth is accompanied by an increasing tendency toward data sharing. This is a positive development but not without potential problems. Using ChEMBL and PubChem as examples, we show that crosstalk between databases also leads to substantial data redundancy that might not be obvious. Redundancy is an important issue because it biases data analysis and knowledge extraction and leads to inflated views of available compounds, assays and activity data. Going forward it will be important to further refine data exchange and deposition criteria and make redundancy as transparent as possible.




Updated: 2018-03-17