Diversifying cheminformatics
Journal of Cheminformatics (IF 7.1), Pub Date: 2022-04-25, DOI: 10.1186/s13321-022-00597-5
Barbara Zdrazil₁, Rajarshi Guha₂

With Dr. Barbara Zdrazil starting in her role as Co-Editor-in-Chief in January 2022, we revisit the scope of J. Cheminform. as well as the role of cheminformatics as a discipline. We present our vision for how the Journal can move the field of cheminformatics, as well as Open Science, forward in the coming years.

Cheminformatics emerged from joining the fields of chemistry and information technology to solve chemical problems related to the storage, indexing, and searching of chemical information, creating a tool for solving complex problems in drug discovery [1]. While chem- and bioinformatics are key to drug discovery workflows, cheminformatics is also an important tool in disciplines beyond drug discovery, such as materials science [2], metabolomics [3], and odor research [4]. We believe that J. Cheminform. plays a key role within the cheminformatics community as a platform to disseminate descriptions (and implementations) of cheminformatics methods. This belief drives our focus on research reproducibility: requiring that data and code be made openly available to the public builds an important foundation for accelerating basic research.

Therefore, over the next four years we will focus our editorial efforts on these three main topics:

Improving research reproducibility, open access data and code;
Publishing benchmark studies for machine learning and artificial intelligence-based studies to better understand the utility of different algorithms;
Expanding our support of diversity in cheminformatics: both from a topical aspect, where we highlight work in interdisciplinary and niche areas; and from a community aspect, where we increase the visibility of underrepresented groups and regions.

As we continue to follow the journal’s previously defined publishing practices regarding re-usable and fully accessible article content (including published software, data, and algorithms), we understand that further efforts will be needed to better define reproducibility in a cheminformatics and computational chemistry setting. As R. Clark recently pointed out, there is no simple way of validating an algorithm, since it will always give the same results when applied to the same data set under the same conditions (unlike in an experimental setting) [5]. While a rigorous re-implementation of algorithms, as suggested by Clark, is beyond reviewers’ capacity, we are now starting an effort to engage in more active code review during the paper review process, in addition to enquiring about the availability of source code and data for reproducing the results of the paper.
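The point that re-running code cannot validate it can be made concrete with a small sketch. The example below is purely illustrative and not taken from the editorial or from Clark's article: it assumes a hypothetical Tanimoto similarity function with a deliberate bug, and fingerprints represented as Python sets of on-bit indices. Re-running the buggy function reproduces its output exactly, while an independent re-implementation from the definition disagrees with it.

```python
def tanimoto_buggy(a, b):
    """Hypothetical implementation with a deliberate bug in the denominator."""
    shared = len(a & b)
    # Bug: the shared bits are counted twice; the union size is the correct denominator.
    return shared / (len(a) + len(b))


def tanimoto_reference(a, b):
    """Independent re-implementation from the definition |A & B| / |A | B|."""
    return len(a & b) / len(a | b)


# Hypothetical fingerprints, given as sets of on-bit indices (assumed non-empty).
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}

first_run = tanimoto_buggy(fp_a, fp_b)
second_run = tanimoto_buggy(fp_a, fp_b)

# Same code, same data, same conditions: the re-run agrees with itself.
rerun_agrees = first_run == second_run
# Only the independent implementation reveals that the value is wrong.
reference_agrees = abs(first_run - tanimoto_reference(fp_a, fp_b)) < 1e-12

print(rerun_agrees, reference_agrees)
```

Here the re-run check passes and the comparison against the reference fails, which is why a successful re-execution is evidence of repeatability and not of correctness.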

The other aspect of improving research reproducibility is the encouragement of standardized formats for data submissions. One such effort that we intend to adopt and expand is based on work from Schymanski and Bolton [6], which will encourage authors to submit their chemical data via a chemical structure template and thereby link the DOI of the data file to the article DOI metadata. While we do not currently mandate such a submission format, as Editors, we will encourage authors and work with them to apply this template where possible.

The last decade has seen a tremendous rise in the use of machine learning (ML) algorithms across the natural sciences. A bibliometric analysis highlights how the number of yearly published papers in the domain of ML has increased over time (Fig. 1). With the rise in new methods and applications, we feel that it is a necessary and timely undertaking to critically review the numerous algorithms and assemble information about the strengths and limitations of the various methods. One way to do so is to invite submissions that report on benchmarking studies. Recently, there has been some discussion around defining standards to enable rigorous comparisons of this kind [7, 8], which of course also includes discussion of the appropriate statistical tests for the use cases at hand. As we believe these are important discussions to have community-wide in order to move our field forward and make it fit for the next generation of ML data scientists, we will foster initiatives in these directions in the near future. This will include publishing thematic issues on topics such as benchmarking studies for ML and statistical validation.
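As one illustration of what a statistical test in a benchmarking study might look like, the sketch below runs an exact paired sign-flip permutation test on per-fold scores of two models. It is a minimal example under stated assumptions, not a standard endorsed by the editorial: the function name, the choice of test, and the per-fold scores are all hypothetical, and the test treats folds as exchangeable paired observations, which cross-validation folds only approximate.

```python
from itertools import product
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired permutation (sign-flip) test.

    Returns the mean per-fold difference and the p-value under the null
    hypothesis that the sign of each paired difference is arbitrary.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    total = 0
    extreme = 0
    # Enumerate all 2**n sign assignments; feasible for small fold counts.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = abs(mean([s * d for s, d in zip(signs, diffs)]))
        # Small tolerance guards against floating-point ties.
        if permuted >= observed - 1e-12:
            extreme += 1
    return mean(diffs), extreme / total


# Hypothetical per-fold scores for two models evaluated on identical splits.
model_a = [0.71, 0.68, 0.74, 0.70, 0.69]
model_b = [0.66, 0.67, 0.70, 0.65, 0.68]

mean_diff, p_value = paired_sign_flip_test(model_a, model_b)
print(f"mean difference = {mean_diff:.3f}, p = {p_value:.4f}")
```

With these made-up numbers model A wins on every fold, yet the p-value is 2/32 = 0.0625, the smallest value this test can return for five folds. The example therefore also shows why the number of folds or repetitions matters when choosing a test for a benchmark.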

Our third focus, on diversity, aims to increase the breadth of topics we publish on and the breadth of authors that may be working in this area.

J. Cheminform. is not just about publishing manuscripts; it also serves as a means to coordinate and encourage a multitude of research topics and researchers. The journal’s success opens up a great opportunity for us to aid the transition of the current research culture into a more diverse and open-minded environment. While the scope of cheminformatics papers will be broadened through dedicated thematic collections, which will also include niche topics, we hope to address the breadth of authors by broadening representation on our editorial board, developing thematic issues that focus on careers in this field, and, in particular, highlighting the challenges and influences that scientists from different underrepresented groups are facing.

Our aim for the next four years is to bring open science efforts together with strategic plans to broaden the scope of the journal to be more diverse and inclusive while fostering initiatives to provide a platform for timely discussions around artificial intelligence-based algorithms and studies. The authors, reviewers and Editorial Board of J. Cheminform. have made great contributions to increase the quality of the articles published by the journal in the past, and we would like to acknowledge this community effort. Diversifying cheminformatics will only be possible with continued contributions by our community, in terms of submitting articles as well as active engagement in discussing the future role of cheminformatics.

References

1. Wishart DS (2007) Introduction to cheminformatics. Curr Protoc Bioinformatics Chapter 14, Unit 14.1. [cito:citesForInformation] [cito:citesAsSourceDocument]
2. Belle CE, Aksakalli V, Russo SP (2021) A machine learning platform for the discovery of materials. J Cheminform 13:42. [cito:citesAsEvidence] [cito:citesAsRecommendedReading]
3. Yu M, Dolios G, Petrick L (2022) Reproducible untargeted metabolomics workflow for exhaustive MS2 data acquisition of MS1 features. J Cheminform 14:6. [cito:citesAsEvidence] [cito:citesAsRecommendedReading]
4. Clery RA et al (2022) Chemical diversity of citrus leaf essential oils. Chem Biodivers. https://doi.org/10.1002/cbdv.202100963. [cito:citesAsEvidence] [cito:citesAsRecommendedReading]
5. Clark RD (2019) A path to next-generation reproducibility in cheminformatics. J Cheminform 11:62. [cito:containsAssertionFrom]
6. Schymanski EL, Bolton EE (2021) FAIR chemical structures in the Journal of Cheminformatics. J Cheminform 13:50. [cito:citesAsAuthority]
7. Krstajic D (2019) Missed opportunities in large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery. J Cheminform 11:65. [cito:citesAsAuthority]
8. Bosc N et al (2019) Reply to “Missed opportunities in large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery.” J Cheminform 11:64. [cito:citesAsAuthority]

Affiliations

Department of Pharmaceutical Sciences, University of Vienna, Althanstraße 14, 1090, Vienna, Austria
Barbara Zdrazil
Vertex Pharmaceuticals, 50 Northern Ave, Boston, MA, 02210, USA
Rajarshi Guha

Contributions

Both authors contributed to the development of the ideas presented and to the writing of this editorial. Both authors read and approved the final manuscript.

Corresponding authors

Correspondence to Barbara Zdrazil or Rajarshi Guha.

Competing interests

The authors declare that they have no competing interests.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Cite this article

Zdrazil, B., Guha, R. Diversifying cheminformatics. J Cheminform 14, 25 (2022). https://doi.org/10.1186/s13321-022-00597-5

Published: 25 April 2022
DOI: https://doi.org/10.1186/s13321-022-00597-5
