Implementing cheminformatics.
Journal of Cheminformatics (IF 8.6), Pub Date: 2019-02-05, DOI: 10.1186/s13321-019-0333-z
Rajarshi Guha 1

Computational characterization of chemical structures predates the advent of digital computers [1]. However, the ability to represent and manipulate large collections of molecules and their associated information was enabled by the rise of cheminformatics algorithms and their implementations on digital computers. Willett [2] has suggested that the work of Ray and Kirsch [3] on substructure searching was the first description of a computer implementation (on punched cards) of a cheminformatics algorithm.

Programming language research blossomed during the 1950s and 1960s and saw the development of high-level programming languages (such as FORTRAN [4], LISP [5] and ALGOL [6]). Cheminformatics research took advantage of these efforts to move beyond punched cards. One of the earliest cheminformatics applications in a high-level language was DENDRAL [7], written in LISP in 1963 [8].

Since the 1960s, a plethora of languages have come into existence. Each language has its distinctive features (direct memory manipulation in C, code as data in LISP [9], automated memory management in Java, lazy evaluation [10] in Haskell), but useful features from one language tend to show up in others (e.g., automated memory management first appeared in LISP, but is now found in Java, Ruby, Python, C# and others). Furthermore, all modern general-purpose languages are Turing equivalent [11] (i.e., each can, in principle, express any computation that the others can). One might then ask: what does it matter which language one uses to implement cheminformatics?
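To make one of the features named above concrete, the following is a minimal Python sketch of lazy evaluation. It is an illustration only: Python generators are lazy on request, which approximates but is not the same as Haskell's pervasive laziness, and the function name `alkane_formulas` is hypothetical.

```python
from itertools import count, islice


def alkane_formulas():
    """Lazily yield (n, formula string) for CnH2n+2, conceptually without end."""
    for n in count(1):
        # Each value is computed only when a consumer asks for it.
        yield n, f"C{n}H{2 * n + 2}"


# The generator describes an unbounded sequence; islice pulls just four items,
# so only four formulas are ever constructed.
first_four = list(islice(alkane_formulas(), 4))
print(first_four)
```

The same idea appears in other languages under different names (streams, iterators, lazy sequences), which is one example of a feature migrating between languages.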

A number of factors go into deciding which language to use in a given setting. These include its suitability for a specific task (web development versus statistical modeling), prior knowledge of the language, the availability of supporting tools and frameworks and their licensing requirements and, of course, performance.

A key consideration is the availability of external libraries such as cheminformatics toolkits (e.g., CDK [12] or JChem for Java applications). Many libraries (especially those written in C or C++) can be wrapped and made accessible to other languages (e.g., OpenBabel [13], RDKit and OEChem, which are written in C++, provide wrappers, generated with tools such as SWIG, that enable their use in Python and Java). Finally, for many projects, the choice of language is dictated by historical development (such as the use of Fortran for much of scientific computing).

At a more fundamental level, there are different programming models, which require conceptually different approaches to designing an application. For example, Khomtchouk et al. [14] suggest that the functional paradigm is best suited for scientific software development. On the other hand, Ray et al. [15] show that projects using functional languages do not necessarily show better software quality. One must consider other aspects, ranging from performance issues to the availability of programmers with sufficient skills to develop and then maintain applications written in functional languages. It is useful to note that some languages, such as Scala, are hybrids, supporting both functional and procedural paradigms.
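As a small illustration of how the two paradigms differ in practice, the sketch below computes the same molecular mass in a procedural style and in a functional style. Python is used here only because it, like Scala, supports both styles; the names `ATOMIC_MASS`, `ethanol`, `mass_procedural` and `mass_functional` are hypothetical, and the composition is toy data.

```python
from functools import reduce

# Standard atomic masses (g/mol), rounded to three decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

# Toy input: ethanol, C2H6O, as (element, count) pairs.
ethanol = [("C", 2), ("H", 6), ("O", 1)]


def mass_procedural(composition):
    """Procedural style: update a mutable accumulator inside a loop."""
    total = 0.0
    for element, n in composition:
        total += ATOMIC_MASS[element] * n
    return total


def mass_functional(composition):
    """Functional style: fold over the pairs with no mutable state."""
    return reduce(
        lambda acc, pair: acc + ATOMIC_MASS[pair[0]] * pair[1],
        composition,
        0.0,
    )


print(mass_procedural(ethanol), mass_functional(ethanol))
```

Both functions return the same value; the choice between them is a matter of design and readability rather than capability.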

In this thematic series we have invited authors to present their views on a variety of programming languages. The series is rolling, and starts off with contributions from Theisen [16], Berenger [17], and Höck [18] discussing JavaScript, OCaml and Scala, respectively. We anticipate contributions covering Scala, C/C++, Tcl and NoSQL.

This series is intended for practitioners of cheminformatics who are already familiar with one programming language and would like to learn what other languages may offer in terms of language features and supported tooling.

We do not intend this to be a head-to-head comparison. Rather, the contributions are structured to address one or more of the following aspects:

  • How that language (or model of programming) affects scientific software development

  • How a language may enable the development of new approaches to solving a problem in cheminformatics or computational chemistry

  • Specific approaches to overcome language limitations when dealing with chemical or biological data types

  • Comments on performance and its relevance to the language’s goals

  • Educational aspects of the language (is it easier for newcomers?)

  • Development environments and frameworks that make a language easier to use and deploy (e.g., RStudio for R and Jupyter notebooks for Python)

The goal of this issue is to highlight features of different languages that the authors have employed to build applications, as well as their views on the benefits (and downsides) that have driven them to invest effort in building capabilities in their chosen language. We do not expect that this will identify any single language as the “chosen one”. Rather, we hope that the articles in this issue will be a useful guide for the community to assess which languages may be appropriate for their next project.

References

  1. Wiener H (1947) Structural determination of paraffin boiling points. J Am Chem Soc 69(11):2636–2638

  2. Willett P (2011) Chemoinformatics: a history. WIREs Comput Mol Sci 1(1):46–56

  3. Ray LC, Kirsch RA (1957) Finding chemical records by digital computers. Science 126:814–819

  4. McJones P (2018) History of FORTRAN and FORTRAN II. http://www.softwarepreservation.org/projects/FORTRAN. Accessed Nov 2018

  5. Stoyan H (1984) Early LISP history (1956–1959). In: Proceedings of the 1984 ACM symposium on LISP and functional programming. LFP ’84, pp 299–310. ACM, New York. https://doi.org/10.1145/800055.802047

  6. McJones P (2018) History of ALGOL. http://www.softwarepreservation.org/projects/ALGOL/. Accessed Nov 2018

  7. Lindsay RK, Buchanan BG, Feigenbaum EA, Lederberg JA (1993) DENDRAL: a case study of the first expert system for scientific hypothesis formation. Artif Intell 61(2):209–261

  8. Sutherland G (1963) Letter from Georgia Sutherland to R. Shirley. https://exhibits.stanford.edu/feigenbaum/catalog/qc171fk5406

  9. McIlroy D (1960) Macro instruction extensions of compiler languages. Commun ACM 3(4):214–220

  10. Watt DA, Findlay W (2004) Programming language design concepts. Wiley, Hoboken

  11. Brainerd WS, Landweber LH (1974) Theory of computation. Wiley, Hoboken

  12. Willighagen EL, May JW, Alvarsson J, Berg A, Carlsson L, Duhrkop K, Jeliazkova N, Kuhn S, Pluskal T, Rojas-Cherto M, Spjuth O, Torrance G, Evelo CT, Guha R, Steinbeck C (2017) The chemistry development kit (CDK): atom typing, rendering, molecular formulas, and substructure searching. J Cheminform 9:33

  13. O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011) Open Babel: an open chemical toolbox. J Cheminform 3:33. https://doi.org/10.1186/1758-2946-3-33

  14. Khomtchouk BB, Weitz E, Karp PD, Wahlestedt C (2018) How the strengths of Lisp-family languages facilitate building complex and flexible bioinformatics applications. Brief Bioinform 19(3):537–543. https://doi.org/10.1093/bib/bbw130

  15. Ray B, Posnett D, Devanbu P, Filkov V (2017) A large-scale study of programming languages and code quality in GitHub. Commun ACM 60(10):91–100

  16. Theisen KJ (2019) Programming languages in chemistry: a review of HTML5/JavaScript. J Cheminform. https://doi.org/10.1186/s13321-019-0331-1

  17. Berenger F, Zhang KYJ, Yamanishi Y (2019) Chemoinformatics and structural bioinformatics in OCaml. J Cheminform. https://doi.org/10.1186/s13321-019-0332-0

  18. Höck S, Riedl R (2012) chemf: a purely functional chemistry toolkit. J Cheminform 4(1):38. https://doi.org/10.1186/1758-2946-4-38


RG conceived and designed the thematic issue and wrote this manuscript. The author read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Affiliations

  1. Vertex Pharmaceuticals, 50 Northern Ave, Boston, MA, 02210, USA

    Rajarshi Guha


Corresponding author

Correspondence to Rajarshi Guha.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


Cite this article

Guha, R. Implementing cheminformatics. J Cheminform 11, 12 (2019). https://doi.org/10.1186/s13321-019-0333-z
