The QSAR similarity principle in the deep learning era: Confirmation or revision?

Foundations of Chemistry

Abstract

Structure–activity relationship (SAR) and quantitative SAR (QSAR) are modeling methods widely used to assess the biological properties of chemical substances. QSAR rests on the hypothesis that chemical structure is responsible for activity; it follows that similar molecules are expected to have similar properties. Similarity plays an important role in read-across, which categorizes molecules primarily on the basis of similarity. Similarity, chemical similarity included, is a property that humans perceive differently. The various proposed metrics often disagree with human judgment, and no single metric for chemical similarity is universally adopted. Researchers have argued that categorization is not explained by similarity alone but also depends on abstract knowledge and on the task to be accomplished. Moreover, similarity cannot be the sole explanation of a categorization, as different perceptual processes take place in category formation. Assuming that similarity judgments are deeply rooted in human knowledge and perception, contributions from the cognitive sciences are as important as the mathematical considerations of the classical theories. After an excursus through the many views of similarity in philosophy, mathematics, and cognitive science, the paper explores how connectionist systems, which loosely mimic the human cognitive system, could improve similarity-based choices. A case study on building (Q)SARs using connectionism and deep neural networks shows the role of similarity in building and explaining those models. A discussion of deep learning for QSAR and as a modeling tool for science concludes the presentation.

Author information

Corresponding author

Correspondence to Giuseppina Gini.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gini, G. The QSAR similarity principle in the deep learning era: Confirmation or revision? Found Chem 22, 383–402 (2020). https://doi.org/10.1007/s10698-020-09380-6

  • DOI: https://doi.org/10.1007/s10698-020-09380-6
