Abstract
Structure–activity relationship (SAR) and quantitative SAR (QSAR) are modeling methods widely used to assess the biological properties of chemical substances. QSAR rests on the hypothesis that chemical structure is responsible for activity; it follows that similar molecules are expected to have similar properties. Similarity plays an important role in read-across, which categorizes molecules primarily on the basis of similarity. Similarity in general, and chemical similarity in particular, is a property that humans perceive in different ways. The various proposed metrics often disagree with human judgment, and no unique metric for chemical similarity is universally adopted. Researchers have argued that categorization is not explained by similarity alone but also depends on abstract knowledge and on the task to be accomplished. Moreover, similarity cannot be the sole explanation of a categorization, as different perceptual processes take place in category formation. Assuming that similarity judgments are deeply rooted in human knowledge and perception, the contributions of cognitive science are as important as the mathematical considerations of the classical theories. After an excursus through the many views of similarity in philosophy, mathematics, and cognitive science, the paper explores how connectionist systems, which loosely mimic the human cognitive system, could improve similarity-based choices. A case study on building (Q)SARs using connectionism and deep neural networks shows the role of similarity in building and explaining those models. A discussion of deep learning for QSAR and as a modeling tool for science concludes the presentation.
Cite this article
Gini, G. The QSAR similarity principle in the deep learning era: Confirmation or revision? Found Chem 22, 383–402 (2020). https://doi.org/10.1007/s10698-020-09380-6
DOI: https://doi.org/10.1007/s10698-020-09380-6