
Predicting toxicity by quantum machine learning

Teppei Suzuki and Michio Katouda

Published 29 December 2020 © 2020 The Author(s). Published by IOP Publishing Ltd

Citation: Teppei Suzuki and Michio Katouda 2020 J. Phys. Commun. 4 125012. DOI: 10.1088/2399-6528/abd3d8


Abstract

In recent years, parameterized quantum circuits have been regarded as machine learning models within the framework of the hybrid quantum–classical approach. Quantum machine learning (QML) has been applied to binary classification problems and unsupervised learning. However, practical quantum application to nonlinear regression tasks has received considerably less attention. Here, we develop QML models designed for predicting the toxicity of 221 phenols on the basis of a quantitative structure–activity relationship. The results suggest that our data encoding enhanced by quantum entanglement provided more expressive power than the previous ones, implying that quantum correlation could be beneficial for the feature map representation of classical data. Our QML models performed significantly better than the multiple linear regression method. Furthermore, our simulations indicate that the QML models were comparable to those obtained using radial basis function networks, while improving the generalization performance. The present study implies that QML could be an alternative approach for nonlinear regression tasks such as cheminformatics.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Quantitative structure–activity relationship (QSAR) is one of the major computational molecular modeling methods. The QSAR approach attempts to correlate molecular descriptors of compounds with their physicochemical properties; over the past decades, it has been used for predicting toxicity and bioactivities as well as for finding new drug leads in chemical and pharmaceutical areas [1–4]. Nowadays, owing to the rapid development of information and communication technologies, huge amounts of physicochemical data coming from a variety of sources have been accumulated. Currently, databases containing millions of chemical compounds and their activities against biological assays are available on various platforms. As a consequence, there is a growing need for innovation in computer technology that can efficiently and accurately analyze ever-increasing amounts of physicochemical and biological data [5].

In recent years, quantum computing [6–8] has attracted much attention because it is one of the most promising quantum technologies that could radically transform science and many areas of industry. Although large-scale, fault-tolerant quantum computers have not yet been realized, noisy intermediate-scale quantum (NISQ) computers [6] have been applied to various areas of science and technology: chemistry [9–13], optimization [14–16], and finance [17, 18], to name but a few. A promising scheme for practical applications on NISQ devices is the hybrid quantum–classical algorithm [6, 8], in which computational tasks are deliberately divided between quantum and classical resources using a parameterized approach. Two important classes of such quantum algorithms are the variational quantum eigensolver for quantum simulation [19–21] and the quantum approximate optimization algorithm for combinatorial optimization [22–25].

Quantum machine learning (QML) [26–30] is a rapidly growing research field that combines near-term quantum algorithms with machine learning techniques. In particular, parameterized quantum circuits (PQCs) have been considered as machine learning models with high expressive power within the hybrid quantum–classical framework [31, 32]. PQCs are typically composed of fixed quantum gates (e.g., qubit rotations and entangling gates) in a shallow circuit layout, with variable parameters optimized in a classical feedback loop. So far, QML has been successfully applied to both discriminative [33–37] and generative [38, 39] models. Examples include binary classification problems for image recognition [34], kernel methods for support vector machines [40, 41], and unsupervised machine learning in finance [42]. To our knowledge, however, the application of QML to regression tasks has not been fully investigated in the literature. It remains unclear what kinds of quantum states should be used to generate feature maps with the high expressibility suited for real-world data sets.

To explore the possibility of near-term quantum applications to regression tasks, here we apply the QML method to quantitative structure–toxicity relationship (QSTR) models for predicting the toxicity of 221 phenols. While there are a variety of QSAR/QSTR models (e.g., 3D-QSAR [1, 4]), as a first step we employ QSAR/QSTR models built on molecular descriptors such as hydrophobicity, the acidity constant, and frontier orbital energies. There have been quantum computations in biochemical and pharmaceutical areas, such as protein folding [43–45], molecular similarity [46], and biological data [47]; yet there has been no study applying quantum computing to QSAR modeling, even though it is an important part of ligand-based computer-aided drug design.

The remainder of the paper is organized as follows. In Methods, we briefly review PQC-based machine learning and then describe our QML models in full detail. The information about the data set used for the QSTR modeling is also provided. In Results and Discussion, we present the results of our QSAR models and numerically investigate how different encodings, variational circuit architectures, and redundant encodings using multiple qubits affect the performance of the QML models. In addition, we compare the performance of our best QML models with those obtained by conventional chemometrics methods and comment on several perspectives on QML. Then, we summarize our conclusions.

2. Methods

2.1. Parameterized quantum circuits

In recent years, PQCs have been regarded as machine learning models with high expressive power within the framework of the hybrid quantum–classical approach. PQCs are usually composed of one-qubit rotations and two-qubit entangling operations in a shallow circuit layout, with parameters optimized in a feedback loop. A recent review on PQCs can be found in the literature [32]. Combining near-term quantum algorithms and machine learning, QML using the framework of PQCs is sometimes referred to as quantum circuit learning (QCL) [31]. So far, QML has been applied to both discriminative and generative tasks [33–42]; on the other hand, the application of QML to regression tasks has not been thoroughly investigated.

From the viewpoint of the machine learning architecture, PQCs consist of three components: the encoder circuit, the variational circuit, and the measurement for the estimation of the loss function. First, an encoder circuit loads classical d-dimensional data ${\boldsymbol{x}}={({x}_{1},{x}_{2},\ldots ,{x}_{d})}^{{\rm{T}}}\in {{\mathbb{R}}}^{d}$ into a higher-dimensional feature map ${U}_{{\rm{\Phi }}\left({\boldsymbol{x}}\right)}$ in the Hilbert space, which produces a quantum state ${U}_{{\rm{\Phi }}\left({\boldsymbol{x}}\right)}{\left.\left|0\right.\right\rangle }^{\otimes n},$ with $n$ being the number of qubits. The number of qubits $n$ can be set to the dimension of the input data $d$ (other situations are also considered in the present work). Such an approach may be less efficient in terms of the number of qubits but is efficient in terms of circuit depth. Second, a variational circuit $U({\boldsymbol{\theta }})$ acts on the quantum state prepared by the encoder circuit, in order to explore the quantum-enhanced feature space using trainable parameters ${\boldsymbol{\theta }},$ leading to the parameterized quantum state $U({\boldsymbol{\theta }}){U}_{{\rm{\Phi }}\left({\boldsymbol{x}}\right)}{\left.\left|0\right.\right\rangle }^{\otimes n}.$ Third, the loss function is estimated from expectation values obtained by measurements. In the following subsections, we look closely at each step of our QML models.

2.2. Encoder circuit

Data representation is essential for the success of machine learning models. In QML, loading classical data as a quantum state is an important and challenging task; in fact, the choice of encoding in PQCs is analogous to selecting a feature map in kernel-based machine learning techniques [32, 41]. Several methods for encoding input data into qubits have been proposed: amplitude encoding [32, 41], angle encoding [33, 48], a random linear map [34], and data re-uploading [37]. However, it is not a priori obvious what kind of encoding is suitable for our particular application. With this in mind, we first employ three methods of loading classical data into a quantum state (note that we can pre-process input data by means of normalization).

A first encoding is the one proposed by Mitarai et al [31]:

Equation (1)

This approach was originally motivated by expanding the density operator of a quantum state in terms of a set of Pauli operators [31]. A second encoding we consider is an angle encoding [33, 48], and the corresponding unitary operator ${{\mathscr{U}}}_{{\rm{A}}1}$ can be defined by

Equation (2)

This scheme is sometimes referred to as qubit encoding [49]. The encoding can be viewed as a product of local kernels, where each component of the input vector is encoded into a local feature map; it has the same structure as an unentangled product quantum state [32]. This kind of encoding, though seemingly simple, has been applied in tree tensor network classifiers in QML [33]. A third encoding is related to the second one and uses a pair of single-qubit rotations per feature. The corresponding unitary operator ${{\mathscr{U}}}_{{\rm{A}}2}$ can be expressed as

Equation (3)

This encoding loads each component of the input vector into two angles on the Bloch sphere, generating a certain redundancy in the encoding and hence a possible modification of the feature map.
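To make the three encoders concrete, a minimal Qulacs-style sketch is given below (Qulacs is the simulator named in section 2.5). The original typeset forms of equations (1)–(3) are not reproduced in this text, so the angle choices here are assumptions: encoder_M follows the arcsine/arccosine construction of Mitarai et al [31], while encoder_A1 and encoder_A2 are generic one- and two-angle encodings; rotation-gate sign conventions are glossed over.

```python
import numpy as np
from qulacs import QuantumCircuit

def encoder_M(x):
    # Assumed form of equation (1): the encoding of Mitarai et al [31],
    # one qubit per feature, with each x_j pre-normalized to [-1, 1].
    circuit = QuantumCircuit(len(x))
    for j, xj in enumerate(x):
        circuit.add_RY_gate(j, np.arcsin(xj))
        circuit.add_RZ_gate(j, np.arccos(xj ** 2))
    return circuit

def encoder_A1(x):
    # Assumed form of equation (2): plain angle (qubit) encoding [33, 48],
    # a single rotation angle per feature.
    circuit = QuantumCircuit(len(x))
    for j, xj in enumerate(x):
        circuit.add_RY_gate(j, xj)
    return circuit

def encoder_A2(x):
    # Assumed form of equation (3): each feature loaded into two angles
    # on the Bloch sphere, giving a redundant local feature map.
    circuit = QuantumCircuit(len(x))
    for j, xj in enumerate(x):
        circuit.add_RY_gate(j, xj)
        circuit.add_RZ_gate(j, xj)
    return circuit
```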

In addition to investigating different ways of encoding, we explore the possibility that entanglement might extend the flexibility of the data representation. Previous studies suggest that entangling gates play essential roles in quantum generative models [39, 50] and in the expressibility of PQCs [51]; in particular, repeated circuit layers with entangling controlled-NOT (CNOT) gates provide high expressive power [50, 51]. In this work, we propose an encoder circuit containing entangler blocks in the data representation. Such an encoding circuit can be expressed as multiple layers of single-qubit rotations followed by two-qubit entangling gates:

Equation (4)

Here, the $k\,$th layer of operations comprises a product of two operations: (i) the unitary operator ${{\mathscr{U}}}_{{\phi }_{k}({\boldsymbol{x}})},$ which is any reasonable encoder circuit loading the classical input data ${\boldsymbol{x}},$ and (ii) the two-qubit entangling operation ${E}_{{\rm{ent}}}^{k},$ which is typically composed of CNOT or controlled-Z (CZ) gates (hereafter denoted ${E}_{{\rm{CNOT}}}$ and ${E}_{{\rm{CZ}}},$ respectively). In the following, the encoding described in equation (4) is referred to as entangler-enhanced encoding. We expect that such an encoding might expand the representational ability of the feature map, owing to quantum entanglement. From the viewpoint of quantum physics, the above encoding can be interpreted as a concatenated tensor network, and this family of quantum circuits can describe a high-dimensional tensor network in an efficient way [52]. In the present study, we consider the following encoding composed of two layers:

Equation (5)

where the unitary operators ${{\mathscr{U}}}_{1}$ and ${{\mathscr{U}}}_{2}$ can be any of the three encodings mentioned earlier. Our approach can be viewed as an extension of the previous QCL scheme [31], in which the feature map is represented by a product state. To investigate the performance of our entangler-enhanced encoders, we considered 10 combinations of ${{\mathscr{U}}}_{1},$ ${{\mathscr{U}}}_{2},$ and ${E}_{{\rm{ent}}},$ which are summarized in table 1; a code sketch follows the table.

Table 1. Encoder circuits investigated in the present work (${E}_{{\rm{ent}}}{{\mathscr{U}}}_{2}{E}_{{\rm{ent}}}{{\mathscr{U}}}_{1}$) and the corresponding ID. Note that the first three encodings are conventional encoders (i.e., ${E}_{{\rm{ent}}}$ and ${{\mathscr{U}}}_{2}$ are replaced by the identity operator) whereas the remaining 10 encoders contain entangler blocks ${E}_{{\rm{ent}}}.$ For the definitions of the unitary operations, see the text.

Encoder Circuit ID ${{\mathscr{U}}}_{1}$ ${{\mathscr{U}}}_{2}$ ${E}_{{\rm{ent}}}$
M ${{\mathscr{U}}}_{{\rm{M}}}$
A1 ${{\mathscr{U}}}_{{\rm{A}}1}$
A2 ${{\mathscr{U}}}_{{\rm{A}}2}$
M–M–CNOT ${{\mathscr{U}}}_{{\rm{M}}}$ ${{\mathscr{U}}}_{{\rm{M}}}$ ${E}_{{\rm{CNOT}}}$
A1–A1–CNOT ${{\mathscr{U}}}_{{\rm{A}}1}$ ${{\mathscr{U}}}_{{\rm{A}}1}$ ${E}_{{\rm{CNOT}}}$
A2–A2–CNOT ${{\mathscr{U}}}_{{\rm{A}}2}$ ${{\mathscr{U}}}_{{\rm{A}}2}$ ${E}_{{\rm{CNOT}}}$
M–A1–CNOT ${{\mathscr{U}}}_{{\rm{M}}}$ ${{\mathscr{U}}}_{{\rm{A}}1}$ ${E}_{{\rm{CNOT}}}$
M–A2–CNOT ${{\mathscr{U}}}_{{\rm{M}}}$ ${{\mathscr{U}}}_{{\rm{A}}2}$ ${E}_{{\rm{CNOT}}}$
M–M–CZ ${{\mathscr{U}}}_{{\rm{M}}}$ ${{\mathscr{U}}}_{{\rm{M}}}$ ${E}_{{\rm{CZ}}}$
A1–A1–CZ ${{\mathscr{U}}}_{{\rm{A}}1}$ ${{\mathscr{U}}}_{{\rm{A}}1}$ ${E}_{{\rm{CZ}}}$
A2–A2–CZ ${{\mathscr{U}}}_{{\rm{A}}2}$ ${{\mathscr{U}}}_{{\rm{A}}2}$ ${E}_{{\rm{CZ}}}$
M–A1–CZ ${{\mathscr{U}}}_{{\rm{M}}}$ ${{\mathscr{U}}}_{{\rm{A}}1}$ ${E}_{{\rm{CZ}}}$
M–A2–CZ ${{\mathscr{U}}}_{{\rm{M}}}$ ${{\mathscr{U}}}_{{\rm{A}}2}$ ${E}_{{\rm{CZ}}}$
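As referenced above, a sketch of the entangler-enhanced encoder of equation (5): an encoding unitary alternates with a ring of CNOT (or CZ) gates, the ring layout ${E}_{{\rm{ent}}}={\prod }_{q}{{\rm{gate}}}_{q,(q+1)\,{\rm{mod}}\,n}$ being taken from figure 1. The helper functions encoder_M, encoder_A1, and encoder_A2 are the hypothetical sketches given earlier.

```python
from qulacs import QuantumCircuit, QuantumState

def entangler_block(n, gate="CNOT"):
    # Ring of two-qubit entangling gates, gate_{q,(q+1) mod n} as in figure 1.
    circuit = QuantumCircuit(n)
    for q in range(n):
        if gate == "CNOT":
            circuit.add_CNOT_gate(q, (q + 1) % n)
        else:
            circuit.add_CZ_gate(q, (q + 1) % n)
    return circuit

def encode(state, x, u1, u2, gate="CNOT"):
    # Apply E_ent U2 E_ent U1 of equation (5) to |0...0>; u1 and u2 are any
    # of the encoder functions above (e.g., encoder_A2 twice with CNOT gives
    # the A2-A2-CNOT encoder of table 1).
    ent = entangler_block(state.get_qubit_count(), gate)
    for u in (u1, u2):
        u(x).update_quantum_state(state)   # U_k loads the classical data x
        ent.update_quantum_state(state)    # E_ent entangles the local feature maps
    return state
```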

Another approach to increasing the flexibility of the feature space is to use $p$ copies of the quantum state (i.e., the $p$-fold tensor product) at the outset, which means that each component of the input data is encoded into multiple qubits [31, 41]. While this scheme requires additional quantum resources, it generates higher-order terms in the feature map, which is likely to give rise to more expressive power and a richer class of functions. A recent study indicates that such input redundancy is necessary for the task of data fitting and that the required redundancy grows at least logarithmically with the complexity of the function [53]. For each encoding in table 1, we thus consider feature maps using two and three copies of the quantum states in the encoding.
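A sketch of this redundant encoding follows; how the copies are laid out across the register is not specified in this text, so the simple tiling below is an assumption.

```python
import numpy as np
from qulacs import QuantumState

def prepare_feature_state(x, p, u1, u2, gate="CNOT"):
    # Feature map with p copies of the input: each of the d features is encoded
    # into p qubits (n = p * d) before the entangler-enhanced encoder is applied.
    # For the d = 5 molecular descriptors used here, p = 2 and p = 3 give the
    # 10- and 15-qubit models.
    x_redundant = np.tile(np.asarray(x), p)   # [x_1..x_d, x_1..x_d, ...]
    state = QuantumState(len(x_redundant))    # initialized to |0...0>
    return encode(state, x_redundant, u1, u2, gate)
```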

2.3. Variational circuit

The essential role of the variational circuit $U({\boldsymbol{\theta }})$ is to explore efficiently the quantum-enhanced feature space generated by PQCs. The variational circuit originally reported in the literature is based on the time evolution of an Ising Hamiltonian [31]; it uses the Trotter decomposition method, which incurs an additional computational cost. Another disadvantage of the method is that it is rather memory-intensive when the quantum circuit is simulated on classical processors.

To circumvent these limitations, we employed quantum circuits inspired by the strategy of the hardware-heuristic ansatz [13], which was originally motivated by the limited fidelity and connectivity of existing NISQ devices. On the basis of the architecture of PQCs [13, 51], we propose that the variational circuit be constructed from $L$ layers of a unit circuit consisting of single-qubit rotations ${U}_{{\ell }}({{\boldsymbol{\theta }}}_{{\ell }})$ and two-qubit entangler blocks ${E}_{{\rm{ent}}}^{{\ell }}$ comprising CNOT or CZ gates:

Equation (6)

From a physical standpoint, such a quantum circuit can be interpreted as a concatenated tensor network, which can be used for an efficient description of time-evolved quantum states [52]. In this work, we investigated the performance of three variational circuits (see figure 1): one was the variational circuit based on the time evolution of an Ising Hamiltonian, and the other two were modified variational circuits based on the hardware-heuristic approach (in which the total number of two-qubit gates is $nL$). In both approaches, the total number of trainable parameters is $3nL.$
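A sketch of the hardware-heuristic variational circuit of equation (6) with the CNOT entangler of figure 1(b) is given below. The specific rotation axes of the three trainable single-qubit rotations per qubit are not reproduced in this text, so the RX–RY–RZ triple is an assumption.

```python
import numpy as np
from qulacs import QuantumCircuit

def variational_circuit(n, theta):
    # L layers, each an entangler ring followed by three rotations per qubit,
    # giving 3*n*L trainable parameters and n*L two-qubit gates in total.
    theta = np.asarray(theta).reshape(-1, n, 3)   # shape (L, n, 3)
    circuit = QuantumCircuit(n)
    for layer in theta:
        for q in range(n):                        # E_CNOT ring of figure 1(b)
            circuit.add_CNOT_gate(q, (q + 1) % n)
        for q, (a, b, c) in enumerate(layer):     # U_l(theta_l): 3 angles per qubit
            circuit.add_RX_gate(q, a)
            circuit.add_RY_gate(q, b)
            circuit.add_RZ_gate(q, c)
    return circuit
```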


Figure 1. Circuit unit (template) used for the variational circuit $U({\boldsymbol{\theta }}):$ (a) circuit unit based on the time evolution of an Ising Hamiltonian, (b) circuit unit containing a CNOT block ${E}_{{\rm{CNOT}}}=\displaystyle {\prod }_{q=1}^{n}{{\rm{CNOT}}}_{q,(q+1){\rm{mod}}n},$ and (c) circuit unit containing a CZ block ${E}_{{\rm{CZ}}}=\displaystyle {\prod }_{q=1}^{n}{{\rm{CZ}}}_{q,(q+1){\rm{mod}}n}.$ Note that each circuit unit can be repeated $L$ times in the variational circuit. The total number of trainable parameters in $U({\boldsymbol{\theta }})$ is thus $3nL;$ the total number of two-qubit entangling gates in $U({\boldsymbol{\theta }})$ is $nL$ in cases (b) and (c).


2.4. Measurements and supervised learning

In QML, the measurement of quantum states extracts the information that can be used for supervised learning. For instance, a QML architecture can measure the expectation value of a Pauli Z operator on a single qubit (figure 2(a)). This expectation value can be used for the evaluation of the loss function. Since the information is reduced to only one qubit and then extracted by the measurement, this approach may be considered pure QML (unless otherwise mentioned, this scheme was employed in this work). For the target values ${\boldsymbol{y}}={({y}^{\left(1\right)},{y}^{(2)},\ldots ,{y}^{(N)})}^{{\rm{T}}}\in {{\mathbb{R}}}^{N}$ (where $N$ is the number of data samples) and the expectation values $\{{m}^{(i)}\},$ the loss function $ {\mathcal L} $ can be given by

Equation (7)
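The original typeset equation is not reproduced in this text; given that section 2.4 states the mean squared error was used to evaluate predictions, a presumed reconstruction of equation (7) is

```latex
% Presumed form of equation (7): MSE over the N training samples,
% with m^{(i)} the (scaled) expectation value for the i-th input.
\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N}\left(y^{(i)} - m^{(i)}\right)^{2}
```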


Figure 2. Measurement of (a) an expectation value from a single qubit or (b) a set of expectation values from multiple qubits.


Note that $\{{y}^{\left(i\right)}\}$ are normalized between 0 and 1. Another approach is to use a set of expectation values from multiple qubits [32] for the evaluation of the loss function. This scheme can be viewed as hybrid quantum–classical machine learning, and such quantum circuits are also considered in this work (figure 2(b)). By hybrid QML, we mean that the measurements of quantum states generated by PQCs are used as input for an additional classical machine learning model. Note that the hybrid quantum–classical algorithm is a more general framework for NISQ applications. As a first step, we simply use the expectation values as input for a multiple linear model. For a set of expectation values from $M$ qubits for the $i$th data sample, ${\{{m}_{q}^{(i)}\}}_{q=1}^{M},$ the predicted value ${\hat{y}}^{(i)}$ can be expressed as

Equation (8)

with the coefficient vector ${\boldsymbol{\beta }}={({\beta }_{1},{\beta }_{2},\ldots ,{\beta }_{M})}^{{\rm{T}}}\in {{\mathbb{R}}}^{M}.$ In this linear model, the loss function $ {\mathcal L} $ can be estimated by

Equation (9)

with $\{{ {\mathcal M} }_{qi}\}:=\left\{{m}_{q}^{\left(i\right)}\right\}\in {{\mathbb{R}}}^{M\times N}$ and the optimized ${\boldsymbol{\beta }}^{*}={\left( {\mathcal M} { {\mathcal M} }^{{\rm{T}}}\right)}^{-1} {\mathcal M} {\boldsymbol{y}}.$
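The closed-form readout can be sketched directly in NumPy. The normal-equation form below mirrors the expression for ${\boldsymbol{\beta }}^{*}$ given above (in practice a least-squares solver such as np.linalg.lstsq is numerically safer than forming the normal equations).

```python
import numpy as np

def fit_linear_readout(M_mat, y):
    # Hybrid readout of equations (8)-(9): fit beta in y_hat = beta^T m by
    # ordinary least squares. M_mat has shape (M, N): one row per measured
    # qubit, one column per sample. beta* = (M M^T)^{-1} M y, as in the text.
    beta = np.linalg.solve(M_mat @ M_mat.T, M_mat @ y)
    y_hat = M_mat.T @ beta                 # predictions for all N samples
    mse = np.mean((y - y_hat) ** 2)        # the loss of equation (9)
    return beta, mse
```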

In our regression tasks, we used a standard approach that minimizes the loss function with respect to the trainable parameters ${\boldsymbol{\theta }}.$ In the present work, a regularization term was not included, since overfitting should be effectively avoided owing to the inherent constraints arising from the unitary conditions [31]. In minimizing the loss function, we used the Nelder–Mead method [54], a gradient-free algorithm. In our QML models, the scaling factor ${f}_{Z}$ for the observables obtained from the measurements is a hyperparameter, which we varied systematically in our simulations. Values of the scaling factor for the expectation value $\left\langle Z\right\rangle $ were chosen between 1.0 and 10.0, depending on the model (the values were tuned by grid search). We used the mean squared error (MSE) to evaluate the prediction error, and the coefficients of determination (${R}^{2}$) were calculated to evaluate the performance of our QSTR models.
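Putting the pieces together, a minimal training loop under the above assumptions might look as follows; prepare_feature_state and variational_circuit are the hypothetical helpers sketched earlier, and ${f}_{Z}$ would be tuned by grid search as described.

```python
import numpy as np
from scipy.optimize import minimize
from qulacs import Observable

def predict(x, theta, n, f_z, p=2):
    # Scaled single-qubit <Z> readout of figure 2(a) for one input x
    # (n must equal p * len(x)).
    state = prepare_feature_state(x, p, encoder_A2, encoder_A2, "CNOT")
    variational_circuit(n, theta).update_quantum_state(state)
    obs = Observable(n)
    obs.add_operator(1.0, "Z 0")           # Pauli Z on the first qubit
    return f_z * obs.get_expectation_value(state)

def train(X, y, n, L, f_z):
    # Gradient-free minimization of the MSE loss with Nelder-Mead [54].
    def loss(theta):
        preds = np.array([predict(x, theta, n, f_z) for x in X])
        return np.mean((y - preds) ** 2)
    theta0 = 2 * np.pi * np.random.rand(3 * n * L)   # 3nL trainable angles
    result = minimize(loss, theta0, method="Nelder-Mead")
    return result.x, result.fun
```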

2.5. Implementation

We implemented our QML models using Qulacs [55], a Python/C++ library for quantum circuit simulation. The time-evolution gate of the Ising Hamiltonian needed for the original QCL model was implemented using the NumPy [56] and SciPy [57] libraries. The Nelder–Mead optimization of the Pauli rotation angles was implemented using the scipy.optimize module of SciPy. The k-fold cross-validation was implemented using the KFold module of scikit-learn [58]. Pre- and post-processing of the data set was implemented using the pandas [59] library in combination with NumPy and SciPy.

2.6. Data set

In our QSTR models, we used a data set of 221 phenols, for which toxicity data to the ciliate Tetrahymena pyriformis in terms of $\mathrm{log}(1/{{\rm{IC}}}_{50})$ are available [60]. We used the following molecular descriptors: hydrophobicity ($\mathrm{log}\,{K}_{{\rm{ow}}}$), the acidity constant (${\rm{p}}{K}_{{\rm{a}}}$), frontier orbital energies (${E}_{{\rm{HOMO}}}$ and ${E}_{{\rm{LUMO}}}$), and the hydrogen bond donor count (${N}_{{\rm{hdon}}}$). The data set has been used for evaluating the performance and predictive abilities of standard chemometrics methods [60–62]: multiple linear regression (MLR), support vector machines (SVMs), and radial basis function neural networks (RBF-NNs). To compare our QML models with conventional chemometrics methods, we trained MLR and RBF-NN models on the same data set (we did not train an SVM model because the performance of SVMs is comparable to that of RBF-NNs in QSAR studies [61]). Following the previous QSTR study [62], we used hold-out validation for our QSTR models; more specifically, we used 180 compounds for the training set and 41 for the validation set. Note that the data splitting used for the hold-out validation was exactly the same as in the previous work [62], in which the Kennard–Stone algorithm [63] was employed to generate the split so that all the validation data fall inside the domain of the training data. Such a split is useful because the data set in this QSTR study is rather widely distributed and contains certain outliers. We also performed 5-fold cross-validation on the entire data set after random shuffling.
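The 5-fold cross-validation can be sketched with scikit-learn's KFold, as named in section 2.5 (the Kennard–Stone hold-out split itself is taken from [62] and is not reproduced here); train and predict are the hypothetical helpers sketched in section 2.4.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(X, y, n, L, f_z, seed=0):
    # Randomly shuffled 5-fold CV over the full 221-compound set.
    r2_scores = []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True,
                                    random_state=seed).split(X):
        theta, _ = train(X[train_idx], y[train_idx], n, L, f_z)
        preds = np.array([predict(x, theta, n, f_z) for x in X[val_idx]])
        ss_res = np.sum((y[val_idx] - preds) ** 2)
        ss_tot = np.sum((y[val_idx] - y[val_idx].mean()) ** 2)
        r2_scores.append(1.0 - ss_res / ss_tot)   # coefficient of determination
    return float(np.mean(r2_scores))
```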

2.7. Simulation details

All of the simulations for the QML, MLR, and RBF-NN models were performed on a classical computing platform powered by Intel Xeon Gold 6154 processors with 192 GB of memory. All simulations except for the 15-qubit QML models were performed using a single CPU core; the 15-qubit QML simulations were run as OpenMP parallel jobs using 9 CPU cores.

3. Results and discussion

3.1. Encoder circuit

To begin with, we compared the performance of the three conventional encodings with 5 qubits (table 2). In terms of ${R}_{{\rm{train}}}^{2},$ the A1 and A2 encoders (0.777 and 0.735) performed roughly 15% better than the M encoder (0.658). The results indicate that the angle encodings provide more flexibility in data encoding owing to their simplicity and high nonlinearity. To improve the performance of our QML models, we then explored the possibility that entanglement might extend the expressive power of the data representation. It has been shown that entangling gates play an essential role in quantum generative models [39, 50] and in the expressibility of PQCs [51].

Table 2. Coefficients of determination for the training and validation sets (${R}_{{\rm{train}}}^{2}$ and ${R}_{{\rm{val}}}^{2}$) using 13 different encoder circuits with 5 qubits, together with the optimized number of layers in the variational circuit $L\,(3\leqslant L\leqslant 12)$ and the scaling factor ${f}_{Z}$ for the expectation value $\left\langle Z\right\rangle .$

Encoder Circuit ID ${R}_{{\rm{train}}}^{2}$ ${R}_{{\rm{val}}}^{2}$ L ${f}_{Z}$
M  0.656  0.810  10  10.0
M–M–CZ  0.682  0.843  9  10.0
M–A1–CZ  0.776  0.855  11  8.0
M–A2–CZ  0.820  0.821  12  2.0
M–M–CNOT  0.740  0.836  11  8.0
M–A1–CNOT  0.784  0.833  7  2.0
M–A2–CNOT  0.819  0.849  6  2.0
A1  0.777  0.824  7  2.0
A2  0.735  0.817  8  4.0
A1–A1–CZ  0.822  0.848  11  2.0
A2–A2–CZ  0.774  0.805  8  4.0
A1–A1–CNOT  0.808  0.828  12  1.0
A2–A2–CNOT  0.842  0.844  11  1.0

We then employed the encoder circuits containing CNOT or CZ gates (table 2). In terms of ${R}_{{\rm{train}}}^{2},$ our entangler-enhanced encodings containing ${{\mathscr{U}}}_{{\rm{M}}}$ performed about 15% better than the original ${{\mathscr{U}}}_{{\rm{M}}}$ unitary alone. In the case of the angle encodings, the encoders containing entangling gates outperformed those without entanglement by about 7%. In particular, the A2–A2–CNOT encoder provided the best performance (0.842), followed by the A1–A1–CZ encoder (0.822). This result is consistent with previous studies on PQCs, in which repeated circuit layers with entangling gates provide high expressive power [51]. Our results indicate that the feature map using the product state was inadequate for our application in terms of expressibility and that the entangler-enhanced encodings provided more expressive power in the data representation with the aid of quantum entanglement. This implies that quantum correlation could be advantageous for the feature map representation of classical input data.

To understand the role of the redundancy in encoding associated with higher-dimensional local feature maps, we then increased the number of qubits in our QML models. In this scheme, each component of the input data is encoded into multiple qubits. Recently, Vidal and Theis investigated whether redundancy in PQCs is useful for the task of data fitting [53]; their study indicates that the required redundancy grows at least logarithmically with the complexity of the function. Since five molecular descriptors are contained in our QSTR models, we used 10 and 15 qubits, corresponding to two and three qubits per input feature, respectively.

The use of 10 qubits (two copies of the quantum states) led to a 10% increase in ${R}_{{\rm{train}}}^{2}$ in comparison with the 5-qubit case (figures 3(a) and (b)). A paired t-test also suggested that the difference in ${R}_{{\rm{train}}}^{2}$ between the 5- and 10-qubit cases was statistically significant. The results indicate that, in our QML models with 10 qubits, higher dimensionality was effectively taken into account owing to the redundancy of the multiple-qubit encoding. By encoding each component of the input data into a higher-dimensional local feature map, the encoder is composed of a more complete basis of functions and can respond to smaller changes in the input data [49]. In line with the 5-qubit results, the A2–A2–CNOT encoder provided the best ${R}_{{\rm{train}}}^{2}$ performance (0.906) (table 3). This confirms that our entangler-enhanced encodings provided more flexibility in the data representation. The encoders containing CNOT gates tended to perform better than those containing CZ gates. This expressive power might be related to the fact that increasing the number of CNOT gates in multilayer PQCs leads to an increase in the bond dimensions of the corresponding tensor networks [50].


Figure 3. Performance for the training set using 13 different encoder circuits in the cases of 5 (red), 10 (blue), and 15 (grey) qubits. (a) Coefficients of determination for the training set (${R}_{{\rm{train}}}^{2}$) using 13 different encoder circuits (for the definitions of the encoder circuits, see table 1). (b) Average ${R}_{{\rm{train}}}^{2}$ values for 5, 10, and 15 qubits. (c) Average number of layers $L$ in the variational circuit for 5, 10, and 15 qubits. In (b) and (c), error bars indicate the standard error and asterisks indicate statistical significance in comparison with the case of 5 qubits (paired t-test; * = $p\lt 0.05;$ *** = $p\lt 0.0005$).


Table 3. Coefficients of determination for the training and validation sets (${R}_{{\rm{train}}}^{2}$ and ${R}_{{\rm{val}}}^{2}$) using 13 different encoder circuits with 10 qubits (two copies of the quantum states), together with the optimized number of layers in the variational circuit $L\,(3\leqslant L\leqslant 12)$ and the scaling factor ${f}_{Z}$ for the expectation value $\left\langle Z\right\rangle .$

Encoder Circuit ID ${R}_{{\rm{train}}}^{2}$ ${R}_{{\rm{val}}}^{2}$ L ${f}_{Z}$
M  0.773  0.807  10  8.0
M–M–CZ  0.806  0.830  10  10.0
M–A1–CZ  0.846  0.842  8  6.0
M–A2–CZ  0.881  0.839  12  8.0
M–M–CNOT  0.816  0.827  12  10.0
M–A1–CNOT  0.857  0.851  9  10.0
M–A2–CNOT  0.873  0.852  9  6.0
A1  0.828  0.851  5  10.0
A2  0.875  0.843  12  10.0
A1–A1–CZ  0.893  0.853  9  10.0
A2–A2–CZ  0.823  0.842  6  8.0
A1–A1–CNOT  0.881  0.862  9  8.0
A2–A2–CNOT  0.906  0.869  10  10.0

The average values of ${R}_{{\rm{train}}}^{2}$ were 0.772, 0.851, and 0.832 for 5, 10, and 15 qubits, respectively (see also figure 3(b) and table A1). According to the paired t-test, the difference in ${R}_{{\rm{train}}}^{2}$ between the 5- and 15-qubit cases was also statistically significant (figure 3(b)). The average numbers of layers $L$ in $U({\boldsymbol{\theta }})$ were 9.8, 9.3, and 7.4 for 5, 10, and 15 qubits, respectively, indicating that the number of layers decreased as the number of qubits increased (figure 3(c)). These numerical results indicate that, in our QML models, encoding each component of the input vector into multiple qubits was helpful for improving the performance. It is possible that the performance saturates at some point as the number of qubits increases. A computational consideration is that increasing the number of qubits increases the number of trainable parameters in $U({\boldsymbol{\theta }}),$ which could result in slower convergence when minimizing the cost function. In this particular application, our QML models using the entangler-enhanced encoding with 10 qubits appeared to give good results. Our results agree with the previous study on quantum classifiers using data re-uploading [37], in which using more qubits and entanglement increased the classification success rate and reduced the number of layers required.

3.2. Variational circuit

To understand how the architecture of the variational circuit affects the performance and the computational cost, we tested the three variational circuits while using the same encoding circuit (the M encoder). The first variational circuit was the one based on the time evolution of an Ising Hamiltonian, which was previously proposed [31]. The second and third circuits were CNOT-based and CZ-based variational circuits, respectively. The latter two circuits are motivated by the strategy of the hardware-heuristic ansatz, in order to circumvent the limitations of quantum hardware; they also avoid the additional computational cost generated by the Trotter decomposition.

According to our numerical tests on simple regression tasks, the CNOT-based variational circuit provided performance similar to the variational circuit based on the Ising Hamiltonian, whereas the CZ-based variational circuit performed worse (table B1). The results indicate that repeated circuit layers with entangling CNOT gates provide high expressive power, in line with previous studies in which CNOT gates play important roles in the expressibility of PQCs [50, 51]. Therefore, we employed the variational circuit containing entangling CNOT gates, unless otherwise mentioned. In addition, we observed a substantial computational speedup using the variational circuit containing entangling gates, compared with the original variational circuit based on the Ising Hamiltonian. A major disadvantage of the latter is that, for quantum circuit simulation on classical processors, the computational cost and memory required for the calculation of the Trotter operator matrix grow exponentially with the number of qubits (table B2). For that reason, we recommend the use of the hardware-heuristic variational circuits.

Furthermore, we examined the effect of the number of unit layers $L$ on the performance of our QML models. According to our simulations $(3\leqslant L\leqslant 15),$ adding unit layers normally improved the results; a typical example of this tendency can be found in figure 4(a), in which ${R}_{{\rm{train}}}^{2}$ obtained using the A2–A2–CNOT encoder gradually improves as a function of $L$ (in the 5-qubit case, the performance appears to saturate for $L\geqslant 12$). This is consistent with the decrease in the training-set MSE with increasing $L$ (figure 4(b)). The results imply an improved efficiency in exploring the solution space by adding circuit unit layers, in agreement with previous studies on PQCs [50, 51]. We also found that the optimized number of layers in our QML models was strongly dependent on the choice of the PQC architecture and of the encoding (see also tables 2 and 3). A similar tendency has been reported in previous work on the expressibility and entangling capability of PQCs, in which the rate of change of expressibility with respect to the number of layers tends to vary from circuit to circuit [51].


Figure 4. Coefficients of determination ${R}_{{\rm{train}}}^{2}$ (a) and MSE (b) for the training set as a function of the number of unit layers $L$ in $U({\boldsymbol{\theta }}).$ The results were obtained using the A2–A2–CNOT encoder with 5 (red) and 10 (blue) qubits. For the definition of the A2–A2–CNOT encoder, see table 1.


3.3. Final QML model

Considering the results presented in the previous subsections, we arrived at a final QML model suitable for our particular application (depicted in figure 5). Our final model can be described as follows. The quantum circuit for data representation is given by the entangler-enhanced encoder ${E}_{{\rm{CNOT}}}{{\mathscr{U}}}_{2}{E}_{{\rm{CNOT}}}{{\mathscr{U}}}_{1};$ in our best QML model, ${{\mathscr{U}}}_{1}={{\mathscr{U}}}_{2}={{\mathscr{U}}}_{{\rm{A}}2}$ (i.e., the A2–A2–CNOT encoder) (figure 5(a)). Hence, the feature map is given by $\left.\left|{\rm{\Psi }}\right.\right\rangle ={E}_{{\rm{CNOT}}}{{\mathscr{U}}}_{{\rm{A}}2}{E}_{{\rm{CNOT}}}{{\mathscr{U}}}_{{\rm{A}}2}{\left.\left|0\right.\right\rangle }^{\otimes n}.$ This kind of encoder can be viewed as a 2D tensor network, in which the entangler block can be interpreted as a periodic boundary condition (figure 5(b)). Each component of the input data is encoded into two qubits, meaning that a feature map with higher dimensionality can be taken into account; consequently, 10 qubits are used for encoding, because five molecular descriptors are contained in our QSTR model. In this way, our final QML model allows multiple data encoding, which is consistent with the ideas of input redundancy [53] and data re-uploading [37]. The variational circuit $U({\boldsymbol{\theta }})$ is given by a multilayer PQC: $U({\boldsymbol{\theta }})=\displaystyle {\prod }_{{\ell }=1}^{L}{U}_{{\ell }}({{\boldsymbol{\theta }}}_{{\ell }}){E}_{{\rm{CNOT}}}^{{\ell }}.$


Figure 5. Quantum circuit (a) and the graphical tensor network representation (b) of our QML model for the QSTR study. The feature map is given by $\left.\left|{\rm{\Psi }}\right.\right\rangle ={E}_{{\rm{CNOT}}}{{\mathscr{U}}}_{2}{E}_{{\rm{CNOT}}}{{\mathscr{U}}}_{1}{\left.\left|0\right.\right\rangle }^{\otimes n}$ (in our best QML model, ${{\mathscr{U}}}_{1}={{\mathscr{U}}}_{2}={{\mathscr{U}}}_{{\rm{A}}2}$). From a physical standpoint, such a quantum circuit can be interpreted as a 2D tensor network, in which the entangler block can be interpreted as a periodic boundary condition. Each component of the input data is encoded into two qubits (i.e., $n=10$ qubits) in order to increase the dimensionality of the feature map. The variational circuit is given by $U({\boldsymbol{\theta }})=\displaystyle {\prod }_{{\ell }=1}^{L}{U}_{{\ell }}({{\boldsymbol{\theta }}}_{{\ell }}){E}_{{\rm{CNOT}}}^{{\ell }},$ a multilayer PQC (for the inner details of the variational circuit $U({\boldsymbol{\theta }}),$ see figure 1(b)). The measurement can be done on either a single qubit or multiple qubits (see also figure 2); only a single-qubit measurement is depicted.


3.4. Measurements and the hybrid approach

We compared the performance of pure and hybrid QML models for the A2–A2–CNOT encoder with 10 qubits (figure C1). Overall, the values of ${R}_{{\rm{train}}}^{2}$ improved by about 2% when using the hybrid QML approach, in which the expectation values from $M$ qubits were fed into the evaluation of the loss function. However, increasing the number of measured qubits $M$ did not necessarily lead to incremental improvements in ${R}_{{\rm{train}}}^{2};$ rather, we found that the number of unit layers $L$ in $U({\boldsymbol{\theta }})$ had the dominant impact on ${R}_{{\rm{train}}}^{2}.$ Also, there were quite a few cases where the performance on the validation set was not improved compared with the pure QML models (this topic will be discussed in the next subsection). On the other hand, the QML model with $M=4$ and $L=10$ provided the best ${R}_{{\rm{val}}}^{2}$ performance (0.886). Further improvement of the post-processing on the classical side (e.g., classical neural networks) may be necessary.

3.5. Performance comparison with conventional chemometrics

Having developed our QML models for the QSTR application, we now compare their performance with conventional chemometrics methods, namely the MLR and RBF-NN methods (table 4). In our MLR model, the values of ${R}_{{\rm{train}}}^{2}$ and ${R}_{{\rm{val}}}^{2}$ were 0.609 and 0.740, respectively, in agreement with those in the previous work [62] (0.602 and 0.786, respectively). The values of ${R}_{{\rm{train}}}^{2}$ and ${R}_{{\rm{val}}}^{2}$ in the QML model using the original scheme were 0.644 and 0.825, respectively, and the average values of ${R}_{{\rm{train}}}^{2}$ and ${R}_{{\rm{val}}}^{2}$ obtained by our improved QML models were 0.910 and 0.878, respectively (51% and 12% higher than the MLR counterparts). Thus, the QML models performed significantly better than the MLR models, suggesting that the QML models succeeded in the nonlinear regression task.

Table 4. Performance comparison of our QML models with those obtained from conventional chemometrics methods (the coefficients of determination, MSE, and root mean square (RMS) for the training and the validation sets).

Methods ${R}_{{\rm{train}}}^{2}$ ${R}_{{\rm{val}}}^{2}$ ${{\rm{MSE}}}_{{\rm{train}}}$ ${{\rm{MSE}}}_{{\rm{val}}}$ ${{\rm{RMS}}}_{{\rm{train}}}$ ${{\rm{RMS}}}_{{\rm{val}}}$
QML (A2–A2–CNOT-10q-m4) a  0.913  0.886  0.062  0.046  0.250  0.214
QML (A2–A2–CNOT-10q) b  0.906  0.869  0.067  0.052  0.260  0.229
QML (original-5q) c  0.644  0.825  0.256  0.070  0.506  0.264
RBF-NN ([62])  0.942  0.882  0.041  0.058  0.204  0.240
RBF-NN d  0.928  0.819  0.052  0.072  0.227  0.269
MLR ([62])  0.602  0.786  0.286  0.102  0.535  0.320
MLR d  0.609  0.740  0.281  0.104  0.530  0.322

a Obtained using the A2–A2–CNOT encoder (10 qubits) combined with the hardware-heuristic variational circuit ($M=4;$ $L=10$). b Obtained using the A2–A2–CNOT encoder (10 qubits) combined with the hardware-heuristic variational circuit ($L=10$). c Obtained using the original encoder (5 qubits) combined with the variational circuit based on Ising Hamiltonian. d Calculated in the present work.

It is also important to compare the performance of our QML models with the RBF-NN models (table 4), because RBF networks are capable of universal approximation [64]. The values of ${R}_{{\rm{train}}}^{2}$ and ${R}_{{\rm{val}}}^{2}$ in the previous RBF-NN model [62] were 0.942 and 0.882, respectively, and those in our RBF-NN model were 0.928 and 0.819, respectively. Hence, the performance of our QML models was comparable to that of the RBF-NN models. In fact, the plots of observed versus predicted toxicity obtained by the RBF-NN and QML models are remarkably similar to each other, and there is also a similarity in the distributions of certain outliers (figure 6). The results indicate that our quantum-enhanced feature map generated by the PQCs was similar to the mapping obtained by the RBF network, which is capable of universal approximation. Furthermore, our hybrid QML model ($M=4$ and $L=10$) provided a slightly better ${R}_{{\rm{val}}}^{2}$ value (0.886) than the RBF-NN counterpart (0.882) (see also figure C2). This interpretation is supported by the smallest ${{\rm{MSE}}}_{{\rm{val}}}$ and ${{\rm{RMS}}}_{{\rm{val}}}$ values (0.046 and 0.214, respectively), obtained by the hybrid QML model (table 4). Our results imply a high expressive power of our QML models using multilayer PQCs.


Figure 6. Plots for the observed versus predicted toxicity obtained from the MLR, RBF-NN, and QML models (blue triangle: training set; red circle: validation set). (Left) MLR model (${R}_{{\rm{train}}}^{2}:$ 0.609; ${R}_{{\rm{val}}}^{2}:$ 0.740). (Center) RBF-NN model (${R}_{{\rm{train}}}^{2}:$ 0.928; ${R}_{{\rm{val}}}^{2}:$ 0.819). (Right) QML model (${R}_{{\rm{train}}}^{2}:$ 0.906; ${R}_{{\rm{val}}}^{2}:$ 0.869) using the A2–A2–CNOT encoder with 10 qubits.


To shed light on the generalization performance of the modeling schemes, we further conducted 5-fold cross-validation on the entire data set. Note that this validation scheme differs from the hold-out validation in table 4, in which the data splitting was obtained using the Kennard–Stone algorithm. We found that the RBF-NN and hybrid QML models tended to overfit: for the RBF-NN method, the values of ${R}_{{\rm{train}}}^{2}$ and ${R}_{{\rm{val}}}^{2}$ were 0.933 and 0.619, respectively; for the hybrid QML approach, they were 0.932 and 0.479, respectively. On the other hand, the pure QML models (using the single-qubit measurement) appeared to avoid overfitting, with ${R}_{{\rm{train}}}^{2}$ and ${R}_{{\rm{val}}}^{2}$ values of 0.876 and 0.694, respectively; this is probably because the unitary conditions innately acted as regularization [31]. Considering all the results in this work, the performance comparison can be summarized as follows: our best QML model $\approx $ RBF-NN models > MLR models. Our results thus imply that the QML method could be an alternative approach for nonlinear regression tasks.

3.6. Perspectives on QML

Let us comment on several perspectives on QML models. While a definitive quantum advantage for machine learning remains controversial, we anticipate several merits of employing QML. First, we can directly manipulate the feature map in terms of quantum many-body states. If one could use complex, computationally intractable quantum states as feature maps while avoiding overfitting, that could be an advantage. Second, once the architecture of the PQCs is designed, QML models can be trained efficiently, without the need for further tuning. In particular, the unitary conditions inherent to quantum circuits can act as built-in regularization, which may help avoid overfitted models and improve the generalization performance. In the case of RBF-NNs, by contrast, the centers of the RBFs, the number of hidden-layer units, the widths, and the weights all have to be determined carefully. Third, QML models using PQCs require far fewer trainable parameters [36, 50] and perhaps fewer hyperparameters, implying the possibility of efficient and unbiased machine learning using near-term quantum computing. Fourth, on numerical simulators, the interpretation of QML models could be aided by analyzing the information about the unitary operations and wavefunctions. Fifth, there is a close relationship between quantum circuits and tensor networks, which may be advantageous for the development of QML in the framework of tensor networks [65]. Considering all this, it is desirable to investigate the performance of QML on a variety of practical applications using real-world data sets, in cheminformatics, materials informatics, and other practical machine learning tasks.

4. Conclusions

In the present work, we have developed our QML models designed for predicting the toxicity of 221 phenols (QSAR/QSTR modeling), using the framework of the quantum–classical hybrid algorithm. To our knowledge, this is the first practical application of QML for a nonlinear regression task using a real-world data set.

We have numerically investigated how different encodings, variational circuit architectures, and redundant encodings using multiple qubits affect the performance of our QML models. In our particular application, angle encoding was found to be useful in terms of flexibility in data representation owing to its simplicity and high nonlinearity. Furthermore, the results suggest that our entangler-enhanced encodings provided more expressive power in data representation than the previous ones, implying that quantum correlation could be useful for the feature map representation of classical data. The numerical results also indicate that, in our QML models, encoding each component of the input vector into multiple qubits was helpful for improving the performance in comparison with encoding each component into a single qubit. Repeated circuit layers with CNOT blocks in the variational circuit provided a computational speedup compared with the original variational circuit based on the time evolution of an Ising Hamiltonian.

Our QML models performed significantly better than the MLR models (51% and 12% increases in ${R}^{2}$ for the training and validation sets, respectively), suggesting that the QML models succeeded in the nonlinear regression task. Moreover, our simulations indicate that our best QML models were comparable to those obtained by RBF networks, while improving the generalization performance. We have also discussed several perspectives on QML models from a more general standpoint. Further improvements would be needed in the encoding method and in the evaluation of the cost functions (post-processing). Exploring noisy simulations and experiments on real quantum hardware would be important for improving the QML models. The present study opens up the possibility that QML could be used for various nonlinear regression tasks in cheminformatics and other machine learning applications.

Note added. After the release of our preprint, two independent works on data encoding appeared: one by Schuld, Sweke, and Meyer [66] and the other by Goto, Tran, and Nakajima [67]. They investigated parallel and sequential scenarios of repeated data encodings from a mathematical point of view; our work focuses on a practical application of QML to a real-world data set, using multiple data encoding with entanglement suited to our particular case. Our encoding approach may also be useful in practice, since the number of qubits and the circuit depth are still limited in real near-term quantum hardware.

Acknowledgments

We thank Seiya Sugo for useful discussions at the early stage of this work.

Appendix A: Numerical results using 15 qubits

Table A1. Coefficients of determination for the training and validation sets (${R}_{{\rm{train}}}^{2}$ and ${R}_{{\rm{val}}}^{2}$) using 13 different encoder circuits with 15 qubits (three copies of the quantum states), together with the optimized number of layers in the variational circuit $L\,(3\leqslant L\leqslant 12)$ and the scaling factor ${f}_{Z}$ for the expectation value $\left\langle Z\right\rangle .$

Encoder Circuit ID ${R}_{{\rm{train}}}^{2}$ ${R}_{{\rm{val}}}^{2}$ L ${f}_{Z}$
M  0.766  0.793  7  6.0
M–M–CZ  0.793  0.808  6  10.0
M–A1–CZ  0.843  0.828  10  10.0
M–A2–CZ  0.861  0.820  12  10.0
M–M–CNOT  0.786  0.814  4  10.0
M–A1–CNOT  0.855  0.839  12  10.0
M–A2–CNOT  0.867  0.836  10  10.0
A1  0.833  0.839  4  8.0
A2  0.832  0.835  5  8.0
A1–A1–CZ  0.837  0.818  3  10.0
A2–A2–CZ  0.803  0.832  4  10.0
A1–A1–CNOT  0.863  0.834  9  10.0
A2–A2–CNOT  0.873  0.838  10  10.0

Appendix B: Performance and computational costs for the variational circuit

Table B1. Performance comparison of the three variational circuit units: the coefficients of determination for regression tasks.

 Coefficients of determination
Variational circuit unit ${x}^{2}$ ${e}^{x}$ $\sin \,x$ $\left|x\right|$
Time evolution of Ising Hamiltonian  0.997  0.997  0.997  0.971
Circuit unit containing a CNOT block  0.998  0.999  0.999  0.971
Circuit unit containing a CZ block  0.729  0.970  0.997  0.793

Table B2. Matrix size, computational costs, and required memory for the calculation of the Trotter operator matrix.

Number of qubits (n)  Dim. of the Trotter operator matrix a  Relative computational cost b  Memory for the Trotter operator matrix [MB] c
5  32  1  0.05
10  1,024  32,768  48
15  32,768  1,073,741,824  49,152
20  1,048,576  35,184,372,088,832  50,331,648

a Calculated as ${2}^{n}.$ b Estimated as ${2}^{3(n-5)}$ (the computational cost using 5 qubits was set to 1). c Estimated as $16\times {2}^{2n}\times 3\,/\,{1024}^{2}.$
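The footnote formulas can be checked in a few lines (16 bytes per complex128 entry; the factor of 3 presumably counts auxiliary matrices held alongside the operator):

```python
def trotter_costs(n):
    # Reproduce the rows of table B2 from the footnote formulas.
    dim = 2 ** n                                    # footnote a
    relative_cost = 2 ** (3 * (n - 5))              # footnote b, 1 at n = 5
    memory_mb = 16 * 2 ** (2 * n) * 3 / 1024 ** 2   # footnote c
    return dim, relative_cost, memory_mb

for n in (5, 10, 15, 20):
    print(n, *trotter_costs(n))
```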

Appendix C: Comparison between pure and hybrid QML models


Figure C1. Comparison between pure and hybrid QML models. Coefficients of determination for the training set (${R}_{{\rm{train}}}^{2}$) as a function of the number of unit layers $L$ in the variational circuit $U({\boldsymbol{\theta }}).$ The results were obtained using the A2–A2–CNOT encoder with 10 qubits. In the hybrid QML models, the expectation values from $M$ qubits were fed into the evaluation of the loss function. The scaling factor ${f}_{Z}$ was set to 4.0 (we found that, in the hybrid QML models, scaling factors greater than 4.0 resulted in unstable performance).


Figure C2. Plot of the observed versus predicted toxicity obtained from the hybrid QML model ($M=4;$ $L=10$) using the A2–A2–CNOT encoder with 10 qubits (blue triangle: training set; red circle: validation set) (${R}_{{\rm{train}}}^{2}:$ 0.913; ${R}_{{\rm{val}}}^{2}:$ 0.886). Note that the plot is very similar to that obtained from the pure QML model (using the single-qubit measurement) in figure 6.
