Computational discovery of molecular C60 encapsulants with an evolutionary algorithm

Miklitz, Marcin; Turcani, Lukas; Greenaway, Rebecca L.; Jelfs, Kim E.

doi:10.1038/s42004-020-0255-8

Article
Open access
Published: 22 January 2020

Communications Chemistry volume 3, Article number: 10 (2020)

Abstract

Computation is playing an increasing role in the discovery of materials, including supramolecular materials such as encapsulants. In this work, a function-led computational discovery using an evolutionary algorithm is used to find potential fullerene (C₆₀) encapsulants within the chemical space of porous organic cages. We find that the promising host cages for C₆₀ evolve over the simulations towards systems that share features such as the correct cavity size to host C₆₀, planar tri-topic aldehyde building blocks with a small number of rotational bonds, di-topic amine linkers with functionality on adjacent carbon atoms, high structural symmetry, and strong complex binding affinity towards C₆₀. The proposed cages are chemically feasible and similar to cages already present in the literature, helping to increase the likelihood of the future synthetic realisation of these predictions. The presented approach is generalisable and can be tailored to target a wide range of properties in molecular material systems.

Introduction

Arguably, most discoveries of new materials depend upon small changes to known systems based on chemical knowledge, or are the result of serendipity. However, computation is playing an increasing role in the rational design and discovery of new advanced materials¹. For example, the high-throughput computational screening of existing and hypothetical compounds can facilitate identification of materials with optimal properties or help formulate structure-property relationships for future rational materials discovery^2,3. High-throughput screens can be used to perform brute force searches of a large number of possible materials, accelerated by increasing computational power or machine learning, and covering much larger regions of phase space than can be reasonably accessed experimentally, even with automation.

Porous molecular materials are distinct from porous network materials such as zeolites, metal-organic frameworks (MOFs) and polymers, in that they are made up of discrete molecular units rather than having three-dimensional extended bonding^4,5. Molecules can be porous in the solid state through either extrinsic porosity, where the molecules are unable to pack efficiently to remove void space, or through intrinsic porosity, where the molecule itself has a persistent internal cavity. Examples of intrinsically porous molecules include calixarenes, cucurbiturils and organic cages, and these systems are investigated for applications in molecular separation, encapsulation, catalysis, sensing, and as porous liquids⁴. Porous organic cages (POCs) are polycyclic molecules that have three-dimensional structures with three or more molecular windows⁴.

The discovery of new POCs presents many challenges. First, after the successful synthesis of the required precursors, they must be combined to form the cage species, which is typically done via a one-pot reaction using dynamic covalent chemistry (DCC). The most common type of DCC reaction used to form cages is imine condensation. Not every reaction will successfully form a cage; for example, in a recent high-throughput screening study only 42% of the reactions were successful⁶. Furthermore, a successful reaction is not enough: the reaction needs to form the molecule in the desired topology and, if desired, the molecule must be shape persistent, retaining an internal cavity in the absence of solvent. The topology formed can be hard to predict a priori⁷, and, further, we recently found that of 63,472 hypothetical cages, built from a library of precursors with shape persistence in mind, only 28% were actually shape persistent⁸.

Computation can help guide the discovery of POCs, with calculations considering the thermodynamics and kinetic pathways of the assembly process able to help identify the expected topology of a given reaction^6,9,10, and whether or not it is shape persistent^8,11. We have also recently applied supervised machine learning to accelerate the prediction of shape persistence of a hypothetical cage assembly, making this accessible to the experimental community⁸. With a molecular structure, crystal structure prediction techniques can be used to unveil the most energetically favourable crystal packings¹². While <200 POCs have so far been experimentally realised, in theory the search space for these systems is vast if all possible combinations of organic precursors for DCC reactions are considered. Of course, not all molecules are suitable building blocks for POCs, nor are the majority likely to be synthetically accessible, but this just creates an additional challenge in the sensible selection of precursors if one wants to truly consider a diverse range of possibilities, outside of what would be immediately available for synthetic screening.

It is not computationally feasible to analyse all combinations of organic building blocks as POCs for a given application. Recently, we developed open-source Python-based software, called the supramolecular toolkit (stk), that allows the automated construction of different types of materials from precursor databases¹³. We recently showed that an extension of stk to include an evolutionary algorithm (EA) could be used to target specific structural features of POCs, such as high symmetry or a specific pore size, identifying not only promising targets, but also more general design rules to obtain a specific feature¹⁴. This has already led to the synthetic realisation of promising identified POCs¹⁵. EAs mimic evolutionary processes to solve global minimisation problems, with the evolutionary pressure for ‘survival of the fittest’ in our case being targeted towards a desired set of features in a molecular material. After calculating the quality of each of the candidates of a generation, the population is ‘evolved’ by performing modifications that mimic crossover and mutation in nature. EAs are used as efficient ways to sample chemical space for drug discovery¹⁶, and computational materials discovery¹⁷, including for porous network materials¹⁸. Here, rather than focusing on optimising a structural feature of the POCs, we focus for the first time on targeting a specific application of the cages, in this case the encapsulation of C₆₀ within a cage when in solution. By screening hundreds of possibilities and seeking to optimise the function of the cage, our approach differs from one that designs metal-organic cages to encapsulate materials through complementary geometries of the host¹⁹.

The applications of fullerenes span biomedicine²⁰ and materials science²¹, for example in organic photovoltaic devices and superconductive materials^22,23. Considerable effort has been devoted to the selective binding of different species of fullerenes for the purification process^24,25. The immobilisation of fullerenes in complexes enables controlled property tuning and selective formation of fullerene adducts^26,27. Fullerenes can also act as templates and drive macrocycle formation towards desired supramolecular architectures²⁸. The common mechanism of fullerene encapsulation is to maximise the non-specific van der Waals interactions between the fullerene and a host molecule that “wraps” itself around it, as in the “buckycatcher”²⁹. There are multiple examples of bowl-shaped molecules binding with fullerene^30,31,32, and some examples of metal-organic cage encapsulation^24,33. POCs, however, have been mostly absent in fullerene host-guest supramolecular chemistry. The only two examples that have been proposed as possible fullerene hosts to our knowledge are a sandwich-like cage and a porphyrin cage (COP-5)^34,35.

Here, our EA-based screening for POCs that are potential C₆₀ encapsulants reveals specific cage targets that have common features such as a cavity diameter of ~10 Å and similar sized building blocks. We explore how to parameterise the EA and discuss how the approach could be applied to larger databases of potential cage building blocks, or targeted at other encapsulants or molecular materials with desired properties in the future.

Results

The database of assembled cages

A small custom database of precursors, 43 tri-topic (Tri) aldehydes and 90 di-topic (Di) amines (see Supplementary Figs. 1–4), was used to reduce the vast chemical space of possible precursors. This precursor database, when combined in every possible combination in a single topology, corresponds to 3870 imine cages. Here, we only consider cages assembled in a [4 + 6] reaction of four aldehydes and six diamines into a Tri⁴Di⁶ topology that relates to a tetrahedron, using the nomenclature introduced by Santolini et al.⁹. The trialdehydes are hereafter referred to as ‘nodes’ and diamines as ‘linkers’, based on their positioning on the template geometry in the cage assembly process (trialdehydes on the vertices and diamines on the edges). These precursors were either selected from previously reported organic cages, or are molecules that we deemed synthetically viable and reasonable precursors for cage synthesis, but have not been previously reported. The same set of precursors was used in our previous study using machine learning to predict shape persistence⁸. This database is intentionally limited in size to allow quick screening for the purpose of the fitness function (FF) parameterisation. To simplify the description of the derived POCs and C₆₀@POC complexes, a generated POC is simply referred to as the “cage” and the corresponding C₆₀@POC complex as the “complex”. The cages in the final population were assigned code names of type CX, where X is a number in ascending order and C1 is the cage with the highest fitness value. Lastly, “CX complex” corresponds to the C₆₀@CX complex.

An overview of the assembly and property calculation process for a cage is shown in Fig. 1a. The calculated properties of the geometry optimised cages and their corresponding complexes are shown in Fig. 2 and Supplementary Figs. 5–8. The pairs of POCs and their complexes were divided into three groups: the complexes that have C₆₀ binding energies greater than 0 kJ mol⁻¹ (repulsive interaction); complexes with binding energies between −404 and 0 kJ mol⁻¹; and complexes with binding energies well below −404 kJ mol⁻¹. The last set, coloured red in the graphs, was notable as these forcefield binding energies seemed unreasonable. Grimme et al. reported binding energies of −770 and −606 kJ mol⁻¹ for a hypothetical multi-shell “hyperfullerene” complex (C₆₀@C₂₄₀), where these values can be seen as a physical limit of the C₆₀ interactions with a potential host³⁶. We note that these simulated binding energies will be of considerably greater magnitude than any experimentally measured values due to the absence of solvent in our simulations. However, in the set of complexes coloured red, the binding energies are negative with magnitudes of a few thousand kJ mol⁻¹. Additionally, these seem to aggregate around certain values and are observed for POCs containing larger cavities, in the region of 20–30 Å (the C₆₀ diameter is ~10 Å). These were inspected and determined to have unreasonable geometries with the POC structures ‘stuck’ in chemically infeasible geometries, for example with an unusual orientation of hydrogens. We believe this is a systematic error (as there is a similarity of binding energy values between groups of cages) and a false result, so these POCs were disregarded.

**Fig. 1: An overview of the computational workflow.**

**Fig. 2: The properties of the assembled POCs and C₆₀ complexes.**

Those complexes with attractive binding energies between 0 and −404 kJ mol⁻¹ in Fig. 2 have a particular focus around cavity sizes of ~10 Å, more so in fact than in the isolated POC molecules. This is the approximate size of the C₆₀ molecule and reflects the fact that many of the POCs have expanded their intrinsic cavity to form one of the correct size for hosting C₆₀. This is the reason for a large set of complexes with positive binding energies (blue points); the energy penalty of adapting to fit C₆₀ is far greater than the benefits of the C₆₀ presence. The green set of cages with favourable binding energies is the target group of complexes for the EA.

FF parameterisation

The FF is used to calculate the performance of a cage as a C₆₀ encapsulant during a run in our EA. The FF parameterisation was first performed on the database of all assembled cages and their complexes. For the purpose of the FF parameterisation, the complete database of 3870 cages and their C₆₀ complexes were generated and the C₆₀ binding energy in the complex at the forcefield level (E_binding) and the asymmetry of the cage extracted from the complex (A_complex) were calculated. The geometry optimisation process is the bottleneck of the EA calculations in this work, and the database of pre-assembled and geometry optimised cages resulting from all combinations of precursors allowed for a quick screening of a range of constants and powers for the FF to find the right parameters. The FF had the form:

$${\mathrm{FF}}={(a{E}_{\mathrm{binding}}^{b}+c{A}_{\mathrm{complex}}^{d})}^{-1}$$

(1)

and a and c constants were screened for values between 1 and 5, in increments of 1, for all combinations. The b and d exponents for all combinations of values in the range of 0–5 were considered in increments of 0.25.
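As a minimal sketch of Eq. (1) and of the exponent grid described above, the Python below evaluates the fitness for inputs that are assumed to be already normalised and positive, and enumerates the b and d values. The function and variable names are illustrative assumptions and are not part of stk.

```python
from itertools import product


def fitness_eq1(e_binding, a_complex, a=1.0, b=1.0, c=1.0, d=1.0):
    """Eq. (1): FF = (a * E_binding**b + c * A_complex**d) ** -1.

    Assumes both inputs were already normalised to positive values.
    """
    return 1.0 / (a * e_binding ** b + c * a_complex ** d)


# Exponents b and d: 0 to 5 inclusive, in increments of 0.25.
exponents = [0.25 * i for i in range(21)]

# Every (b, d) combination screened for one choice of the constants a and c.
exponent_grid = list(product(exponents, repeat=2))

print(len(exponents), len(exponent_grid))  # 21 441
print(fitness_eq1(1.0, 1.0, a=1, b=3.25, c=1, d=4.25))  # 0.5
```

Because the sum is inverted, lower normalised binding energy and asymmetry values give a higher fitness.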

During the FF calculation, collapsed cages that have lost their internal cavity are discarded, allowing us to focus on shape-persistent, symmetric molecules as potential C₆₀ encapsulants. It is facile to identify these systems, as they will not have windows whose diameter can be determined by pywindow. We found that 37% of the POCs were discarded because either the empty cage, complex, or both, failed the asymmetry criterion, with the majority of collapsed cases coming from the empty systems (21%). If we then factor in wanting to have a favourable interaction energy, then 44% of cages meet that criterion. For all 2432 of these cages, the FF was calculated with the full set of parameters, equating to 9261 different setups: 21 different ratios of constants a and c and, for each of these, a heat map generated with 441 combinations of exponents b and d. For each a, b, c, d parameter combination, the R₁₀ score was calculated with Eq. (4) for the set of ten cages with the highest fitness value. The R₁₀ score gives the relative quality of the set of ten best cages with respect to other sets of parameters for the FF. The results presented in Fig. 3 are for the set of constants a:c for 1:1 (middle), 5:1 (left) and 1:5 (right) ratios, which show the general trend observed for combinations of a and c. The lowest R₁₀ score corresponds to the optimal set of parameters. The lowest R₁₀ value was 1.480 and was identified for 126 different sets of parameters. The simplest set of parameters, where the sum of a, c, b and d was smallest, was then considered. The identified set of parameters was a = 1, b = 3.25, c = 1 and d = 4.25.

**Fig. 3: The heat maps of the fitness function parameterisation process.**

We can learn lessons from this parameterisation procedure that can be used in future studies that seek an optimal set of weightings for an EA FF to search large databases for molecular materials. Firstly, a rigorous approach would involve generating a random subset of all potential solutions, performing a parameterisation on it as we have done here, and then applying that parameterisation to a search of the full database. However, we can also see that if components are already present in the FF, then the setup can be generalised to add additional related components, for example cage and complex asymmetry here. What we learnt from our extensive parameterisation was that, in the end, the weightings of the components essentially matched what we would have expected from chemical intuition. Thus, for a system where there is familiarity with the importance of the components, the parameterisation step could be skipped. Finally, while the exponent values of 3.25 and 4.25 were found to give the best rankings in this case, we would suggest that it would also be sufficient, and simpler, to use exponents of 1 in future studies.

The evolutionary algorithm calculations

A flowchart summarising the key steps in the EA is shown in Fig. 1b. With the FF parameterised, five separate EA calculations were performed on the database of precursors presented. The final goal was to find excellent POC candidates for C₆₀ encapsulation and the FF had the final form:

$${\mathrm{FF}}={({E}_{\mathrm{binding}}^{3.25}+0.5{A}_{\mathrm{complex}}^{4.25}+0.5{A}_{\mathrm{cage}}^{4.25})}^{-1}$$

(2)

At this point we introduced the new feature of A_cage, the asymmetry of the isolated cage. While we simplified the initial parameterisation to only have two components (E_binding and A_complex) to make it manageable, we added this additional feature here as discussion with synthetic chemists had suggested that higher symmetry isolated cage molecules should have a higher likelihood of being synthesised. To weight the binding energy equally to the asymmetry consideration, the asymmetry-related parameters were given half weights (constants of 0.5), so that the sum of the constants equals that of the binding energy. Each EA calculation was run for 100 generations, with a population size of 20.
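The loop below is a toy sketch of an EA run with the settings quoted above (43 nodes, 90 linkers, population of 20, 100 generations) and the FF of Eq. (2). It is not the stk implementation: the elitist selection, the mutation and crossover operators, and the `toy_properties` placeholder (which stands in for cage assembly, geometry optimisation and normalisation) are all assumptions made for illustration.

```python
import random

N_NODES, N_LINKERS = 43, 90        # precursor counts from the text
POP_SIZE, N_GENERATIONS = 20, 100  # EA settings from the text


def toy_properties(node, linker):
    """Hypothetical stand-in for the expensive assembly and optimisation step.

    Returns made-up positive (E_binding, A_complex, A_cage) values.
    """
    rng = random.Random(node * N_LINKERS + linker)
    return tuple(rng.uniform(0.1, 2.0) for _ in range(3))


def fitness(cage):
    """Eq. (2) applied to the toy properties of a (node, linker) pair."""
    e_binding, a_complex, a_cage = toy_properties(*cage)
    return 1.0 / (e_binding ** 3.25
                  + 0.5 * a_complex ** 4.25
                  + 0.5 * a_cage ** 4.25)


def crossover(parent1, parent2):
    """Combine the node of one parent with the linker of the other."""
    return parent1[0], parent2[1]


def mutate(cage, rng):
    """Replace either the node or the linker with a random alternative."""
    node, linker = cage
    if rng.random() < 0.5:
        return rng.randrange(N_NODES), linker
    return node, rng.randrange(N_LINKERS)


def run_ea(seed=0):
    rng = random.Random(seed)
    all_cages = [(n, l) for n in range(N_NODES) for l in range(N_LINKERS)]
    population = rng.sample(all_cages, POP_SIZE)  # unique starting cages
    best_per_generation = []
    for _ in range(N_GENERATIONS):
        offspring = []
        for _ in range(POP_SIZE):
            p1, p2 = rng.sample(population, 2)
            offspring.append(mutate(crossover(p1, p2), rng))
        # Elitism (an assumption): keep the fittest unique cages overall.
        pool = sorted(set(population + offspring), key=fitness, reverse=True)
        population = pool[:POP_SIZE]
        best_per_generation.append(fitness(population[0]))
    return population, best_per_generation


population, best = run_ea()
print(population[0], best[-1])
```

With elitist selection the best fitness can never decrease between generations, which makes the convergence behaviour easy to inspect in a toy setting.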

The evolution of the FF in each run is shown in Fig. 4. We can see that in all cases, the mean fitness value quickly increases and then converges. In most cases, convergence occurs relatively quickly, after ~25 generations. Supplementary Fig. 9 shows the breakdown of the absolute values of the three components of the FF; the binding energy, complex asymmetry and cage asymmetry. These show that although the binding energy converges quickly, with essentially no change after 20 generations, the asymmetry values fluctuate more, with an overall trend to lower mean values for the asymmetry (i.e. more symmetric structures), which is generally converged by about 50 generations. This suggests that finding high binding energy complexes is easier than finding symmetric cages and assemblies. As more symmetric cage systems stand a greater likelihood of being synthetically realised, it is important to use the longer runs to fine tune these features. These findings emphasise a common feature in computational materials discovery programs—that it is comparatively easy to find materials with good property performance, but harder for the materials to also be experimentally viable.

**Fig. 4: The evolution of the fitness values in the five EA calculations.**

To rank the cages from all five EA calculations, the results were combined and reweighted with respect to the same FF from Eq. (2). The combined results consisted of 53 unique cages (duplicates were discarded) and their complexes. Figure 5 shows how the top scoring cage evolves over the generations for run 1 and Supplementary Figs. 10–13 show the same for the other runs. Each run is different, and following only the top candidate provides limited information, but a common pattern can be seen. The cages typically start with a cavity that is too small or too large for C₆₀, and alternative sizes are then trialled. Once there is a top candidate with approximately the correct size for C₆₀, the cavity size of the top candidate essentially no longer changes; only the exact chemical composition of the cage components changes, as the EA seeks to maximise the FF. We note that the top couple of cages can swap, with a specific candidate no longer being ranked top before returning to top; this is due to the precise ranking depending on the composition of the entire population in our normalisation process.

**Fig. 5: The evolution of a high performing C₆₀ cage encapsulant.**

To examine how chemically diverse the building blocks of the cages were, and how this evolved over the course of the EA, we calculated the mean Dice similarity of Morgan fingerprints of radius 2 between all unique pairs within each generation. As shown in Supplementary Fig. 19, the mean Dice similarity across random building blocks at initialisation is approximately 0.34. In all runs, the mean Dice similarity increases to between 0.4 and 0.5 over the course of the run. This makes sense as, for example, some of the building blocks that are too large or too small to form the correct size pore are not selected, resulting in populations that occupy a smaller region of chemical space as the EA continues. However, as features such as external functionalisation of the cage are not under evolutionary pressure, there could be significant differences in those regions of the cage building blocks. The (small) range of different values across the five runs also indicates that different final populations are found, even if many of the top candidates are the same.
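The diversity measure described above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: each fingerprint is treated as a set of feature identifiers (ignoring feature counts), and the example sets are hypothetical. In practice the Morgan fingerprints would be generated with a cheminformatics toolkit.

```python
from itertools import combinations


def dice_similarity(features_a, features_b):
    """Dice coefficient of two feature sets: 2 * |A and B| / (|A| + |B|)."""
    if not features_a and not features_b:
        return 0.0
    shared = len(features_a & features_b)
    return 2.0 * shared / (len(features_a) + len(features_b))


def mean_pairwise_dice(fingerprints):
    """Mean Dice similarity over all unique pairs within one generation."""
    pairs = list(combinations(fingerprints, 2))
    return sum(dice_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature-identifier sets standing in for Morgan fingerprints.
generation = [{1, 2, 3, 4}, {3, 4, 5, 6}, {1, 2, 3, 7}]
print(mean_pairwise_dice(generation))  # 0.5
```

A value that rises over the generations indicates that the building blocks in the population are becoming more alike.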

We further carried out a structural analysis of the cages over each of the EA runs, calculating the average percentage of double bonds and rotatable bonds in the cages at each generation (see Supplementary Figs. 14–18). We found that the runs typically converge to cages having an average of 5–15% of their bonds being classed as rotatable, with the linker typically having a greater degree of rotatable bonds, and just below 10% double bonds in the molecule for the linker and almost 30% double bonds for the node. These features can be considered as design rules for molecules that encapsulate C₆₀. To aid analysis of convergence in the future, tools which identify the salient features of building blocks with regard to the pore, and compare those only, would provide a more accurate picture of convergence.

The 20 cages with the best fitness values are presented in Fig. 6. In Fig. 7, the nodes and linkers that the 20 cages were assembled from are listed. In addition, each EA run that identified a given cage is marked with a tick sign. The fact that many of these cages were identified multiple times shows the effectiveness of the constructed FF and that the screening of the databases is quick and broad.

**Fig. 6: The 20 best cages found for C₆₀ encapsulation.**

**Fig. 7: The aldehyde nodes and amine linkers used to assemble the final population of cages.**

The nodes in the 20 best cages share similar features. They are planar, have a small number of rotatable bonds, and have a high number of aromatic rings. They are also very similar in size. The node38 has a spherical diameter of 16.1 Å, and node15 and node13 have diameters of 16.8 and 17.0 Å, whereas node17 and node16 are slightly smaller and have diameters of 14.1 and 14.4 Å, respectively. The linker38 in C1 is the most distinct from the set of linkers, as the separation of the nitrogens between neighbouring imine bonds is 6.9 Å. All the other linkers have the amine functionality on neighbouring carbons, resulting in the spacing between nitrogens in imine bond pairs in a range of 3.0–3.4 Å. While the linker in C1 is comparatively larger, this does not result in a larger cavity diameter in comparison with the rest of the cages.

In Table 1, the re-scaled fitness values for the combined results of the five EA calculations and the unscaled parameters are presented. The cages have relatively high magnitude binding energies, between −160 and −270 kJ mol⁻¹. The asymmetry for both the empty cage and the cage complex are also in the lower range of values present in the database, so the final assemblies and their corresponding POCs are all very symmetrical. The POCs have cavity diameters between 9.7 and 10.6 Å, all close in size to the C₆₀ diameter (~10 Å). This is despite the fact that the cavity diameter of the POC and of the POC in the complex were not part of the FF. This shows how the binding energy was a good choice for a parameter that would also affect other features such as the cavity size.

Table 1 The fitness values and properties of the final cages.

In Fig. 8 and Supplementary Figs. 20–23, we show where the top results are located in terms of properties relative to the entire database. The identified solutions are highly localised, especially for the features that were part of the FF. This is somewhat equivalent to finding the global minimum on the chemical hyperspace, although here we do not aim at a global minimum, rather finding good solutions for POC C₆₀ encapsulants. The results are especially promising as some of the building blocks that repeatedly occur in the top candidates have been previously used to synthesise cages. For example, Ding et al. synthesised a [4 + 6] triazine cage with cyclohexyldiamine in 2015³⁷; the triazine building block used in this example is similar to node15 that occurs in POCs C3, C4, C5 and C14, differing only in the number of nitrogen substitutions in the central heteroatomic benzene ring. Further, node16 in C17 was previously used to synthesise a [4 + 6] POC called CC5 when combined with cyclopentyldiamine³⁸, and in our prediction, linker33 is a substituted cyclopentyldiamine. Most recently, truxene building blocks, structurally similar to node38 in C1 and nine other cages, have been used to synthesise [4 + 6] POCs with ethylenediamine³⁹ and cyclohexyldiamine⁴⁰. However, it has been reported that a [2 + 3] capsule is actually formed with cyclohexyldiamine when using a truxene containing the same trialdehyde substitution pattern as in C1 and the other examples⁴⁰, rather than the [4 + 6] cages targeted here. This does not necessarily mean that the [4 + 6] complexes would not form in the presence of C₆₀: a templating effect could occur, rather than relying on diffusion of the C₆₀ into a pre-formed cage cavity.

**Fig. 8: The position of the top candidates in the property space explored.**

Design principles for POC encapsulants of C₆₀

In addition to the set of specific target cages for POC encapsulation of C₆₀ and the development of a FF that can be applied to search much larger databases of building blocks, we can identify the ideal features of any POC for that task. Firstly, the optimal cavity size is in the range of 9.3–11.6 Å. If considering a [4 + 6] imine cage, the tri-topic aldehyde should be roughly 16 Å in diameter, and the di-topic amine should have the amine functionality on neighbouring carbon atoms. Many of the precursors used here could be simplified towards alternatives that were successfully used to synthesise cages in the past. However, the best molecule, C1, has the amine functionality at greater separation (not on neighbouring carbon atoms), thus larger aldehydes should be considered for combination with this diamine. To our knowledge, there are currently no studies in the literature of C₆₀ encapsulation in [4 + 6] imine cages. However, the experimental examples of cages, listed in the previous section, are structurally similar to the set of cages presented here.

Discussion

We have shown a computational approach using our developed evolutionary algorithm for the discovery of POCs as potential C₆₀ fullerene encapsulants. The whole process, from the choice of the database of precursors, FF construction and the assembly of a database for parameterisation to the analysis of the results, provided insights into each of these steps. The presented methodology can be used in place of experimental serendipitous discovery or, conversely, to facilitate and improve rational design of new functional materials by providing insightful structure-property relationships. The EA and the constructed FF were found to efficiently identify promising candidates for experimental consideration to find new C₆₀ encapsulants. More importantly, the same setup could be used for larger databases of building blocks and for other encapsulations, and the parameterisation approaches discussed could be applied to extend it to other properties and/or molecular material systems.

Design principles can be formulated from the results. The aldehyde building blocks should be fairly planar, with a circular diameter in the range of 14.1–17.0 Å. The amine linkers with amine functional groups on the neighbouring carbons result in the most promising cages. However, larger linkers such as 1,10-phenanthroline-2,9-diamine, which is in our top candidate (C1), should also be considered. In all cases, the building blocks have a small number of rotatable bonds, and a high number of aromatic rings. The combination of the building block and linker should result in a cavity size of ~10 Å in diameter.

If we consider a hypothetical database of 30,000 di-topic linkers and 10,000 tri-topic nodes, and if we extend the possible topologies ([2 + 3], [4 + 6], [8 + 12]), the resulting combination of all possibilities would reach 900 million imine cages. Extending our dynamic covalent reaction chemistries to include reactions beyond imine condensation, and to precursors with different numbers of reactive end groups, would quickly result in billions of possibilities. The use of an effectively parameterised EA, as we have presented here based on a parameterisation using just 3870 cages, a tiny fraction of the potential search space, to effectively explore this search space is therefore necessary, as it is not possible to conduct a brute force search of billions of possible POCs. Our approach can also be easily modified to target other properties of molecular materials, such as the likelihood of guest diffusion through a pore window, the size and shape of the host molecule, but also other properties, such as optoelectronic properties in organic electronics.
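The sizes quoted above follow from simple counting, as this short check shows. The variable names are illustrative.

```python
# Hypothetical database from the discussion above.
n_linkers, n_nodes, n_topologies = 30_000, 10_000, 3
hypothetical_space = n_linkers * n_nodes * n_topologies

# Database used for the parameterisation: 43 nodes x 90 linkers, one topology.
screened = 43 * 90

print(hypothetical_space)                       # 900000000
print(screened)                                 # 3870
print(f"{screened / hypothetical_space:.1e}")   # 4.3e-06
```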

Methods

Cage assembly

The POCs were assembled with our stk software by placing the nodes on the vertices and linkers on the edges of a template tetrahedral geometry¹³. Through the selection of high symmetry precursors and this symmetrical topology, we are targeting symmetrical assemblies, which can be anticipated to help increase the chance of synthetic realisation and simplify the number of structural possibilities. The assembly process of the related C₆₀ complex uses a new function in stk, where the C₆₀ is placed at the centre of the template tetrahedral geometry at the very beginning of the assembly process before any geometry optimisation of the POCs. The following procedure for finding the lowest energy POC conformer was performed on the empty cages and their complexes using the OPLS3 force field⁴¹ in Schrödinger LLC’s MacroModel (Release 2016-2). We have previously found that OPLS3 reproduces well the structure and energetics of porous imine cages^7,9. Firstly, a geometry optimisation was performed with a convergence criterion of a gradient change smaller than 0.05 with all bonds, apart from those created during the assembly step (imine bonds), restricted during the geometry optimisation. This is followed by a Molecular Dynamics (MD) run at 700 K and timestep of 1 fs to explore the conformational landscape for the molecule. A 10 ps equilibration is followed by a 200 ps production run that is sampled every 10 ps, with each sampled structure being fully geometry optimised. The configuration with the lowest energy at this stage is selected and evaluated with the FF.
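As a small worked example of the sampling schedule above, the sketch below converts the stated times into step and snapshot counts and picks the lowest energy structure from a list. It assumes one snapshot per 10 ps interval; the frame labels and energies are hypothetical placeholders, not results from this work.

```python
TIMESTEP_FS = 1
EQUILIBRATION_PS = 10
PRODUCTION_PS = 200
SAMPLE_EVERY_PS = 10
FS_PER_PS = 1000

equilibration_steps = EQUILIBRATION_PS * FS_PER_PS // TIMESTEP_FS
production_steps = PRODUCTION_PS * FS_PER_PS // TIMESTEP_FS
n_snapshots = PRODUCTION_PS // SAMPLE_EVERY_PS  # one per interval (assumed)

print(equilibration_steps, production_steps, n_snapshots)  # 10000 200000 20


def lowest_energy(conformers):
    """Return the (label, energy) pair with the lowest energy."""
    return min(conformers, key=lambda item: item[1])


# Hypothetical optimised snapshots: (label, energy in kJ/mol).
optimised = [("frame_03", -1520.4), ("frame_07", -1533.9), ("frame_12", -1528.1)]
print(lowest_energy(optimised))  # ('frame_07', -1533.9)
```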

FF parameterisation

The cavity diameter (D) and the window diameters used to calculate the asymmetry of a cage (A_cage) and its complex (A_complex) were obtained with pywindow (implemented as a part of the stk software). The asymmetry (A) is defined from the differences between the window diameters in a cage: first, the window diameters are calculated; then, the asymmetry is calculated as the sum of the differences between all window diameters. The more comparable the window diameters are, the lower the asymmetry of a cage. We have previously found low asymmetry scores to be a good indication of the high structural symmetry observed in shape-persistent and non-collapsed cages of a tetrahedral topology, built from high symmetry precursors¹⁴. Tri-topic nodes, which usually have at least C_3v symmetry, are likely to form highly symmetric assemblies when connected into a tetrahedral topology, unless the assembly is strained. Avoiding highly strained assemblies should ideally increase the likelihood that the hypothetical cages predicted can be realised in the laboratory. The binding energy is calculated with the formula:

$${E}_{\mathrm{binding}}={E}_{\mathrm{complex}}-{E}_{\mathrm{cage}}-{E}_{{\mathrm{C}}_{60}}$$

(3)

where the total energies of the complex (${E}_{\mathrm{complex}}$), the empty cage (${E}_{\mathrm{cage}}$) and an isolated C₆₀ molecule (${E}_{{\mathrm{C}}_{60}}$) are obtained through finding their lowest energy conformers.
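The two descriptors defined above can be sketched in a few lines of Python. This is a minimal illustration and not the pywindow or stk implementation: we assume that "the sum of the differences in all window diameters" means the sum of absolute differences over all unique pairs of windows, and the function names and numerical inputs are hypothetical.

```python
from itertools import combinations


def asymmetry(window_diameters):
    """Sum of absolute pairwise differences between window diameters (in Å).

    Assumption: the sum runs over every unique pair of windows.
    """
    return sum(abs(a - b) for a, b in combinations(window_diameters, 2))


def binding_energy(e_complex, e_cage, e_c60):
    """Eq. (3): E_binding = E_complex - E_cage - E_C60 (consistent units)."""
    return e_complex - e_cage - e_c60


# Made-up inputs, for illustration only.
symmetric_cage = asymmetry([5.0, 5.0, 5.0, 5.0])   # four identical windows
distorted_cage = asymmetry([5.0, 5.5, 5.0, 6.0])   # four unequal windows
example_binding = binding_energy(-1500.0, -1100.0, -200.0)
```

With this reading, a tetrahedral cage with four identical windows scores zero, and any distortion that makes the windows unequal increases the score.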

In Eq. (1), E_binding and A_complex have their values normalised to ensure that all the parameters are positive, as, for example, E_binding can be positive or negative. For each parameter, the lowest value in the population is found and this value is added to that parameter across the entire population, ensuring that all the values are greater than zero. Then, all the values are normalised by dividing them by the mean value of the given parameter within the population. Each parameter can then be multiplied by a constant or raised to some power. The final FF is the sum of all the parameters raised to the power of −1.
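The shift and mean-scaling steps can be sketched as follows. This is an illustration under stated assumptions rather than the stk implementation: we read "the lowest value is added" as adding the magnitude of the lowest value, and we add a small hypothetical `offset` so that the lowest member ends up strictly positive rather than at zero.

```python
def shift_positive(values, offset=1.0):
    """Shift a population's values for one parameter so all are > 0.

    Assumption: the magnitude of the lowest value, plus a hypothetical
    positive `offset`, is added to every member of the population.
    """
    shift = abs(min(values)) + offset
    return [v + shift for v in values]


def scale_by_mean(values):
    """Divide every value by the population mean for that parameter."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]


# Made-up binding energies (kJ/mol) for a three-member population.
raw = [-300.0, -100.0, 50.0]
shifted = shift_positive(raw)
normalised = scale_by_mean(shifted)
```

After these two steps every parameter is positive and has a population mean of one, so the different contributions to the FF are on a comparable scale before any weighting constants or exponents are applied.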

In the EA calculation, the FF is minimised, so for each parameter set-up the 10 candidates with the lowest fitness values were taken and rated. A total rating of the top 10 candidates (R₁₀) was calculated with a new equation based on the sum of unscaled properties for E_binding and A_complex for each member (i) of the set:

$${R}_{10}=\sum _{i=1}^{10}\frac{R{({E}_{\mathrm{binding}})}_{i}+R{({A}_{\mathrm{complex}})}_{i}}{2}$$

(4)

where $R{({E}_{\mathrm{binding}})}_{i}$ for the ith complex was calculated using the following formula:

$$R{({E}_{\mathrm{binding}})}_{i}=\left\{\begin{array}{ll}1,&\text{if}\ {E}_{\mathrm{binding}} > 0\ \mathrm{kJ}\ {\mathrm{mol}}^{-1}\\ 1,&\text{if}\ {E}_{\mathrm{binding}} < -404\ \mathrm{kJ}\ {\mathrm{mol}}^{-1}\\ 1-\frac{{E}_{\mathrm{binding}}}{-404\ \mathrm{kJ}\ {\mathrm{mol}}^{-1}},&\text{otherwise}\end{array}\right.$$

(5)

A positive binding energy, i.e., a lack of binding affinity, is penalised by adding 1 to the R₁₀ value. At the same time, binding energies lower than −404 kJ mol⁻¹ also result in a penalty of 1. The reason for this is explained in the analysis of the results of the assembled cages and relates to the fact that some forcefield binding energies are unreasonable. The strongest binding energy among the assembled complexes, with the exception of the unreasonable values, was calculated to be ~−404 kJ mol⁻¹. Therefore, the complexes with E_binding between −404 and 0 kJ mol⁻¹ are assigned a value from 0 to 1, depending on how strong the binding affinity is, resulting in a decreasing penalty for binding energies up to −404 kJ mol⁻¹. In our case here, because we had pre-run our small database, we knew that −404 kJ mol⁻¹ was the lower limit on acceptable binding energies. When moving this study to a larger search space, with unknown binding energies, we would suggest a lower limit of −780 kJ mol⁻¹, which is just below the theoretical limit for a C₆₀ complex binding energy reported by Grimme et al.³⁶.

The $R{({A}_{\mathrm{complex}})}_{i}$ is calculated with the following formula:

$$R{({A}_{\mathrm{complex}})}_{i}=\frac{{A}_{\mathrm{complex}}}{11.864 \AA}$$

(6)

where the lower the asymmetry of the cage in the complex, the lower the penalty. The asymmetry parameter is treated here as a proxy for low-strained structures that are more chemically feasible. The value of 11.864 Å is the highest (worst) asymmetry value in the whole database of 3870 cages. The results were then used to generate 2D heat-maps that allowed us to find the parameters a, b, c and d that most effectively yield the set of ten best candidates from the population.
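The rating defined by Eqs. (4)–(6) can be written as a short Python sketch. The function and constant names are ours; the two constants are the values quoted in the text for the database of 3870 cages.

```python
E_LIMIT = -404.0   # kJ/mol, strongest accepted binding energy in the database
A_MAX = 11.864     # Å, highest (worst) asymmetry in the database


def rate_binding(e_binding):
    """Eq. (5): penalty in [0, 1]; 1 for no binding or unreasonable values."""
    if e_binding > 0 or e_binding < E_LIMIT:
        return 1.0
    return 1.0 - e_binding / E_LIMIT


def rate_asymmetry(a_complex):
    """Eq. (6): asymmetry of the complex scaled by the worst database value."""
    return a_complex / A_MAX


def r10(candidates):
    """Eq. (4): sum of the averaged penalties over (E_binding, A_complex) pairs."""
    return sum((rate_binding(e) + rate_asymmetry(a)) / 2 for e, a in candidates)
```

A lower R₁₀ indicates a better set of candidates: ten complexes that each bind at −404 kJ mol⁻¹ with zero asymmetry would give R₁₀ = 0, whereas ten non-binding complexes with the worst asymmetry would give R₁₀ = 10.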

Evolutionary algorithm calculations

Overall, the implementation of the EA, including initialisation, mutation and crossover, is as described in our previous work¹⁴. The selection function used to choose members for the next generation was a roulette wheel, where the probability of selecting a member is proportional to its fitness value. The EA steps are as follows:

1. First, the initial population of 20 diverse cages is generated: random nodes and linkers are chosen, and the cages and the corresponding C₆₀ complexes are generated. In each of the 5 EA runs, a different random initial population was generated.
2. The crossover operation is then applied to a random pair of cages, exchanging building blocks between the pair to result in two offspring molecules. The crossover operation was performed 7 times in each generation.
3. The mutation operation is applied 10 times in each generation, with the fittest population member always undergoing a mutation. The remaining 9 mutation candidates were chosen using roulette wheel probability: a cage is chosen at random and its fitness is compared to a randomly generated value between 0 and 1. If the fitness of the candidate is greater than the randomly generated number, then the cage undergoes one of the mutation functions. This is repeated until 9 cages are mutated. Four mutations were applied with equal probability: the exchange of the linker for a similar one (the linker with the closest Dice similarity to that being exchanged), the exchange of the node for a similar one (the node with the closest Dice similarity to that being exchanged), the exchange of the linker for a random one, and the exchange of the node for a random one. This provides a balance between small and large steps across the chemical search space¹⁴.
4. This results in a total of 44 cages: 20 from the current generation, 14 from crossover and 10 from mutation. From these, 20 are chosen using the roulette wheel to create the next generation. The fittest candidate always proceeds to the next generation unchanged, equivalent to elitism.
5. The whole process is repeated for 100 generations.
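The steps above can be sketched as a toy generational loop. This is a simplified illustration and not the stk implementation: a cage is reduced to a hypothetical `(node, linker)` tuple, only the two random-exchange mutations are included (the Dice-similarity mutations are omitted), mutation candidates are drawn directly from a roulette wheel, higher fitness is treated as better, and the elite is taken from the full pool of 44. The `fitness` callable is an assumption and must return positive values.

```python
import random

POP_SIZE, N_CROSSOVERS, N_MUTATIONS, N_GENERATIONS = 20, 7, 10, 100


def roulette(pool, fitness, k, rng):
    """Pick k entries without replacement, weighted by (positive) fitness."""
    pool, chosen = list(pool), []
    for _ in range(k):
        weights = [fitness(c) for c in pool]
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(index))
    return chosen


def crossover(a, b):
    """Exchange building blocks between two cages, giving two offspring."""
    return (a[0], b[1]), (b[0], a[1])


def mutate(cage, nodes, linkers, rng):
    """Replace either the node or the linker with a random one."""
    node, linker = cage
    if rng.random() < 0.5:
        return (rng.choice(nodes), linker)
    return (node, rng.choice(linkers))


def evolve(nodes, linkers, fitness, n_generations=N_GENERATIONS, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population of 20 cages.
    population = [(rng.choice(nodes), rng.choice(linkers))
                  for _ in range(POP_SIZE)]
    for _ in range(n_generations):
        # Step 2: 7 crossovers, each giving 2 offspring (14 in total).
        offspring = []
        for _ in range(N_CROSSOVERS):
            a, b = rng.sample(population, 2)
            offspring.extend(crossover(a, b))
        # Step 3: 10 mutations; the fittest member is always mutated.
        fittest = max(population, key=fitness)
        parents = [fittest] + roulette(population, fitness,
                                       N_MUTATIONS - 1, rng)
        mutants = [mutate(c, nodes, linkers, rng) for c in parents]
        # Step 4: pool of 20 + 14 + 10 = 44; keep the elite, select the rest.
        pool = population + offspring + mutants
        elite = max(pool, key=fitness)
        pool.remove(elite)
        population = [elite] + roulette(pool, fitness, POP_SIZE - 1, rng)
    # Step 5: the loop above has run for n_generations.
    return population
```

A toy call such as `evolve(list(range(10)), list(range(10)), lambda c: 1.0 / (1 + abs(c[0] + c[1] - 12)))` returns a final population of 20 tuples. In the actual workflow each fitness evaluation involves assembling and optimising a cage and its C₆₀ complex, so the fitness of each unique cage would normally be cached rather than recomputed.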

The five EA calculations resulted in five final populations of 20 candidates each. These were then combined into a single population and the duplicates were removed. The resulting population consisted of 53 unique members. The fitness of the members of the final population was re-evaluated with the FF from Eq. (2) and the candidates ranked in ascending order.

Data availability

Datasets analysed are available at https://doi.org/10.14469/hpc/6054 and any further data is available on reasonable request from the corresponding author.

Code availability

The software used here is a development of stk, which is available on GitHub at github.com/JelfsMaterialsGroup/stk. For additional features of the software, contact the corresponding author.

References

1. Brédas, J.-L., Persson, K. & Seshadri, R. Computational design of functional materials. Chem. Mater. 29, 2399–2401 (2017).
2. Tabor, D. P. et al. Accelerating the discovery of materials for clean energy in the era of smart automation. Nat. Rev. Mater. 3, 5–20 (2018).
3. Jain, A., Shin, Y. & Persson, K. A. Computational predictions of energy materials using density functional theory. Nat. Rev. Mater. 1, 15004–13 (2016).
4. Hasell, T. & Cooper, A. I. Porous organic cages: soluble, modular and molecular pores. Nat. Rev. Mater. 1, 16053 (2016).
5. Mastalerz, M. Porous shape-persistent organic cage compounds of different size, geometry, and function. Acc. Chem. Res. 51, 2411–2422 (2018).
6. Greenaway, R. L. et al. High-throughput discovery of organic cages and catenanes using computational screening fused with robotic synthesis. Nat. Commun. 9, 2849 (2018).
7. Jelfs, K. E. et al. Large self-assembled chiral organic cages: synthesis, structure, and shape persistence. Angew. Chem. Int. Ed. 50, 10653–10656 (2011).
8. Turcani, L., Greenaway, R. L. & Jelfs, K. E. Machine learning for organic cage property prediction. Chem. Mater. 31, 714–727 (2019).
9. Santolini, V., Miklitz, M., Berardo, E. & Jelfs, K. E. Topological landscapes of porous organic cages. Nanoscale 9, 5280–5298 (2017).
10. Zhu, G. et al. Formation mechanisms and defect engineering of imine-based porous organic cages. Chem. Mater. 30, 262–272 (2018).
11. Santolini, V., Tribello, G. A. & Jelfs, K. E. Predicting solvent effects on the structure of porous organic molecules. Chem. Commun. 51, 15542–15545 (2015).
12. Day, G. M. & Cooper, A. I. Energy-structure-function maps: cartography for materials discovery. Adv. Mater. 30, 1704944 (2017).
13. Turcani, L., Berardo, E. & Jelfs, K. E. stk: A python toolkit for supramolecular assembly. J. Comp. Chem. 39, 1931–1942 (2018).
14. Berardo, E., Turcani, L., Miklitz, M. & Jelfs, K. E. An evolutionary algorithm for the discovery of porous organic cages. Chem. Sci. 9, 8513–8527 (2018).
15. Berardo, E. et al. Computationally-inspired discovery of an unsymmetrical porous organic cage. Nanoscale 10, 22381–22388 (2018).
16. Jensen, J. Graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space. Chem. Sci. 10, 3567–3572 (2019).
17. Jennings, P. C., Lysgaard, S., Hummelshøj, J. S., Vegge, T. & Bligaard, T. Genetic algorithms for computational materials discovery accelerated by machine learning. npj Comput. Mater. 5, 46 (2019).
18. Chung, Y. G. et al. In silico discovery of metal-organic frameworks for precombustion CO₂ capture using a genetic algorithm. Sci. Adv. 2, e1600909 (2016).
19. McCann, B. W. et al. Computer-aided molecular design of bis-phosphine oxide lanthanide extractants. Inorg. Chem. 55, 5787–5803 (2016).
20. Biju, V. Chemical modifications and bioconjugate reactions of nanomaterials for sensing, imaging, drug delivery and therapy. Chem. Soc. Rev. 43, 744–764 (2014).
21. Rodríguez-Fortea, A., Alegret, N. & Poblet, J. M. Endohedral fullerenes. Chem. Rev. 9, 907–924 (2013).
22. Liu, T. & Troisi, A. What makes fullerene acceptors special as electron acceptors in organic solar cells and how to replace them. Adv. Mater. 25, 1038–1041 (2013).
23. Ganin, A. Y. et al. Polymorphism control of superconductivity and magnetism in Cs₃C₆₀ close to the Mott transition. Nature 466, 221–225 (2010).
24. García-Simón, C. et al. Sponge-like molecular cage for purification of fullerenes. Nat. Commun. 5, 5557 (2014).
25. Shi, Y. et al. Selective extraction of C₇₀ by a tetragonal prismatic porphyrin cage. J. Am. Chem. Soc. 140, 13835–13842 (2018).
26. Rizzuto, F. J., Wood, D. M., Ronson, T. K. & Nitschke, J. R. Tuning the redox properties of fullerene clusters within a metal-organic capsule. J. Am. Chem. Soc. 139, 11008–11011 (2017).
27. Brenner, W., Ronson, T. K. & Nitschke, J. R. Separation and selective formation of fullerene adducts within an M${}_{8}^{\mathrm{II}}$L₆ cage. J. Am. Chem. Soc. 139, 75–78 (2017).
28. Mulholland, A. R., Woodward, C. P. & Langford, S. J. Fullerene-templated synthesis of a cyclic porphyrin trimer using olefin metathesis. Chem. Commun. 47, 1494–1496 (2011).
29. Sygula, A., Fronczek, F. R., Sygula, R., Rabideau, P. W. & Olmstead, M. M. A double concave hydrocarbon buckycatcher. J. Am. Chem. Soc. 129, 3842–3843 (2007).
30. Haino, T., Yanase, M., Fukunaga, C. & Fukazawa, Y. Fullerene encapsulation with calix[5]arenes. Tetrahedron 62, 2025–2035 (2006).
31. Wang, L.-X., Zhao, L., Wang, D.-X. & Wang, M.-X. Synthesis of 1,3,5-alternate azacalix[3]pyridine[3]pyrimidine and its complexation with fullerenes via multiple π/π and CH/π interactions. Chem. Commun. 47, 9690–9692 (2011).
32. Ikemoto, K., Kobayashi, R., Sato, S. & Isobe, H. Entropy-driven ball-in-bowl assembly of fullerene and geodesic phenylene bowl. Org. Lett. 19, 2362–2365 (2017).
33. García-Simón, C. et al. Size-selective encapsulation of C₆₀ and C₆₀-derivatives within an adaptable naphthalene-based tetragonal prismatic supramolecular nanocapsule. Chem. Commun. 55, 798–801 (2019).
34. Wang, Q. et al. A tetrameric cage with D_2h symmetry through alkyne metathesis. Angew. Chem. Int. Ed. 53, 10663–10667 (2014).
35. Zhang, C., Wang, Q., Long, H. & Zhang, W. A highly C₇₀ selective shape-persistent rectangular prism constructed through one-step alkyne metathesis. J. Am. Chem. Soc. 133, 20995–21001 (2011).
36. Grimme, S., Mück-Lichtenfeld, C. & Antony, J. Noncovalent interactions between graphene sheets and in multishell (hyper)fullerenes. J. Phys. Chem. C 111, 11199–11207 (2007).
37. Ding, H. Targeted synthesis of a large triazine-based [4.6] organic molecular cage: structure, porosity and gas separation. Chem. Commun. 51, 1976–1979 (2015).
38. Jones, J. T. A. Modular and predictable assembly of porous organic molecular crystals. Nature 474, 367–371 (2011).
39. Wang, Y. Elucidation of the origin of chiral amplification in discrete molecular polyhedra. Nat. Commun. 9, 488–496 (2018).
40. Zhang, P. et al. Chiral separation and characterization of triazatruxene-based face-rotating polyhedra: the role of non-covalent facial interactions. Chem. Commun. 54, 4685–4688 (2018).
41. Harder, E. et al. OPLS3: a force field providing broad coverage of drug-like small molecules and proteins. J. Chem. Theory Comput. 12, 281–296 (2016).

Acknowledgements

We acknowledge a Royal Society University Research Fellowship (K.E.J.), the EPSRC (EP/M017257/1, EP/R005710/1, EP/P005543/1 and EP/N004884/1) and ERC through grant agreement number 758370 (ERC-StG-PE5-CoMMaD) for funding and ARCHER time through the Materials Chemistry Consortium (EP/L000202). We thank Dr. Enrico Berardo for useful discussions.

Author information

Authors and Affiliations

Department of Chemistry, Molecular Sciences Research Hub, White City Campus, Imperial College London, Wood Lane, London, W12 0BZ, UK
Marcin Miklitz, Lukas Turcani & Kim E. Jelfs
Department of Chemistry and Materials Innovation Factory, University of Liverpool, 51 Oxford Street, Liverpool, L7 3NY, UK
Rebecca L. Greenaway

Contributions

M.M. carried out the simulations with assistance from L.T.; R.L.G. designed the precursor library and assisted with experimental insight. K.E.J. supervised the research, contributed to the design of the research and oversaw the writing of the manuscript.

Corresponding author

Correspondence to Kim E. Jelfs.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Miklitz, M., Turcani, L., Greenaway, R.L. et al. Computational discovery of molecular C₆₀ encapsulants with an evolutionary algorithm. Commun Chem 3, 10 (2020). https://doi.org/10.1038/s42004-020-0255-8

Received: 02 September 2019
Accepted: 20 December 2019
Published: 22 January 2020
DOI: https://doi.org/10.1038/s42004-020-0255-8
