Introduction

Thin films are of high importance both in modern technology, where they serve as building elements of micro- and nanosystems, and in macroscopic applications, where they add functionalities to bulk materials. Furthermore, they play a major role in materials discovery and design1,2. Next to composition and phase constitution, the microstructure of thin films is decisive for their properties. The microstructure depends on synthesis conditions and the material itself. It governs extrinsic properties and determines functionality, and its optimization leads to significant performance enhancement3,4,5,6,7. Successful synthesis of thin films, e.g., by magnetron sputtering, requires mastering many process parameters (e.g., type of power supply (direct current (DC), radio frequency (RF), high power impulse magnetron sputtering (HiPIMS))8,9, pressure, bias, gas composition, setup, and geometry), which determine plasma conditions and affect film growth10,11. However, the selection of process parameters, especially for the deposition of new materials, is still mostly based on the scientists’ expertise and intuition, and these parameters are usually optimized empirically. The film growth and the resulting microstructure at a fixed temperature are primarily determined by the relative flux of all particles in the gas phase, e.g., gas ions, metal ions, neutrals, thermalized atoms, arriving at the substrate12,13. Additional influencing factors are the substrate geometry14, the interaction strength between the film and the substrate15 and the film’s crystallographic properties16. Further, film microstructure is strongly dependent on the energy introduced into the growing surface by energetic ion bombardment17,18. The role of particle–surface interactions in altering film growth kinetics with respect to microstructure is not yet fully understood.

The need to predict microstructures from process parameters has inspired the development of structure zone diagrams (SZD, also referred to as structure zone models), first introduced by Movchan and Demchishin for evaporated films19. SZD are abstracted, graphical representations of the occurrence of possible polycrystalline thin-film microstructures (similar structural features) in dependence on processing parameters (e.g., homologous temperature Tdep/Tmelt). SZD are not purely phenomenological since their design includes fundamental knowledge about structure forming mechanisms. Kusano recently showed the validity of structure zone diagrams with respect to optimizing film deposition conditions of refractory metals and refractory metal oxides20. The simplicity of SZD, which enables estimation of process-dependent microstructures, is also their main drawback, as the actual process parameter space is much larger than what is covered in a classical SZD. Especially with compositionally complex materials, the quality of predictions from simple SZD is limited.
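As a small illustration of the homologous temperature axis used in classical SZD, the following sketch computes Tdep/Tmelt from temperatures given in °C. The function name and the melting point used for the demonstration (elemental Cr, roughly 1907 °C) are assumptions for illustration only and are not taken from this work; the relevant melting or decomposition temperature of a nitride film will differ.

```python
def homologous_temperature(t_dep_celsius, t_melt_celsius):
    """Return T_dep / T_melt with both temperatures converted to kelvin."""
    t_dep_k = t_dep_celsius + 273.15
    t_melt_k = t_melt_celsius + 273.15
    if t_dep_k <= 0 or t_melt_k <= 0:
        raise ValueError("temperatures must be above absolute zero")
    return t_dep_k / t_melt_k


# Illustrative value only (assumption): elemental Cr melts at roughly 1907 degC.
CR_MELT_C = 1907.0

for t_dep in (20, 200, 500, 800):
    ratio = homologous_temperature(t_dep, CR_MELT_C)
    print(f"Tdep = {t_dep} degC -> Tdep/Tmelt = {ratio:.3f}")
```

The conversion to kelvin matters: a ratio formed directly from °C values would not be a homologous temperature.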

Refined versions of the initial SZD were introduced for magnetron sputtered films, covering homologous temperature and sputter pressure21, homologous temperature and ion bombardment22, level of contamination23, reactive gas to metal flux ratio24, and extreme shadowing conditions25. Classical SZD for sputtering roughly categorize microstructures into four structure zones (I, T, II, and III)21. More subzones can be identified based on adatom mobility conditions, which influence crystalline texture26. Although SZD are useful and popular, they have only a very limited predictive capability since they are based on many generalizations and assumptions, e.g., the pressure is a proxy for the constitution of the incoming particle flux (kinetic energy, ratio of ion-to-growth flux, flux composition). Several revised SZD have in common that they are either strongly abstracted27 or materials specific28. Classical SZD relate processing to microstructure, but only for single elements or binary systems and using system-specific deposition process parameters like gas pressure or substrate bias, which are almost impossible to transfer between deposition systems. For identifying an ideal microstructure for desired properties, classical SZD are helpful as they give researchers a hint of likely microstructures, but empirical studies with extensive experimental effort are still required.

To improve the predictive quality of SZDs, multiple input parameters (e.g., incoming particle flux, ion energy, temperature, discharge properties like peak power density and duty cycle, chemical composition, etc.) should be considered conjointly, leading to several challenges, e.g., the visualization of a multidimensional parameter space. Anders proposed to include plasma parameters and thickness information (deposition, etching)27. His SZD keeps three axes, including two generalized axes (temperature, energy) and the third axis film thickness27. However, the generalized axes include unknown factors, i.e., the formula for the calculation of generalized temperature and energy axes. In order to overcome the limitations of SZD, computational methods could be applied. The goal is to achieve a reliable prediction of complex, realistic microstructures based on given properties like composition and relevant process parameters. Microstructures can be predicted by simulations, e.g., kinetic Monte Carlo29,30,31,32,33 or molecular dynamic simulation34,35, which depend on selection of model architectures, the selection of initial values and are computationally expensive. The interpretation of the overlap between simulation and experimental results remains to be performed by human assessment. A physical model for an accurate calculation of the microstructure from process parameters needs integrated cross-disciplinary models that cover the plasma discharge at the target, transport of plasma species to the substrate and atomistic processes on the surface and in the volume of the film. Although progress has been made in various areas (electron36,37 particle transport38, plasma-surface interaction39, DFT40), a unified model is still unattainable today.

If physical models do not exist, instead of applying atomistic calculations, machine learning can provide surrogate models bridging the gap between process parameters and resulting microstructure. Machine learning evolved as a new category for microstructure cluster analysis41,42, microstructure recognition43,44,45, defect analysis46, materials design47, and materials optimization48. Generative deep learning models are able to produce new data based on hidden information in training data49. The two most popular models are variational autoencoders (VAE)50 and generative adversarial neural networks (GAN)51. VAEs were applied to predict optical transmission spectra from scanned pictures of oxide materials and vice versa52, for molecular design53 and for microstructures in materials design53,54,55. Noraas et al. proposed to use generative deep learning models for material design to identify processing–structure–property relations and predict microstructures56.

Many thin films in science and technology have a multinary composition and processing variations lead to an “explosion” of combinations which all would need to be tested to find the best processing condition leading to the optimal microstructure. In order to reduce the cost of microstructure design, we apply machine learning of experimental thin-film SEM surface images and conditional parameters (chemical composition and process parameters). The results are visualized in the form of generative-structure zone diagrams that can be utilized in order to select process parameters and chemical composition to achieve a desired thin film microstructure.

Results and discussion

Our approach

Two generative models are investigated: a VAE and a conditional GAN (cGAN). The VAE model provides an overview and interpretation of similarities and variations in the dataset by dimensionality reduction and clustering. The generative abilities of the cGAN are applied to conditionally predict microstructures based on conditional parameters. Furthermore, the general ability of deep learning models to generate specialized SZDs based on a limited number of observations is demonstrated. This approach predicts realistic process–microstructure relations with a generative model being trained on experimental observations only. Our approach handles complexity by (I) performing a limited set of experiments, using “processing libraries” to efficiently generate comprehensive training datasets; (II) training deep learning models to handle SEM microstructure images, (III) visualization of the similarities between different synthesis paths, and (IV) predictions of microstructures for new parameters from relations found in the training data. We select a material system from the class of transition metal nitrides, which are applied as hard protective coatings, Cr–Al–O–N57, for training and evaluation of our models. Cr–Al–O–N and subsystems (e.g., Al–Cr–N, CrN) have been the subject of many studies58,59,60,61. Our Cr–Al–O–N dataset, efficiently created from materials and processing libraries, in total containing 123 samples, includes variations of six conditional parameters, covering different combinations of compositional (Al concentration (Al), O-concentration (O) in Cr1-x–Alx–Oy–N) and process parameters (deposition temperature (Td), average ion energy (EI), degree of ionization (Id) and deposition pressure (Pd)). Id is a design parameter which is related to the ratio of ion flux and the total growth flux of all deposited particles. In order to provide a sufficient quantity of data, 128 patches with size 128 × 128 px2 were extracted randomly from each SEM image (see Methods). 
All depositions were carried out in one sputter system (ATC 2200, AJA International); therefore, the geometrical factors that usually change between different deposition equipment are not present. As thin film microstructure is also thickness dependent, all analyzed samples are in a similar thickness range (800–1300 nm) and exhibit a fully developed microstructure.

To be able to study synthesis–processing–structure relationships, usually a large number of synthesis processes need to be carried out to create a sufficiently large dataset, which is time consuming. To substantially lower the number of necessary synthesis processes, we use combinatorial sputtering of thin-film materials libraries. We introduce the concept of “processing libraries” (PL): These are comparable to materials libraries, but, instead of a composition variation, PL comprise thin films synthesized using a set of different synthesis parameters, at either a constant materials composition, or additionally for different compositions (see Methods). The samples in a PL are subject to predetermined variations of the conditional parameters (EI, Id, Td, Pd, Al, O). The film growth develops to a microstructure, which is characterized by geometrically different surface features in terms of size, shape, and density. For a comprehensive study of possible microstructures, we exploit the process parameter space for synthesis conditions and repeat these processes for different chemical compositions. Film microstructures are usually assessed by surface and cross-sectional SEM images. Since high quality cross-sectional images are experimentally expensive and their interpretation is complicated, we focus on topographic surface images, as these are more comparable and describable. Surface morphology in terms of grain size and feature shapes can be used to correlate growth conditions and surface diffusion processes with resulting crystallographic orientation26.

Process–composition–microstructure relations

In order to inspect the dataset, we train a VAE with a regression model that uses the sampling layer (z) of the VAE as an input to predict the conditions (see Methods). The model optimizes simultaneously on microstructure images and conditional parameters and achieves a well-structured and dense representation (latent space embedding). The 64-dimensional latent space is further reduced in dimension by kernel principal component analysis (kPCA) with a radial basis function (RBF) kernel62 in order to provide a graphical visualization in 2D. If the microstructure, composition and process parameters correlate, the images should cluster in the VAE latent space.
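The dimensionality reduction step can be sketched as follows. This is a minimal NumPy implementation of kernel PCA with an RBF kernel applied to random stand-in vectors; it is not the code used in this work. The function name, the default kernel width (gamma = 1/number of features) and the random 64-dimensional input are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel_pca(x, n_components=2, gamma=None):
    """Project the rows of x onto the leading kernel principal components."""
    n_samples, n_features = x.shape
    if gamma is None:
        gamma = 1.0 / n_features  # assumed default, not taken from the paper

    # Pairwise squared Euclidean distances and the RBF kernel matrix.
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    kernel = np.exp(-gamma * np.maximum(sq_dists, 0.0))

    # Center the kernel matrix in feature space.
    ones = np.full((n_samples, n_samples), 1.0 / n_samples)
    centered = kernel - ones @ kernel - kernel @ ones + ones @ kernel @ ones

    # Eigen-decomposition; keep the components with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Coordinates of the input points along each kernel principal component.
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


# Stand-in for VAE latent vectors: 200 random 64-dimensional points.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 64))
embedding = rbf_kernel_pca(latent, n_components=2)
print(embedding.shape)
```

The two output columns play the role of the kPCA 1 and kPCA 2 axes; as with the real embedding, they carry no direct physical meaning.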

Figure 1 shows the first two components of the kPCA latent space representation of the validation set. The axes (kPCA 1, kPCA 2) have no actual physical meaning: they are rather a rough expression of how the VAE recognizes images and the conditional parameter space and joins them in a dense layer. Each microstructure image is plotted at its position in the dimensionally-reduced latent space embedding of the VAE. The images cluster in regions of similar sizes and shapes. A coarse-facetted surface morphology is observed at kPCA 1 = −0.1 and kPCA 2 = −0.3. With increasing kPCA 1 and kPCA 2 the feature size decreases. With values of kPCA 1 < 0, mainly facetted grains are observed, while for kPCA 1 > 0 the features become more fine-grained and nanocrystalline.

Fig. 1: Latent space representation of all microstructures from the validation set.

Patches created from experimental SEM surface images are plotted at their position in the dimensionally-reduced latent space. A continuous variation of microstructures is observed with respect to similarity of feature size and shape (e.g., featureless, fine-grained, oriented-facetted, smooth-facetted, coarse-facetted).

This qualitative overview of the microstructures in the dataset is now correlated to chemical composition and process characteristics: Fig. 2 shows the microstructure images plotted at their latent space position and the position of each sample in the latent space with their respective color-coded composition or process parameters. This visualizes the interplay between conditional parameters and their significance on microstructural features.

Fig. 2: Visualization of process–composition–microstructure relations.

Correlation of the microstructures in the dataset with chemical composition and conditional parameters. VAE latent representation of microstructures (a) and conditional parameters b Td, c Al, d EI, e O, and f Id from the validation set.

We now address the effect of each deposition parameter in order to provide a discussion baseline for the trends that are created by the prediction of the cGAN model. Samples with different levels of O-contamination are separated in latent space and show a clear trend in feature size (Fig. 2e). O leads to nucleation sites for O-phases in the fcc Cr–Al–N phase. The growth of the fcc Cr–Al–N phase can be inhibited by these O-phases23. Figure 2c shows a similar trend for Al63. A solid solution for Cr1−x–Alx–N with up to 70 at.% Al is known60, whereas between 50 and 70 at.% Al, hcp AlN precipitates64. The maximum solubility of Al in fcc CrN depends on process parameters65. The formation of a second phase, hcp AlN, at higher Al could be the reason for the decreasing grain size with increasing Al. An increase in Td (Fig. 2b) leads to an increase in feature size. The feature shapes change from fine granular to sharp facetted grains and, at high Td, to coarse-facetted grains with relatively flat surfaces due to higher diffusion rates28. An increase in EI (Fig. 2d) above a certain threshold leads to a smoother surface as kinetic bombardment flattens facets, and in extreme cases a featureless surface is observed. In addition, surface mobility is kinetically enhanced by ion bombardment66. An increase in EI and Id can cause a higher nucleation density and a decreased grain size67. These effects are most significant at low Td where diffusion is limited. At Td > 400 °C the effect is reduced due to higher diffusion. Oriented facets are observed up to Td = 400 °C, which is a result of the cathodes being inclined by 27° and of low diffusion26. At higher Td the facets are randomly oriented, which is an effect of higher adatom diffusivity. An increasing Pd (not shown) leads to an increase in gas atoms or molecules per volume and thereby to a decrease in mean free path68. Particles experience more collisions during their path from the target surface to the substrate and thereby lose energy.
In addition, Id and the ratio of gas ions to target ions increase, which influences surface kinetics. This illustrates the complex interplay between process parameters, composition and resulting microstructure. Also, it shows the usefulness of dimensionality reduction to gain an overview of complex datasets. The identified trends correlate well with results from literature.
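The pressure dependence of the mean free path mentioned above can be illustrated with the standard hard-sphere estimate from kinetic gas theory, λ = kB·T / (√2·π·d²·p). The sketch below is not part of this work; the gas temperature (300 K) and the effective collision diameter (3.4 × 10⁻¹⁰ m, an approximate value for Ar) are assumptions chosen only to show the trend.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def mean_free_path(pressure_pa, temperature_k=300.0, diameter_m=3.4e-10):
    """Hard-sphere mean free path in metres.

    temperature_k and diameter_m are illustrative assumptions, not values
    reported in the paper.
    """
    if pressure_pa <= 0:
        raise ValueError("pressure must be positive")
    cross_section = math.pi * diameter_m ** 2
    return K_B * temperature_k / (math.sqrt(2.0) * cross_section * pressure_pa)


for p in (0.25, 0.5, 1.0, 2.0):
    print(f"Pd = {p:4.2f} Pa -> mean free path ~ {mean_free_path(p) * 100:.2f} cm")
```

Under these assumptions the mean free path at 0.5 Pa is on the order of a centimetre and halves whenever the pressure doubles, which is why sputtered particles undergo more collisions, and lose more energy, at higher Pd.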

Prediction of microstructures from conditional parameters

The decoder part of the VAE could be applied to generate images from the latent representation, but the quality is unsatisfactory due to known limitations of VAEs69 (see the example code for VAE predictions). In contrast, GAN models are known to be able to produce photorealistic images70. To predict microstructures from the six conditional parameters, we train a cGAN model71. In order to categorize the level of prediction, we need to define what the model can learn from the experimental dataset. A reconstruction of a microstructure from the training set provides the baseline. Figure 3 compares experimental images to their predicted counterparts by their particle size distribution. The cGAN generates these microstructure images using two inputs only: conditional parameters and a latent sub-space with random noise. It should be noted that the cGAN is not trained to generate an exact copy of the original image. The histograms in Fig. 3 contain particle counts from 100 image patches of both experimental and predicted images. The generated images generally show a good reproduction of the experimental images in terms of feature size and shape. Even contrast variations on facets are reproduced. Figure 3f shows an exception, where locally, smaller features are generated on top of otherwise large smooth grain surfaces. This relates to the problem that the image patches only show small fractions of these large grains and that the microstructure of samples deposited at 800 °C strongly differs from all other images. In Fig. 3a, b, e, the generated images are nearly indistinguishable from their experimental counterparts. The facet shapes in Fig. 3a are not as sharp as in the experimental images and the facets in Fig. 3d show more curvature compared with the original images. However, the reproduced features can still be identified as facetted and the feature sizes match well. The generated images in Fig. 3d appear blurred and show less contrast than the experimental images.
A low contrast in the experimental images of these smooth dense microstructures might affect the training of the model.
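One simple way to put a number on the agreement between two particle size distributions is the histogram intersection, sketched below. This metric is an assumption introduced here for illustration; the comparison in Fig. 3 is presented graphically and no particular similarity score is stated. The size values and bin edges are synthetic.

```python
def normalized_histogram(values, bin_edges):
    """Count values into bins and normalize the counts to sum to 1."""
    counts = [0] * (len(bin_edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            in_bin = bin_edges[i] <= v < bin_edges[i + 1]
            on_upper_edge = i == last and v == bin_edges[-1]
            if in_bin or on_upper_edge:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def histogram_intersection(sizes_a, sizes_b, bin_edges):
    """Return the overlap in [0, 1] of two normalized size histograms."""
    h_a = normalized_histogram(sizes_a, bin_edges)
    h_b = normalized_histogram(sizes_b, bin_edges)
    return sum(min(a, b) for a, b in zip(h_a, h_b))


# Synthetic particle sizes (arbitrary units), not measured data.
experimental = [12, 15, 18, 22, 25, 31, 40]
generated = [11, 16, 19, 21, 27, 33, 55]
edges = [0, 10, 20, 30, 40, 50, 60]

overlap = histogram_intersection(experimental, generated, edges)
print(f"histogram intersection: {overlap:.3f}")
```

A value of 1 means identical normalized histograms and 0 means no shared bins; the score ignores feature shape, so it complements rather than replaces visual inspection.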

Fig. 3: Evaluation of predicted microstructures.

Particle size distributions and examples of SEM surface microstructure images (blue frame) and cGAN-generated images (green frame) of the training set for the same conditional parameters; results are shown for a a Cr–N film deposited without intentional heating; b, e Cr–Al–O–N films with 10 at.% O-contamination, deposited at 200 and 600 °C, respectively; c a Cr–Al–N thin film deposited at 500 °C; d, f Cr–O–N films with 3 at.% O-contamination that were deposited at 500 and 800 °C, respectively. The conditional parameters of the shown images are summarized in the table.

Machine learning models can only learn from information provided to them. Interpolations are reasonable while extrapolations are more challenging. For example, a microstructure prediction for a sample with high Al at Td = 1000 °C would fail, as a phase decomposition is expected which leads to a (for the model) unpredictable microstructure. The training set libraries were synthesized at selected basic process conditions (e.g., constant Td, EI, Al, or O) and contain a variation of one or two additional parameters. Therefore, the complete dataset has only limited intersections and extensions into other conditions. For example, a variation of EI between 40 and 200 eV was only carried out for samples deposited at 500 °C. In order to predict a sample deposited at Td = 100 °C and EI = 200 eV, a transfer of the EI trend on the Td trend is required. To validate the predictive capabilities of the model, new microstructure images are generated for extensions of a chosen base condition. A CrN sample (Td = 20 °C, EI = 1 eV, Id = 0.1, Pd = 0.5, Al = 0 at.%, O = 0 at.%) provides the base condition (Fig. 4, orange frame). The sample exhibits a triangular facetted morphology. Figure 4 visualizes how the microstructure of the initial sample changes when only a single condition is changed at a time and the other parameters stay constant. Experimental structures synthesized with the same or similar process parameters (closest experimental condition) are compared. The trends from Fig. 2 for the different conditional parameters are reproduced by the model. In the example, Al and O lead to a refinement of the microstructure, EI leads to smoother facets, and Td increases the grain size.
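The one-parameter-at-a-time variation around the base condition can be written down as a small helper that builds the condition sets a trained generator would receive. The base values are those quoted above (with Pd taken to be in Pa); the function name and the sweep values are hypothetical, and the generator itself is not included.

```python
# Base condition of the CrN reference sample as quoted in the text.
BASE_CONDITION = {"Td": 20.0, "EI": 1.0, "Id": 0.1, "Pd": 0.5, "Al": 0.0, "O": 0.0}


def vary_one_parameter(base, name, values):
    """Return copies of `base` in which only `name` takes the given values."""
    if name not in base:
        raise KeyError(f"unknown conditional parameter: {name}")
    return [{**base, name: float(v)} for v in values]


# Hypothetical sweep values, chosen only to illustrate the idea.
sweeps = {
    "Td": [200, 400, 600],
    "EI": [40, 100, 200],
    "Al": [20, 40, 60],
}

conditions = []
for parameter, values in sweeps.items():
    conditions.extend(vary_one_parameter(BASE_CONDITION, parameter, values))

for condition in conditions:
    print(condition)
```

Each resulting dictionary differs from the base condition in exactly one entry, which mirrors the layout of Fig. 4.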

Fig. 4: Predicted microstructures from cGAN and comparison with experimental results.

Predictions (blue frame) are based on the experimental microstructure of Cr–N deposited at room temperature (orange frame) with corresponding conditional parameters. The changed parameter for each predicted or experimental image (green frames) is indicated while all other parameters stay constant. If an experimental counterpart with identical conditional parameters is missing, the closest related experimental image is selected and the additional change in conditional parameters is marked above the image (e.g., +480 °C).

To validate the cGAN prediction quality on unseen conditions, microstructures from experimental test samples which were not included in the training set are predicted. Cr1−xAlxN samples grown at 500 °C from the training set were deposited at <10 eV (0 V substrate bias) and >100 eV (−100 V substrate bias) for different Al. As an example, the cGAN predicts images for the variation of Al at 40 eV (Fig. 5). This requires an interpolation of EI. At 0 V bias, the faceted microstructure changes to a fine-grained microstructure with increasing Al. The same trend is observed at −100 V bias but the facets of Cr-rich samples are smoother and denser. With increasing Al, the microstructure becomes featureless. The prediction matches both trends. In direct comparison to the experimental counterparts, the facets of Cr-rich samples are less pronounced. Al-rich samples are almost indistinguishable from the test set images. These results show that the cGAN produces good results for interpolations within the dataset.

Fig. 5: Synopsis of experimental and predicted images.

The data shows the effect of bias voltage (EI) and Al concentration on the surface microstructure. The blue box contains images from the experimental test set. The red dashed box shows cGAN predictions for the exact conditions of the experimental test set. The green boxes show images from the experimental training set.

Finally, a SZD is generated by the cGAN. The advantage of this generative SZD (gSZD) is that it can be produced as required. In a 2D representation, two parameters can be varied while the remaining four parameters are selected constant. Figure 6a shows a gSZD for a variation of Al and Td at constant values for the remaining parameters (constant O = 1 at.%, EI = 40 eV, Id = 1.0, Pd = 0.5 Pa). Al and Td are varied randomly between 0–60 at.% and 20–600 °C, respectively. The predicted image patches are plotted at positions according to their input conditions. Hence, patches overlay and appear as a continuous diagram. A clear variation of the microstructure in dependence of Td and Al is observed. Figure 6b shows the CNN-predicted structure class in dependence of Td and Al (average over 100× variations of the random latent variable per (Td, Al)-step). The structure changes with increasing Td from oriented-facetted to facetted. An increase in Al leads to a change from facetted to a fine-grained structure. The remaining parameters (O, Id, EI) vary in the experimental data, while they are kept constant in the gSZD. A variation of Al and Td was experimentally realized at 10 at.% O while samples with a variation of EI were deposited at 500 °C and contain 0 at.% O. Thus, the model combines (I) the structural refinement with increasing Al and (II) the trend that this refinement is inhibited with increasing Td. In other words, a higher Td is necessary at high Al to obtain a similar feature size and shape as compared with Cr-rich compositions without O and Al. For the gSZD these can be interpreted in the following way. In general, an increase in Al leads to a refinement of the microstructure due to changes in adatom surface mobility conditions72 and second phase formation which inhibits crystal growth23. This trend is most significant at low temperatures, where diffusion is limited. With increasing temperature, the facetted structure extends to higher Al. 
An increase in feature size with Td is observed. The observed trends change to a finer structure when O is increased stepwise (not shown) and facets are smoothed out by an increase of EI to 200 eV (Fig. 6c). In addition, a featureless microstructure is observed at high Al and low Td. These results are consistent with conclusions from literature, which means that the cGAN model is able to correctly capture trends from an inhomogeneously distributed dataset and perform qualitative predictions by combining the learned information.
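The sampling scheme behind the gSZD can be sketched as follows: Al and Td are drawn at random within the stated ranges while the other four parameters are held fixed, and a class probability is averaged over repeated draws of the latent noise. The function `toy_fine_grained_probability` is a hypothetical stand-in for the chain of cGAN generator and CNN classifier; it is not the trained model, and the scalar latent variable is a simplification of the latent sub-space.

```python
import random
import statistics

# Fixed parameters of the gSZD in Fig. 6a as given in the text.
FIXED = {"O": 1.0, "EI": 40.0, "Id": 1.0, "Pd": 0.5}


def sample_gszd_conditions(n, rng):
    """Draw n conditions with random Al (0-60 at.%) and Td (20-600 degC)."""
    return [
        {**FIXED, "Al": rng.uniform(0.0, 60.0), "Td": rng.uniform(20.0, 600.0)}
        for _ in range(n)
    ]


def toy_fine_grained_probability(condition, z):
    """Hypothetical placeholder for generator + classifier (NOT the real model)."""
    score = condition["Al"] / 60.0 - condition["Td"] / 600.0 + 0.05 * z
    return min(1.0, max(0.0, 0.5 + 0.5 * score))


def mean_and_std_over_latents(predict, condition, n_latent, rng):
    """Average a class probability over n_latent random latent draws."""
    probs = [predict(condition, rng.gauss(0.0, 1.0)) for _ in range(n_latent)]
    return statistics.mean(probs), statistics.pstdev(probs)


rng = random.Random(0)
conditions = sample_gszd_conditions(5, rng)
for condition in conditions:
    mean, std = mean_and_std_over_latents(
        toy_fine_grained_probability, condition, 100, rng
    )
    print(f"Al={condition['Al']:5.1f} Td={condition['Td']:5.1f} "
          f"p(fine-grained)={mean:.2f} +/- {std:.2f}")
```

The mean and standard deviation per (Td, Al) point correspond to the quantities plotted in Fig. 6b; only the placeholder predictor would need to be replaced by the trained networks.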

Fig. 6: Generated SZD and predicted microstructure classes.

a gSZD generated by the cGAN model for a variation of Al and Td. The remaining conditional parameters were chosen to be Id = 1.0, EI = 40 eV, O = 1 at.%, Pd = 0.5 Pa. Panel b shows the average probability and standard deviation of the predicted microstructure classes of the gSZD by the CNN. The dashed blue lines indicate the technological boundary of the use case. Panel c shows the predicted microstructure classes for an increased EI of 200 eV.

Definition of conditions for thin films with optimized microstructures

Finally, by combination of domain knowledge and the new gSZD, we are able to design a composition-process-window to create films for a desired application. The cGAN model is applied to predict microstructures for a variation of two conditional parameters (e.g., Al and Td). The microstructure classifier model is used to determine the microstructure class for the predicted images. For an example application of hard protective coatings for polymer injection molding or extrusion tools73, the tribological performance needs to be optimized, requiring films with a dense, smooth or fine-grained microstructure. Physical boundaries are provided by the maximum values of Al and Td (blue boundaries in Fig. 6b, c). Td is limited by the temper diagram of cold work steel AISI 420 (X42Cr13, 1.2083). To avoid tempering of the substrate, the maximum Td should be lower than 450 °C. Al is limited by the formation of hcp AlN above 50 at.% Al, which would lead to a reduction in hardness74. To achieve a fine-grained film, Al should be as high as possible, according to the gSZD (Fig. 6a). In addition, Td should be as high as possible in order to reduce grain boundary porosity. With an included uncertainty (standard deviation in Fig. 6b), the new composition-process-window (“window of opportunity”) can be selected from Fig. 6b according to the desired microstructure (e.g., fine-grained). In Fig. 6c the microstructure probability is shown for an increased EI of 200 eV. Under this high EI condition, the process window for a fine-grained microstructure is increased and a deposition at higher Td and supposedly higher grain boundary density can be conducted while a secure distance to the precipitation boundary of hcp-AlN is retained.
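The selection of a composition-process window from a predicted class map can be expressed as a simple filter. The limits on Td (below 450 °C) and Al (at most 50 at.%) follow the boundaries described above; the probability threshold `p_min`, the function name and the candidate points are assumptions made for this sketch.

```python
def in_process_window(al_at_pct, td_celsius, p_fine_grained,
                      td_max=450.0, al_max=50.0, p_min=0.5):
    """Return True if a candidate satisfies both boundaries and the class target.

    td_max and al_max follow the text; p_min is an assumed threshold.
    """
    return (
        td_celsius < td_max
        and al_at_pct <= al_max
        and p_fine_grained >= p_min
    )


# Hypothetical candidates: (Al in at.%, Td in degC, predicted p(fine-grained)).
candidates = [
    (45.0, 400.0, 0.90),  # inside both boundaries, high probability
    (55.0, 400.0, 0.95),  # rejected: above the hcp AlN boundary
    (45.0, 500.0, 0.80),  # rejected: substrate would be tempered
    (10.0, 300.0, 0.20),  # rejected: unlikely to be fine-grained
]

window = [c for c in candidates if in_process_window(*c)]
print(window)
```

In practice the standard deviation from Fig. 6b could be subtracted from the mean probability before thresholding to obtain a more conservative window.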

In summary, we applied combinatorial synthesis methods to create materials and process libraries of the Cr–Al–O–N system in order to observe the influence of composition and process parameters on the resulting microstructural properties. Our training set of samples from the Cr–Al–O–N system covers variations in the directions of previous SZD (Td, Pd, Id, EI, O) and an additional compositional variation of Al. A generative neural network (cGAN) was trained on SEM surface images to predict microstructures based on the input of composition and process parameters. The model reproduces the observed trends in the dataset. Furthermore, we were able to validate the predictive capabilities on test data, which requires an interpolation of conditional parameters. A microstructure classifier model and particle size distribution analysis are used to validate the predictions of the cGAN. A transfer of trends from sampled regions to un-sampled regions was demonstrated in a new generative SZD. The gSZD shows the expected microstructure of thin films for a variation of Al concentration and deposition temperature, which will be useful for the optimization of TM–Al–N (TM = transition metal) thin films. The observed microstructure predictions in the gSZD are consistent with observations from literature. A so far unseen level of predictive quality in the scope of SZD is observed which will lead to an acceleration in the development and optimization of thin films with a desired microstructure. Further this approach could be extended to other materials in thin film and bulk form.

Methods

Sample synthesis

Sample synthesis is performed in a multi-cathode magnetron sputter chamber (ATC 2200, AJA International). All samples are deposited reactively with an Ar/N2 flux ratio of 1 and a total gas flux of 80 sccm. The deposition pressure is controlled automatically by adjusting the pumping speed. Two confocally aligned cathodes (Al, Cr) facing the substrate lead to a continuous composition gradient of the two base materials, which results in a materials library. The substrate is heated with a resistive heater. In order to create a PL, an in-house made step heater is used to heat five substrates simultaneously at five different temperatures in the range from 200 to 800 °C, thereby covering a large temperature range of typical SZDs within a single PL28. PLs with a continuous variation of plasma parameters, e.g., EI and Id, are synthesized by sputtering from two confocally aligned magnetrons which are operated by different power supplies. One cathode is powered by DC, the other one by HiPIMS (high power impulse magnetron sputtering). The substrate is placed centered below the two cathodes. A similar concept was chosen by Greczynski et al.75. The pulsed HiPIMS discharge produces a number of ionized species that is one order of magnitude larger, and higher ion energies, compared with a DC discharge76. An additional substrate bias is applied in some cases to further accelerate ions and increase EI. By placing the substrate in the center below the two inclined cathodes, the travel distance of the ionized species of the HiPIMS discharge increases towards the substrate positions next to the DC cathode. The ions thermalize due to collisions with other plasma species and lose energy. This effect is amplified by the angular distribution of the sputtered species77. Consequently, the ratio of ions per deposited atom as well as the average ion energy vary along the 100-mm diameter substrate.
In order to achieve a homogeneous film thickness, the DC power is reduced to match the typically lower deposition rate of the HiPIMS powered cathode. A variation in the degree of ionization is achieved by a variation of the sputter frequency at constant average power in HiPIMS processes. An increase in frequency leads to a decrease in target peak power density, which leads to a decrease in Id and a small decrease (up to 3 eV) in EI. The O-concentration in several of the discussed samples stems from contamination by residual gas outgassing from the deposition equipment, which is especially pronounced at elevated temperatures (>600 °C).
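The link between sputter frequency and peak power at constant average power can be made explicit with an idealized rectangular-pulse estimate, P_peak = P_avg / (f · t_on). The sketch below assumes a fixed pulse on-time; the on-time of 100 µs and the average power of 500 W are hypothetical values that are not given in this work. Dividing the result by the target area would give a peak power density.

```python
def hipims_peak_power(average_power_w, frequency_hz, pulse_on_time_s):
    """Idealized peak power for rectangular pulses at constant average power."""
    duty_cycle = frequency_hz * pulse_on_time_s
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty cycle must be in (0, 1]")
    return average_power_w / duty_cycle


AVERAGE_POWER_W = 500.0   # hypothetical value
PULSE_ON_TIME_S = 100e-6  # hypothetical value

for frequency in (100.0, 200.0, 400.0):
    peak = hipims_peak_power(AVERAGE_POWER_W, frequency, PULSE_ON_TIME_S)
    print(f"{frequency:5.0f} Hz -> duty cycle {frequency * PULSE_ON_TIME_S:.3f}, "
          f"peak power ~ {peak / 1000:.1f} kW")
```

Doubling the frequency at fixed on-time halves the peak power in this idealization, consistent with the decrease in Id at higher sputter frequency described above.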

Thin film characterization

The chemical composition (Al/Cr) is determined by EDX (Inca X-act, Oxford Instruments). The O-concentration is determined by XPS (Kratos Axis Nova) for a subset of the samples. All films are stoichiometric according to the definition (Al + Cr)/(O + N) = 1. The stoichiometry is validated by RBS measurements, within a 5 at.% error, for additional samples that are deposited under similar process conditions (not shown). SEM images are taken with a Jeol 7200F using the secondary electron detector at ×50,000 magnification and an image size of 1280 × 960 pixels. The SEM images are histogram-equalized using contrast limited adaptive histogram equalization (CLAHE)78.

Plasma properties

EI was calculated from retarding field energy analyzer measurements of a previous study79 that were carried out at five measurement positions along the 100 mm substrate area in three reactive co-deposition processes of Al and Cr at 100, 200, and 400 Hz sputter frequency at 0.5 Pa. If a substrate bias was applied, an additional ion energy was added to the total ion energy (e.g., EI + 40 eV bias). To estimate Id, the ratio of total ion flux and growth flux was calculated. Unknown values for conditions that were not measured are estimated by extrapolation. The ion-to-growth flux ratios are normalized over the dataset. These values provide only a rough estimation that covers the known trends from literature and our own investigations. It should be noted that we consider Id a physics-informed descriptor, rather than a physical property.
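The construction of the two plasma descriptors can be sketched as follows. This is a minimal illustration, not the procedure used for the dataset: the text does not specify the normalization scheme, so min-max scaling is assumed here, and all flux values are hypothetical.

```python
def total_ion_energy(e_i_measured_ev, bias_ev=0.0):
    """Add the substrate bias contribution (in eV) to the measured ion energy."""
    return e_i_measured_ev + bias_ev


def ion_to_growth_ratio(ion_flux, growth_flux):
    """Ratio of total ion flux to growth flux (both in the same units)."""
    if growth_flux <= 0:
        raise ValueError("growth flux must be positive")
    return ion_flux / growth_flux


def normalize(values):
    """Min-max scale values to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical (ion flux, growth flux) pairs for three substrate positions.
fluxes = [(2.0, 4.0), (3.0, 4.0), (6.0, 4.0)]
ratios = [ion_to_growth_ratio(i, g) for i, g in fluxes]
i_d = normalize(ratios)

# Example from the text: a 40 eV bias added to a hypothetical 25 eV measurement.
e_i = total_ion_energy(25.0, bias_ev=40.0)
```

Because the ratio is rescaled over the dataset, the resulting Id values are relative descriptors rather than absolute physical quantities.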

Data handling

Our dataset contains 123 individual samples. The 1280 × 960 px² images contain characteristic microstructure features that repeat across the image. Patches are extracted at random points of each image. Each extracted patch covers a large enough area to represent the characteristic microstructure of the synthesis condition. We choose a patch size of 128 × 128 px² and downscale the patches by a factor of 2 to 64 × 64 px² to speed up computations. The image patches have a pixel density of 0.27 px/nm. A total of 128 patches are cropped from each image, which results in an average pixel shift of 10 and 7.5 px per patch (1280/128, 960/128). The training data therefore contains more than 10,000 different image patches, depending on the train-test split. For the VAE, the complete dataset is split randomly at a ratio of 70:30 (train:validation). In the case of the cGAN, a test set (13 out of 123 original SEM images) for the conditions (described in Fig. 5) is removed from the dataset.
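The patch extraction and the resulting dataset size can be sketched in a few lines. This is an illustration under stated assumptions: the image is a synthetic stand-in, downscaling is done by averaging 2 × 2 pixel blocks (the interpolation method is not stated in the text), and the 70:30 split is assumed to act on individual patches.

```python
import random

IMG_W, IMG_H = 1280, 960             # SEM image size in px
PATCH = 128                          # edge length of a cropped patch in px
SCALE = 2                            # downscaling factor, giving 64 x 64 px
N_IMAGES, PATCHES_PER_IMAGE = 123, 128


def random_corners(n, rng):
    """Draw n top-left corners so that every patch lies inside the image."""
    return [(rng.randint(0, IMG_W - PATCH), rng.randint(0, IMG_H - PATCH))
            for _ in range(n)]


def crop(image, x0, y0, size=PATCH):
    """Crop a size x size patch from an image stored as a list of rows."""
    return [row[x0:x0 + size] for row in image[y0:y0 + size]]


def downscale(patch, factor=SCALE):
    """Average factor x factor pixel blocks (assumed downscaling method)."""
    n = len(patch) // factor
    area = factor * factor
    return [[sum(patch[factor * r + i][factor * c + j]
                 for i in range(factor) for j in range(factor)) / area
             for c in range(n)]
            for r in range(n)]


rng = random.Random(0)
# Synthetic grayscale image used only as a stand-in for an SEM image.
image = [[(x + y) % 256 for x in range(IMG_W)] for y in range(IMG_H)]
corners = random_corners(PATCHES_PER_IMAGE, rng)
patches = [downscale(crop(image, x0, y0)) for x0, y0 in corners]

total_patches = N_IMAGES * PATCHES_PER_IMAGE   # 15,744 patches overall
train_patches = total_patches * 7 // 10        # 70 % share, if split per patch
```

With these numbers the 70 % training share amounts to 11,020 patches, in line with the statement that the training data contains more than 10,000 patches.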

Machine learning models

The VAE model consists of three sub-models: an encoder, a decoder, and a regression model. Encoder and decoder form the variational autoencoder (VAE) part of the model. The image patches of size (64 × 64 × 1) provide the input and the output of the VAE. The encoder consists of five convolutional building blocks, each comprising a 2D convolutional layer followed by batch normalization, a Leaky ReLU activation function and a dropout layer. The numbers of filters are 32, 64, 128, 128, and 128. The kernel size is 4 × 4. The output of the last convolutional layer is flattened and connected to two dense layers (μ and σ) with 64 dimensions. These are passed to a sampling layer (z), which samples the latent space according to the formula z = μ + α·ε·exp(σ/2). Here, ε is a random normal tensor with zero mean and unit variance that has the same shape as μ, and α is a constant that is set to 1 during training and to 0 otherwise. The decoder mirrors the structure of the encoder. The output of the sampling layer is passed into a dense layer with 512 neurons, which is reshaped to match the shape of the last convolutional layer. The result is passed to five building blocks, each comprising a 2D convolutional layer followed by batch normalization, Leaky ReLU activation, dropout and an upsampling layer. The numbers of filters of the convolutional layers are 128, 128, 128, 64, and 32. An additional convolutional layer with a single filter provides the output of the decoder. A regression model takes the output of the sampling layer z as an input and outputs the conditional parameters. The regression model has four dense layers with dimensions 20, 20, 20, 6 and ReLU activation, an input layer with 64 dimensions and an output layer with 6 dimensions and linear activation function. The VAE and the regression model are trained simultaneously using the Adadelta optimizer80. The VAE loss is the sum of the Kullback–Leibler divergence and the binary cross-entropy of the image reconstruction. 
The loss of the regression model is calculated by the mean squared error. The losses of VAE and regression model are weighted 1:10,000 in order to provide a well-structured latent space.
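The sampling rule and the loss weighting described above can be written out in plain Python. This is a minimal sketch rather than the training code: the function names are illustrative, and the downsampling by a factor of 2 per convolutional block is an assumption inferred from the 512-neuron dense layer (2 × 2 × 128 = 512), since the strides are not stated.

```python
import math
import random


def sample_latent(mu, sigma, alpha=1.0, rng=random):
    """Return z = mu + alpha * eps * exp(sigma / 2) element-wise, eps ~ N(0, 1)."""
    return [m + alpha * rng.gauss(0.0, 1.0) * math.exp(s / 2.0)
            for m, s in zip(mu, sigma)]


def combined_loss(kl, reconstruction, regression_mse,
                  w_vae=1.0, w_reg=10_000.0):
    """Weighted sum of VAE loss (KL + reconstruction) and regression loss."""
    return w_vae * (kl + reconstruction) + w_reg * regression_mse


def edge_after_blocks(blocks, size=64, factor=2):
    """Edge length after `blocks` downsampling steps (factor 2 is assumed)."""
    for _ in range(blocks):
        size //= factor
    return size


# With alpha = 0 (outside training) the sampling layer returns mu unchanged.
z_eval = sample_latent([0.5, -1.0], [0.1, 0.2], alpha=0.0)

# Five blocks reduce 64 px to 2 px, so the flattened size is 2 * 2 * 128.
edge = edge_after_blocks(5)
flat_size = edge * edge * 128
```

Setting α to 0 outside training makes the encoder deterministic, so each image patch maps to a single point in the latent space.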

The generative adversarial network consists of two parts: a generator and a discriminator. The generator network has two inputs, a 16-dimensional latent space (intrinsic parameters) and six conditional physical parameters (extrinsic). The latent space input layer is followed by a dense layer with 32768 neurons and Leaky ReLU activation function, whose output is reshaped into a 16 × 16 layer with 128 channels. The conditional input layer is followed by a dense layer with 256 neurons and linear activation function, whose output is reshaped into a 16 × 16 matrix with one channel. The two reshaped 16 × 16 tensors are combined and followed by two convolutional-transpose layers with Leaky ReLU activation functions, each with an upscaling factor of 2 and 128 filters. The last layer is convolutional with hyperbolic tangent activation and an output of shape 64 × 64 × 1. The discriminator network also has two inputs, the six conditional physical parameters and a 64 × 64 × 1 input image. As in the generator network, the conditional input layer is converted into a 64 × 64 × 1 matrix with one dense layer and concatenated with the input image. This is followed by two convolutional layers with 128 channels and a downscaling factor of 2 each, which results in a 16 × 16 × 128 matrix. A flattening layer is followed by a dropout layer with a dropout rate of 0.4 and a dense output layer with sigmoid activation function. The same conditional extrinsic physical parameters are fed into both the generator and the discriminator. The discriminator model uses a binary cross-entropy loss function and the Adam optimizer81 with a learning rate of 0.0002 and beta_1 of 0.5. The loss function for the generator is approximated by the negative discriminator loss, in the spirit of adversarial network training. 
The training procedure alternates between training the discriminator on small batches of real and fake images with the corresponding conditional physical parameters and training the generator on randomly generated points from the latent space with realistic extrinsic parameters.
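The layer sizes stated above can be checked with simple shape bookkeeping. The sketch below only traces tensor shapes (height, width, channels) and does not build a network. It reads the conditional branch of the generator as a single dense layer with 256 units, and it assumes that the two 16 × 16 tensors are joined by channel-wise concatenation.

```python
def generator_shapes():
    """Trace tensor shapes through the generator as (height, width, channels)."""
    latent_map = (16, 16, 32768 // (16 * 16))       # dense 32768 -> 16 x 16 x 128
    cond_map = (16, 16, 256 // (16 * 16))           # dense 256 -> 16 x 16 x 1
    merged = (16, 16, latent_map[2] + cond_map[2])  # assumed channel concatenation
    size = merged[0]
    for _ in range(2):                              # two transposed convolutions
        size *= 2                                   # upscaling factor 2 each
    return {"latent_map": latent_map, "cond_map": cond_map,
            "merged": merged, "output": (size, size, 1)}


def discriminator_shapes():
    """Trace tensor shapes through the discriminator."""
    cond_units = 64 * 64 * 1                        # dense layer reshaped to 64 x 64 x 1
    merged = (64, 64, 2)                            # image concatenated with condition map
    size = merged[0]
    for _ in range(2):                              # two convolutions
        size //= 2                                  # downscaling factor 2 each
    features = (size, size, 128)
    return {"cond_units": cond_units, "merged": merged,
            "features": features, "flattened": size * size * 128}


gen = generator_shapes()
disc = discriminator_shapes()
```

The generator output and the discriminator input both have the shape 64 × 64 × 1, so generated patches can be passed to the discriminator without resizing.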

Two metrics are introduced that provide a qualitative and a quantitative comparison of conditionally generated and experimental images. The type of microstructure features is identified by a convolutional neural network (CNN) classifier, and particle analysis is performed using the ImageJ particle analyzer.

The classifier is trained to categorize microstructures in six classes (featureless, fine-grained, oriented-facetted, smooth-facetted, facetted, and coarse-facetted). The model consists of three convolutional blocks, each comprising a convolutional layer followed by a ReLU activation and a max pooling layer. The output of the third convolutional block is flattened and followed by a dense layer with 64 neurons and ReLU activation, followed by a dropout layer with rate 0.5. The final output layer is a dense layer with 4 neurons and softmax activation function. The complete dataset is split into a train and a test set at a ratio of 4:1. The train set is further split into a train and a validation set at a ratio of 7:3. The model is trained using the Adam optimizer81 and converges after ~13 epochs. The validation and test accuracies are ~93–95%.
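The two-stage split can be sketched as follows. This is an illustrative helper, not the code used in this work: the function name, the random seed, and the rounding convention are assumptions, and the text does not state whether images or patches are split.

```python
import random


def split_dataset(items, seed=0):
    """Split into train, validation and test sets (4:1, then 7:3)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = len(shuffled) // 5           # 4:1 split into train and test
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = len(rest) * 3 // 10           # 7:3 split into train and validation
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
```

For 100 items this yields 56 training, 24 validation, and 20 test items, i.e., 56 %, 24 %, and 20 % of the data.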

Particle analysis is performed on experimental and conditionally generated images. A Gaussian filter is applied to reduce noise in the images. Afterward, a threshold is applied and the images are transformed into a binary mask, followed by watershed segmentation and analysis with the ImageJ particle analyzer. One hundred patches per condition are analyzed. The histogram of the Feret diameter measurement (equivalent to particle size or grain size) is evaluated.
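The Feret diameter is the largest distance between any two points of a particle. The sketch below illustrates the measure on the pixel coordinates of one segmented particle. It is not the ImageJ implementation, which works on the particle outline, so values can differ slightly. The helper names and the example coordinates are hypothetical.

```python
import math
from itertools import combinations


def feret_diameter(pixels, px_per_nm=None):
    """Largest distance between two pixel centres of one particle.

    Returns the value in px, or in nm if a pixel density (px/nm) is given.
    """
    if len(pixels) < 2:
        return 0.0
    d_px = max(math.dist(p, q) for p, q in combinations(pixels, 2))
    return d_px / px_per_nm if px_per_nm else d_px


def histogram(values, bin_width):
    """Count values per bin; the key is the bin index, floor(value / bin_width)."""
    counts = {}
    for v in values:
        b = int(v // bin_width)
        counts[b] = counts.get(b, 0) + 1
    return counts


# Hypothetical particle made of three pixels; the longest chord is 5 px.
d = feret_diameter([(0, 0), (3, 4), (1, 1)])
bins = histogram([1.0, 2.5, 7.0, 8.0], bin_width=5.0)
```

The brute-force pairwise search is quadratic in the number of pixels, which is acceptable for small particles; restricting the search to boundary pixels would give the same result faster.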