Introduction

Low-frequency (LF) 1/f noise spectroscopy is a nondestructive defect diagnosis tool, which identifies dominant scattering origins. Such scattering origins are caused by imperfect crystallinity, lattice vibration, surface trap distribution, and channel and dielectric defects, in addition to the Schottky barrier inhomogeneity at the metal-semiconductor interface in semiconductor devices1,2,3,4,5,6. However, as the size of the channel material decreases, particularly in the case of two-dimensional (2D) layered materials, their atomically thin nature with a large surface-to-volume ratio makes it significantly difficult to investigate them using LF 1/f noise analysis as compared with their bulk silicon counterparts7,8. Conventionally, the time-resolved current (I) variation in electronic devices has been ascribed to the carrier number and/or mobility fluctuation9; \({\Delta}I(t) \propto q\mu \left( {{\Delta}N} \right) + q\left( {{\Delta}\mu } \right)N\), where q, μ, and N denote the elementary unit charge, carrier mobility, and number of charge carriers, respectively. However, since the inherent vulnerability of 2D materials to surrounding interfaces considerably influences the charge fluctuation, this high sensitivity of LF noise features would reflect the individual effects of both channel and dielectric materials in addition to the presence of chemical/electrical doping10,11.

Thus far, numerous LF noise features have been reported on 2D materials, such as the presence of electron-hole puddle induced charge scattering on graphene12, the Coulomb scattering suppression via high-κ passivation of black phosphorus (BP)13, the promotion of charge fluctuation in molybdenum disulfide (MoS2) due to the height and inhomogeneity of the Schottky barrier11, the anisotropic LF noise feature of rhenium disulfide (ReS2)14, and the thickness-dependent Coulomb scattering parameter of molybdenum ditelluride (MoTe2)15. These studies indicate the high feasibility of LF noise spectroscopy as a tool to classify the material and device properties. Nevertheless, the origin of carrier fluctuation, occurring either in the 2D layered material itself or at the interface between the 2D layered material and the gate dielectric, has not been identified clearly. Moreover, it is significantly difficult to identify an individual noise source from the LF noise data without appropriate data processing for the model-dependent LF noise analysis.

Most recently, the combination of artificial-intelligence (AI) based approach and scientific data analysis has been widely considered in various applications such as healthcare16,17, image recognition18,19, voice search20, and molecular/material science21,22. Further, it has also been determined that these combined techniques are suitable for solving the problems associated with non-linear processes or enormous combinatorial spaces with high efficiency16,22,23,24. This clearly indicates that the machine learning (ML) and deep learning (DL) approaches can provide a better optimization and decision-making by converging the scientific data and extracting interpretable models from these data automatically22,25. Recently, studies on applying ML or DL to analysis of 2D layered materials have been widely conducted26,27,28,29.

In this study, we introduce an effective technique to classify and infer the characteristics of current fluctuation with high efficiency and precision by combining AI and LF noise spectroscopy. Because 2D FETs are similar in fabrication process, geometry, bandgap, and mobility, classifying them using only fundamental DC analysis of transfer curves and output characteristics is very difficult. In contrast, since LF noise data capture the tiny fluctuations of carriers in the channel over time, they readily reveal the characteristics specific to each FET. Based on the time-resolved ΔI(t) measured from various 2D material-based field-effect transistors (FETs), 2D arrays of the Mel-frequency cepstral coefficient (MFCC) for several electronic properties were considered, and the corresponding features were obtained via a hidden Markov model (HMM). The HMM has the disadvantages that it requires a relatively large amount of data and can hardly express dependencies between hidden states. However, the HMM is suitable for processing a large amount of LF noise data because it has a strong statistical foundation and enables efficient learning from raw sequence data. This approach allows us to automatically identify essential device information such as the type of 2D channel materials and gate dielectrics, interface trap density (Nit), Coulomb scattering parameter (αSC), and the presence of chemical and electron beam doping. The combination of factors such as channel material, gate dielectric, contact metal, and electron beam irradiation significantly affects carrier fluctuations as a function of time. This combination, which comprises more than 100 LF noise data sets under 32 conditions, becomes a catalyst for machine learning that automatically and effectively classifies the characteristics of various nanoelectronic devices. In addition, the obtained LF noise spectroscopy data are highly interpretable via machine learning techniques, thereby identifying the contribution of engineered features in characterizing the device information and performance.

Results

Workflow for audio and current signal classification

Decimal-type data measured in the time domain have generally been used for ML and DL in data science; however, the Fourier transform (FT) of such data is frequently employed in ML algorithms to improve data interpretation30,31. The process of transforming raw data into a suitable representation for a learning algorithm is often called featurization. For instance, in speech recognition, methods for converting a signal from the time domain to the frequency domain have been widely studied to achieve more accurate classification and analysis31,32,33. A typical data representation method that extracts the characteristics of the original audio signal through the Mel-frequency cepstral coefficient is illustrated in Fig. 1a. Each speech frame of the time domain signal is first obtained through the pre-emphasis, framing, windowing, and other processing of the original audio signal as expressed in schematic (i) of Fig. 1a. Subsequently, the speech signals comprising a 30 ms frame window are transformed using the fast Fourier transform (FFT) with a Hamming window. Further, each spectrum signal is processed by Mel filters (26 filters) to obtain the corresponding Mel-frequency spectrum. Finally, the Mel-frequency spectrums are processed using the discrete cosine transform (DCT) to acquire the MFCCs in the cepstral domain as shown in schematic (ii) of Fig. 1a.
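The first steps of this pipeline (pre-emphasis, framing, and Hamming windowing) can be illustrated with a minimal sketch in plain Python. The pre-emphasis coefficient of 0.97, the 1 kHz sampling rate, and the 10-sample hop are assumptions chosen for illustration only; they are not parameters reported in this work.

```python
import math

def pre_emphasis(signal, coeff=0.97):
    """y[n] = x[n] - coeff * x[n-1]; coeff=0.97 is an assumed, commonly used value."""
    return [signal[0]] + [signal[n] - coeff * signal[n - 1]
                          for n in range(1, len(signal))]

def hamming(length):
    """Standard Hamming window: 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames and apply a Hamming window to each."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames

# Hypothetical 5 Hz test tone sampled at an assumed 1 kHz, so 30 samples = 30 ms.
signal = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
frames = frame_signal(pre_emphasis(signal), frame_len=30, hop=10)
```

Each windowed frame would then be passed to an FFT routine; the windowing reduces spectral leakage at the frame edges.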

Fig. 1: Workflow comparison between speech and device characteristic classification.

The initial steps of the data representation for training: a the audio signal and b the current signal (i) in the time domain and (ii) in the frequency domain (the darker color, the smaller value). c The inference steps after ML for (i) speech recognition and (ii) materials/characteristics classification. d Side views of the device structures, which are fabricated using various materials after being subjected to external factors such as e-beam irradiation, triethanolamine (TEOA) chemical doping, and temperature variations; Au, Ti, Pt, and Cr are used as the source/drain contact metals; silicon dioxide (SiO2) and hexagonal boron nitride (h-BN) are used as the gate dielectrics; the channels are composed of a combination of various atoms such as Mo, W, S, Se, Te, C, and black phosphorus (BP); MoS2, MoTe2, WSe2, ReS2, graphene, and BP have thicknesses varying from monolayer through to 40 layers.

We employ this data processing algorithm for a number of ΔI(t) data obtained from various 2D material based FETs which have been fabricated and analyzed under various experimental conditions such as different gate dielectrics11,13,34,35, temperatures11,34,36, channel materials10,11,13,34,35,37,38,39, chemical/electron beam doping40,41, and source/drain contact metals11,34 (see Fig. 1b). More than 100 LF noise data sets of various 2D layered FETs were considered in this study under 32 different conditions at a particular gate (VG) and drain (VD) bias condition. In contrast to the audio signal shown in Fig. 1a, an additional signal normalization process is performed on the current signal, after which each MFCC of the current signal in the cepstral domain is determined via FFT and DCT as displayed in schematic (i) and (ii) of Fig. 1b. The MFCCs of the audio and current signals, which comprise the 2D array, are respectively used in speech recognition and device classification (materials/characteristics) through the inference process using ML with the optimized algorithm (see Fig. 1c). The device conditions that the trained ML algorithm distinguishes are as follows (see Fig. 1d): BP, graphene, MoS2, ReS2, MoTe2, and tungsten diselenide (WSe2) were used as channel materials; h-BN and SiO2 were employed as gate dielectrics; Ti, Au, Pt, and Cr were used as the contact metals; and passivation, temperature variations, triethanolamine (TEOA) doping, and electron beam irradiation were considered as the different external factors.

Process flowchart for learning and classifying 2D transistor

The ΔI(t) of 2D material-based FETs under several conditions was measured at a particular VG and VD (see Fig. 2a) in a shielding metal box (see Fig. S2 and Note 2 in the Supplementary Materials for details of the LF noise measurement system)42. The drain current ID can be defined as the sum of the time-averaged (DC) drain current (\(\overline {I_{\mathrm{D}}}\)) and the low-frequency noise current fluctuations (ΔID); \(I_{\mathrm{D}} = \overline {I_{\mathrm{D}}} + {\mathrm{{\Delta}}}I_{\mathrm{D}}\)3,4,5,6,9. Since the amplitude of ΔID is substantially smaller than ID, ΔID is generally converted to a voltage signal using a low-noise current-to-voltage preamplifier, as depicted in Fig. 2b. The amplified noise signal was used as the input ΔID(t) data in Python, where the amplitude normalization and pre-emphasis processes were performed, as presented in Fig. 2c. Subsequently, the preprocessed ΔID(t) data were separated into specific frames with respect to the time domain, and FFT was performed on these data. The transformed data produced by each frame were expressed as power spectral density (SI) in the frequency domain, and all SI were filtered onto the Mel scale. This transformation of specific frames into SI allowed the evaluation of periodic spectra, and the amount of spectral energy between frequencies could then be obtained by combining the respective frames. The Mel-scale filter interval increased with frequency, i.e., it was narrow around low frequencies and became wider at higher frequencies, indicating that the Mel-scale filter emphasized the energy around low frequencies (see Fig. S4 and Note 3 in the Supplementary Materials)30,43,44,45,46,47.
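The widening of the Mel filter spacing with frequency can be checked numerically with a short sketch. It uses the widely used conversion mel = 2595 log10(1 + f/700); the frequency range (0 to 8000 Hz) is a hypothetical choice for illustration, while 26 filters matches the speech example described earlier.

```python
import math

def hz_to_mel(f_hz):
    """Common Hz-to-Mel conversion."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(f_low, f_high, n_filters):
    """Filter centres equally spaced on the Mel scale, returned in Hz."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (k + 1)) for k in range(n_filters)]

# Hypothetical band limits; only the trend of the spacing matters here.
centers = mel_center_frequencies(0.0, 8000.0, 26)
spacings = [b - a for a, b in zip(centers, centers[1:])]
```

Because the centres are equidistant in Mel units, their spacing in Hz grows monotonically, so more filters (and hence more resolution) fall in the low-frequency region.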

Fig. 2: Flowchart for learning and classifying characteristics of 2D transistors.

a Schematic of a 2D layered FET, which was measured at a given VD and VG under various conditions in the shielded state; b amplification of the measured current signal using a low-noise current amplifier; c process of feature engineering the input data into a suitable representation (current MFCCs) through MFCC (the darker color, the smaller value); d ML with the HMM algorithm using the current MFCCs, which comprise the 2D array; e deep learning process in which a neural network (NN) is trained on the score vector (Y) extracted via ML with the HMM algorithm using the current MFCCs; and f inference steps of device conditions (channel material, gate material, chemical doping, and e-beam irradiation) through ML with the HMM algorithm and deep learning with the NN algorithm.

The obtained data, which are called Mel-frequency spectrums and mainly used for learning, were consequently more sensitive to the low frequency values, allowing a precise carrier scattering analysis in the devices. Subsequently, the Mel-frequency spectrums were transformed through the DCT and extracted to a finite data point sequence, composed of the current MFCCs in the cepstral domain43,44,45. Further, all SI filtered by the Mel scale overlapped, indicating the existence of correlations between the spectral densities. These correlations could be separated using the DCT method. The current MFCCs transformed by the DCT were expressed as the change in filter energy, and a part of them was extracted and stored as 2D arrays, as demonstrated in the schematic in Fig. 2c30,47. Based on the research conducted thus far, the engineered current MFCC features were characterized into 2D arrays with a number between 200 and 1000 for each device class.
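The decorrelating DCT step can be sketched as follows. This is a minimal illustration that assumes the usual MFCC convention of taking the logarithm of the filter-bank energies before an unnormalised DCT-II; the function names and the number of retained coefficients are hypothetical.

```python
import math

def dct_ii(values):
    """Unnormalised DCT-II: c[k] = sum_n v[n] * cos(pi * k * (2n + 1) / (2N))."""
    n_total = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * n + 1) / (2 * n_total))
            for n, v in enumerate(values))
        for k in range(n_total)
    ]

def cepstral_coefficients(filterbank_energies, n_keep):
    """Log-compress positive filter-bank energies, apply the DCT, keep n_keep terms."""
    log_energies = [math.log(e) for e in filterbank_energies]
    return dct_ii(log_energies)[:n_keep]

# Hypothetical energies that decay with filter index, loosely mimicking a 1/f trend.
energies = [1.0 / (k + 1) for k in range(8)]
coeffs = cepstral_coefficients(energies, n_keep=4)
```

A flat set of filter energies produces only a zeroth coefficient, so the higher-order coefficients encode how the spectral envelope deviates from flat.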

Every engineered current MFCCs feature was stored by class, based on the device conditions, and they were learned and classified using ML with an HMM algorithm and DL with an NN, as illustrated in Fig. 2d, e. The HMM based on the Markov chain30,31,48,49 in Fig. 2d was the first algorithm model used for learning and classifying the data in this study. In HMM, the unaligned training sequences are processed by iteratively evaluating the data stored as current MFCCs. For all the training parameters, the estimates with prior probability distributions are assumed using a maximum a posteriori approach30,50,51. The scores for the 32 classes were calculated as Y, as shown in Fig. 2d. Further, the class with the highest score was determined, and could be used to infer the device conditions as shown in Fig. 2f.
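The idea of scoring a sequence against one HMM per class and choosing the class with the highest score can be illustrated with a toy example. The sketch below uses discrete observation symbols and hand-set probabilities purely for brevity; the models in this work operate on continuous MFCC vectors, and the class names and parameters here are hypothetical.

```python
import math

def forward_log_likelihood(obs, start_p, trans_p, emit_p):
    """Scaled forward algorithm: returns log P(obs | model) for a discrete HMM.

    Assumes every observed symbol has non-zero probability under the model.
    """
    n_states = len(start_p)
    alpha = [start_p[s] * emit_p[s][obs[0]] for s in range(n_states)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [
                sum(alpha[r] * trans_p[r][s] for r in range(n_states)) * emit_p[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_like

def classify(obs, models):
    """Score obs against every class model and return (best_class, scores)."""
    scores = {name: forward_log_likelihood(obs, *params)
              for name, params in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical two-state models: (start, transition, emission) per class.
models = {
    "class_A": ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.6, 0.4]]),
    "class_B": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.2, 0.8], [0.1, 0.9]]),
}
best, scores = classify([0, 0, 1, 0], models)
```

The dictionary of per-class log-likelihoods plays the role of the score vector Y; with 32 classes it would contain 32 entries.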

The second method used was the NN30,52,53, which is one of the DL methods. In this algorithm, the Y values of the classes, which had been calculated by HMM, were classified by performing one additional learning step. In this method, the input score vectors, Y, were transferred to the first layer (layer-1) with 32 perceptrons, which is the number of classes, and were then transferred to the second layer (layer-2) by employing a rectified linear unit (ReLU) function as the activation function54,55. Instead of the widely used sigmoid function, we considered a ReLU function here as the activation function because of its sparse activation property, whereby the network is only partially activated because zero is output for a negative input30,54,55. Subsequently, the score vector data were classified using the softmax function, which is used for classification in layer-2, and the probability of a specific class was calculated and classified, demonstrating a normalization effect. The softmax function was obtained by dividing the exponential of the score of each class by the sum of the exponentials of the scores of all classes as described below30,54:

$${\mathrm{S}}\left( {y_i} \right) = \frac{{e^{y_i}}}{{\mathop {\sum}\nolimits_j {e^{y_j}} }}$$
(1)

Compared to the HMM method, the second method had the advantage of classifying the score via repetitive training and learning, which was performed to determine the maximum value of the scores, Y, obtained through HMM. Finally, the device conditions could be inferred as indicated in Fig. 2f.
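The ReLU activation and the softmax normalization over all classes described above can be sketched in a few lines. This is only an illustration of the two functions; the subtraction of the maximum score is a standard numerical-stability measure added here and is not described in the text, and the example scores are hypothetical.

```python
import math

def relu(values):
    """Rectified linear unit: negative inputs are mapped to zero."""
    return [max(0.0, v) for v in values]

def softmax(scores):
    """exp(y_i) divided by the sum of exp(y_j) over all classes j."""
    peak = max(scores)  # shifting by the maximum leaves the result unchanged
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(scores):
    """Return the index of the most probable class and the class probabilities."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__), probs

# Hypothetical three-class score vector after the ReLU stage.
index, probs = predict_class(relu([-1.2, 0.4, 2.5]))
```

The outputs are non-negative and sum to one, which is the normalization effect mentioned above.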

Data featurization

Numerous electrical properties of FETs, such as the carrier type (electrons or holes), field-effect mobility, subthreshold swing, and current on/off ratio of the device-under-test (DUT) can be determined from the ID – VG transfer characteristics of 2D layered FETs (see Fig. 3a and Supplementary Materials Fig. S1). However, a precise classification of the 2D FETs with fundamental DC analysis is significantly challenging, due to the similarities of mobility, bandgap, geometry, and fabrication process of 2D FETs, except under a few specific conditions such as graphene FETs. For instance, ΔID(t) can be measured for 0.5 s at a particular VG and VD in a device belonging to a specific class (condition) after excluding \(\overline {I_{\mathrm{D}}}\) as illustrated in Fig. 3b. It is noteworthy that we only considered ΔID(t) data where \(\overline {I_{\mathrm{D}}}\) was larger than 100 nA to avoid a possible error caused by the minimum detection limit of our system. Subsequently, ΔID(t) was normalized and divided into 11 frames with a 200 ms window. The SI of each frame was obtained using FFT as shown in Fig. 3c and converted into a vector, xn, possessing 100 current MFCC elements, anm, via Mel-scale filtering and DCT. The xn for each frame was concatenated to create the current MFCC 2D array of each class (condition), X(class)i, as indicated in Fig. 3d.

$$X_{(class)_i} = \left[ {x_1\,x_2\, \cdots \,x_n\, \cdots \,x_{10}\,x_{11}} \right]$$
(2)
$$x_n = \left[ {a_{n1}\,a_{n2} \cdots a_{nm} \cdots a_{n99}\,a_{n100}} \right]$$
(3)

where i depends on the specific voltage applied to the device belonging to the class (condition).
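The shape of the array in Eqs. (2) and (3) follows from the framing described above. As a rough sketch, and assuming the 11 frames tile the 0.5 s record uniformly (the hop between frames is inferred here, not stated in the text), the frame layout can be computed as follows; the zero-filled array is only a placeholder for the MFCC values.

```python
def frame_starts_ms(duration_ms, window_ms, n_frames):
    """Uniform hop such that the last frame ends exactly at the end of the record."""
    hop_ms = (duration_ms - window_ms) / (n_frames - 1)
    return hop_ms, [k * hop_ms for k in range(n_frames)]

# 0.5 s record, 200 ms window, 11 frames, as described in the text.
hop_ms, starts = frame_starts_ms(500, 200, 11)

# Placeholder 2D array: one row x_n per frame, 100 MFCC elements a_nm per row.
X = [[0.0] * 100 for _ in starts]
```

Under this assumption the hop is 30 ms, so consecutive 200 ms frames overlap heavily, and X has 11 rows of 100 elements each.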

Fig. 3: Detailed flowchart of ΔID featurization.

a Transfer characteristics (ID–VG) of 2D layered FETs measured under various conditions; b ΔID(t) data measured at a particular VD and VG and then divided into specific frames with sampling period; c current power spectral densities (SI) in the frequency domain of each frame converted through FFT; d current MFCC comprising 2D array, X(class)i, obtained by concatenating current MFCC vector, x, processed through Mel filter and DCT; e sectional illustration of carrier behavior in a 2D layered FET; f SI(f) at a particular VD for several VG and spectral envelopes (the darker color, the larger VG); g Nit and αSC distributions, which were calculated by carrier number fluctuation-correlated mobility fluctuation (CNF-CMF) model of each class (the box plots are defined by 25th and 75th percentile); h engineered current MFCC 2D array, which contains carrier behaviors (the darker color, the smaller value).

In order to examine the high feasibility of our approach, we considered the carrier number fluctuation-correlated mobility fluctuation (CNF-CMF) model to interpret our ΔID(t) data (see the detailed LF noise theory in Supplementary Materials Note 2). This CNF-CMF model ascribes ΔID(t) to the carrier number fluctuation (CNF) caused by trapping/detrapping phenomena in the interface traps between the channel and gate dielectric in addition to the correlated mobility fluctuation. More specifically, ΔID(t) data can be influenced by many factors such as the carrier type of channel, interface quality and condition between the gate oxide and channel, and the presence of doping (see Fig. 3e). According to the CNF-CMF model, the drain current normalized SI can be expressed as follows5,6,9,56:

$$\frac{{S_{\mathrm{I}}}}{{\overline {I_{\mathrm{D}}} ^2}} = \frac{{q^2kTN_{{\mathrm{it}}}}}{{f^\gamma WLC_{{\mathrm{ox}}}^2}}\left( {1 + \frac{{\alpha _{{\mathrm{SC}}}\mu _{{\mathrm{eff}}}C_{{\mathrm{ox}}}\overline {I_{\mathrm{D}}} }}{{g_{\mathrm{m}}}}} \right)^2\left( {\frac{{g_{\mathrm{m}}}}{{\overline {I_{\mathrm{D}}} }}} \right)^2$$
(4)

where q is the carrier charge, k is the Boltzmann constant, T is the absolute temperature, f is the frequency, γ is the frequency exponent, W and L are the channel width and length, Cox is the dielectric capacitance per unit area, gm is the transconductance (= \(\partial \overline {I_{\mathrm{D}}}/\partial V_{\mathrm{G}}\)), and μeff is the effective mobility. The prefactor \(q^2kTN_{{\mathrm{it}}}/(f^\gamma WLC_{{\mathrm{ox}}}^2)\) in Eq. (4) corresponds to the flat-band voltage spectral density, \(S_{V_{fb}}\). The trapped carriers near the channel-gate dielectric interface not only cause variations in \(S_{V_{fb}}\), but also degrade electron mobility, resulting in modulation of the carrier density.
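Equation (4) can be evaluated directly once the device parameters are known. The following is a minimal sketch that assumes all inputs are given in mutually consistent SI units (for example, Nit in m^-2 J^-1 and Cox in F m^-2); the function name and the numerical values are hypothetical and serve only to show the dependence on f, Nit, and αSC.

```python
Q = 1.602176634e-19   # elementary charge (C)
K_B = 1.380649e-23    # Boltzmann constant (J/K)

def normalized_noise_psd(f, n_it, alpha_sc, mu_eff, c_ox, g_m, i_d,
                         width, length, temperature, gamma=1.0):
    """Drain-current-normalised noise S_I / I_D^2 according to Eq. (4)."""
    # Flat-band voltage spectral density (prefactor of Eq. (4)).
    s_vfb = Q ** 2 * K_B * temperature * n_it / (f ** gamma * width * length * c_ox ** 2)
    # Correlated-mobility-fluctuation factor.
    cmf = 1.0 + alpha_sc * mu_eff * c_ox * i_d / g_m
    return s_vfb * cmf ** 2 * (g_m / i_d) ** 2

# Hypothetical placeholder parameters, not measured values from this work.
params = dict(n_it=1e30, alpha_sc=1e4, mu_eff=1e-3, c_ox=1e-4, g_m=1e-6,
              i_d=1e-6, width=1e-5, length=1e-5, temperature=300.0)
psd_at_10_hz = normalized_noise_psd(10.0, **params)
```

With γ = 1 the result scales as 1/f, it is linear in Nit, and setting αSC to zero reduces the expression to the pure CNF model.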

Figure 3f shows the representative SI of each frame in the frequency domain among the fabricated DUTs. The observation of certain harmonics in the SI could be attributed to the carrier trapping/de-trapping process in the gate oxide trap sites. In fact, these harmonics assisted in understanding the characteristics of each device and expressed these characteristics as spectral envelopes with specific peaks32,33,57,58. Therefore, Nit and αSC between the gate dielectric and the channel of each class (condition) have a significant effect on the spectral envelopes and unique characteristics of device (see Fig. 3g). As a result, the current MFCC 2D array comprises power spectral sequences for each frequency (amplified for the low frequency region), as demonstrated in Fig. 3h. The HMM algorithm that learns the previous state and infers the next state is efficient for learning the current MFCC 2D array that contains Nit, αSC, and γ information according to the frequency sequence.

Current MFCC and classification accuracy

The LF elements (from an1 to an40) of the engineered current MFCCs at the same \(\overline {I_{\mathrm{D}}}\) ≈ 1 μA in each class ((i) graphene on trench structure, (ii) MoS2 on SiO2, (iii) MoS2 on SiO2 after e-beam irradiation, and MoS2 on h-BN at (iv) T = 25 K, (v) T = 100 K, and (vi) T = 200 K) are directly compared in Fig. 4a (see also Supplementary Materials Figs. S56 and Note 3). In all DUTs, we consistently observe the 1/f noise tendency. The current MFCC elements in the LF regime were considered to significantly contribute to learning and classification. Except for the graphene case (see (i) in Fig. 4a), the \(S_{\mathrm{I}}/\overline {I_{\mathrm{D}}} ^2\) curves for all 2D materials in this study fit well to the CNF-CMF model, implying that the engineered current MFCCs of graphene would have a different image from those of the other 2D layered materials. The effects of e-beam irradiation on the monolayer MoS2 FET on the SiO2 substrate are compared in schematic (ii) and (iii) of Fig. 4a, b. The obtained Nit increases by a factor of 10 after electron beam irradiation. Moreover, the engineered current MFCC for αSC as a function of T in the monolayer MoS2 FET on h-BN is also demonstrated (see (iv) to (vi) in Fig. 4a, b). αSC increases with increasing T from \(3.23 \times 10^{4}\) (T = 25 K) to \(3.08 \times 10^{5}\) V s C\(^{-1}\) (T = 200 K)11,36.

Fig. 4: Engineered current MFCC study and classification accuracy.

a LF part of current MFCC (the darker color, the smaller value) converted at the same \(\overline {I_{\mathrm{D}}}\) ≈ 1 μA in each class ((i) graphene on trench structure, (ii) MoS2 on SiO2, (iii) MoS2 on SiO2 after e-beam irradiation, and MoS2 on h-BN at (iv) T = 25 K, (v) T = 100 K, and (vi) T = 200 K, respectively); b frequency distribution for a specific interval by normalizing the LF part of current MFCC; c classification accuracies and processing time obtained using the three learning and classification methods based on the number of data; d classification accuracies according to each class (the inset shows a confusion matrix and the darker color, the smaller value); e variations in the measured normalized drain current spectral densities for ReS2 and MoS2 on h-BN at f = 10 Hz and data fitted using the CNF and CNF-CMF models.

The frequency distributions are presented in a histogram with 20 intervals, as shown in Fig. 4a, using the normalized elements of LF MFCC of classes (i)–(vi) (see Fig. 4b). As Nit increases from condition (ii) to (iii), the highest frequency of the histogram shifts to the positive direction. A similar positive frequency shift is observed in cases (iv)–(vi) with the increasing T. Referring to Eq. (4), the SI varies as a function of Nit and αSC, and the corresponding current MFCCs can be extracted via featurization, consequently enabling the representation of a specific histogram tendency.

The HMM algorithm, which learns by considering the correlation between the previous state and the next state, proceeds under the following two learning conditions. The first learning condition is that the specific current fluctuation of each device in a specific class is due to Nit and αSC, and the current MFCC contributes to learning by carrying this information. The second learning condition is that the HMM algorithm is trained by considering the correlation between the MFCC of the previous frequency and the MFCC of the next frequency. Thus, in Eq. (4), the exponent γ, which reflects the trap distribution, also influences the learning process with the HMM algorithm, together with Nit and αSC. Figure 4c displays the classification accuracies and processing time obtained using the HMM algorithm and the HMM score vector learning method employing the NN for different numbers of data. When the number of data was 7800, the HMM classification accuracy was 76.3% with f1-score and AUC value of <0.78 using fourfold cross-validation (see Supplementary Materials Fig. S3 and Note 3.11)30,59,60. The HMM+NN classification accuracy was 85.2% with f1-score of 0.86, AUC value of 0.83, and processing time of 3 h for each fourfold cross-validation. For 22,100 data points, the HMM+NN classification accuracy increased to 95.5% with f1-score of 0.93, AUC value of 0.91, and processing time of 11 h for each fourfold cross-validation. However, for 48,800 data points, classification accuracies exhibited no further improvement, and only the processing time increased to 15 h. Moreover, the classification accuracy obtained by a convolutional neural network (CNN) algorithm that used the current MFCCs directly as image data, without the HMM architecture, reached 93.6% with f1-score of 0.82 and AUC value of 0.87, comparable to the performance of HMM+NN (see Supplementary Materials Fig. S7 and Note 4). On the other hand, the logistic regression model achieved a classification accuracy of only 88.8% with f1-score of 0.75 and AUC value of 0.89. Therefore, provided that the performance of the CNN architecture, which classifies by learning the perceptrons of each layer, is acceptable, transfer learning for other channel or gate oxide materials may be possible61.
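For reference, the accuracy and f1-score quoted above can be computed from lists of true and predicted labels as sketched below. The macro average (an unweighted mean of the per-class f1-scores) is an assumption made for this illustration; the averaging scheme used for the reported values is not specified in the text, and the labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Hypothetical labels for two device classes.
y_true = ["MoS2", "MoS2", "ReS2", "ReS2"]
y_pred = ["MoS2", "ReS2", "ReS2", "ReS2"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Reporting the f1-score alongside accuracy is useful when classes contain different numbers of samples, because accuracy alone can be dominated by the larger classes.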

Most of the classes (or labels) were in good agreement with the CNF-CMF model with high averaged cross-validation accuracies of over 90% with f1-score of over 0.86 and AUC value of over 0.84, as presented in Fig. 4d. However, two exceptional classes, i.e., ReS2 (blue bar) and MoS2 (red bar) FETs fabricated on h-BN, are present in this figure with low classification accuracies of 74.2% (with f1-score of 0.79 and AUC of 0.75) and 38% (with f1-score of 0.49 and AUC of 0.51), respectively. This indicates that the current MFCCs for these classes were misinterpreted in the high current region. To interpret this misclassification clearly, the corresponding \(S_{\mathrm{I}}/\overline {I_{\mathrm{D}}} ^2\) curves at f = 10 Hz for both cases are displayed in Fig. 4e. Although they are well fitted to the CNF-CMF model in most of the current regions, the additional contact resistance (RCT) contributing to the total LF noise behavior in the high current regions reduces the accuracies for these classes in particular11. The inset in Fig. 4d shows the confusion matrix of the HMM+NN architecture. Interestingly, some classes for which the effects of additional contact resistance should be considered, such as ReS2 and MoS2 FETs using h-BN as gate dielectric, WSe2 FETs using Au as contact metal, and monolayered MoS2 FETs, are sometimes confused with each other.

Discussion

Combining LF noise spectroscopy with machine learning algorithms provides an efficient and precise approach to characterize and classify 2D layered FETs. Through the use of an NN based on the hidden Markov model algorithm, we demonstrate that MFCCs, which were converted from the LF noise data of DUTs, can be predicted more precisely than the limits of fundamental measurements. Importantly, this method of applying only a specific voltage can be considered advantageous in both classifying device information and characterizing device performance. The combination of factors such as channel material, gate dielectric, contact metal, and electron beam irradiation has a profound effect on carrier fluctuations, enabling effective learning and training. Further, the learning models using LF noise spectroscopy presented herein are highly interpretable, and aid in identifying how engineered features, including the behaviors between carriers and traps, contribute to characterizing device information and performance. Therefore, the considerable flexibility of this approach makes it adaptable to distinguishing the degree of degradation and reliability of devices and to modeling optimized fabrication conditions and device structures. The carrier transport direction, stacking order, and orientation in 2D heterostructures would be critical factors that significantly influence charge fluctuation, and this approach is expected to enable their improved interpretation in the future. Moreover, the inference of engineered current MFCC features that currently lack sufficient noise data, combined with the CNF-CMF and additional contact noise approaches, and an improved ability to build models from limited experimental data should be possible using the developed model.

Methods

Sample fabrication

An appropriately selected chemical vapor deposited monolayer MoS2 and mechanically exfoliated 2D multilayer materials such as MoS2, BP, ReS2, MoTe2, WSe2, and h-BN were transferred onto high-quality 300 nm-thick SiO2/p+-Si substrates. To form source and drain metal electrodes on them, standard electron beam lithography was used, and 80 nm-thick Au, Ti, Pt, and Cr were deposited using an electron-beam evaporation system. To suppress the contact resistance effect at the metal-semiconductor interface, all the fabricated devices were annealed under a high vacuum condition for 2 h at 473 K. The trenched graphene FETs in this study were fabricated on a pre-patterned parallel grid structure made of spin-coated poly(methyl methacrylate) A2 via conventional dry transfer methods39. The Al2O3 passivation layer was deposited on 2D materials using an atomic layer deposition system.

In-situ measurement with e-beam irradiation

Electron-beam irradiation was conducted under high vacuum conditions (~\(10^{-6}\) Torr) at 300 K using a scanning electron microscope (SEM) (Quanta 3D FEG) chamber with a nano-manipulator for multilayer MoS2, WSe2, and monolayer MoS2 for 30 s with 30 kV and 50 pA. Four tungsten probes installed on the nano-manipulator system were electrically connected to a semiconductor parameter analyzer.

Electrical transport measurement

All the devices, except the Al2O3 passivated MoS2 FETs, were characterized in a high vacuum-probe station system10. Fundamental electrical transport characterizations were performed using semiconductor analyzers (Keithley 4200, Agilent B1500A) with a temperature controllable probe system (335, Lake-Shore). Low-frequency noise characteristics were obtained from a home-made noise measurement system (the system details are presented in Fig. S2 in the Supplementary Materials), consisting of a home-made battery box, a low noise current-to-voltage pre-amplifier (SR570, Stanford Research Systems), and a data acquisition system (DAQ-4431, National Instruments)42.

Data processing and training

We used the python_speech_features library on GitHub (https://github.com/jameslyons/python_speech_features) to process the LF noise data into MFCC parameters. We only considered data where \(\overline {I_{\mathrm{D}}}\) was larger than 100 nA to avoid a possible error. The optimized combination of hyperparameters was found by narrowing the search range on the basis of previously studied LF noise analysis and then iterating over the candidates in a for loop to obtain the best result. After augmenting the training MFCC dataset using Gaussian noise, we used the hmmlearn library on GitHub (https://github.com/hmmlearn/hmmlearn) to train the HMM on the training MFCC data. Through HMM training, the data for each class were converted into score vectors, and these vectors were used to train a neural network based on TensorFlow Keras (https://www.tensorflow.org/guide/keras). Finally, we also trained a CNN directly on the current MFCC data, likewise based on TensorFlow Keras (https://www.tensorflow.org/guide/keras).
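Augmentation with Gaussian noise can be sketched in plain Python as follows. The noise standard deviation, the number of copies, and the function name are hypothetical choices for illustration; the values used in this work are not stated here.

```python
import random

def augment_with_gaussian_noise(mfcc_array, n_copies, sigma, seed=0):
    """Return n_copies noisy versions of a 2D MFCC array (list of rows).

    Each element receives independent zero-mean Gaussian noise with standard
    deviation sigma. The seed makes the illustration reproducible.
    """
    rng = random.Random(seed)
    augmented = []
    for _ in range(n_copies):
        augmented.append([[value + rng.gauss(0.0, sigma) for value in row]
                          for row in mfcc_array])
    return augmented

# Hypothetical 2 x 3 MFCC array expanded into four noisy training samples.
original = [[1.0, 0.5, -0.2], [0.8, 0.4, -0.1]]
samples = augment_with_gaussian_noise(original, n_copies=4, sigma=0.05)
```

The noise level should remain small compared with the spread of the MFCC values, otherwise the augmented samples may no longer be representative of their class.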

Model validation

We used the 4-fold cross-validation method to train our MFCC dataset: the training MFCC dataset was randomly divided into 4 subsets of equal size. Of the 4 subsets, a single subset was retained as the test data for evaluating the model, and the remaining three subsets were used for training. The cross-validation process was repeated 4 times, with each of the four subsets used once as the test data. The held-out test MFCC datasets were converted into score vectors to evaluate the model trained through the HMM+NN architecture. We obtained not only the accuracy but also the confusion matrix, receiver operating characteristic (ROC) curves, area under the curve (AUC) value, and f1-score to evaluate the model performance accurately given the imbalance of the data (https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics).
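The splitting scheme described above can be sketched with index lists. This is a minimal stand-alone illustration rather than the code used in this work; the function name, the seed, and the sample count are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=4, seed=0):
    """Shuffle sample indices and return k (train_indices, test_indices) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized subsets
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits

# Hypothetical dataset of 20 samples: each fold holds out 5 and trains on 15.
splits = k_fold_indices(20, k=4)
```

Every sample appears in exactly one test subset, so the averaged metrics are based on predictions for the whole dataset.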