Current journal: Chemometrics and Intelligent Laboratory Systems
  • On the restrictiveness of equality constraints in multivariate curve resolution
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2020-01-25
    Mathias Sawall; Somaye Vali Zade; Christoph Kubis; Henning Schröder; Denise Meinhardt; Alexander Brächer; Robert Franke; Armin Börner; Hamid Abdollahi; Klaus Neymeyr

    Multivariate curve resolution methods suffer from the non-uniqueness of the solutions of the nonnegative matrix factorization problem. The solution ambiguity can be considerably reduced by equality constraints in the form of known spectra or concentration profiles. Two measures are suggested that indicate the impact of the equality constraints. The representation of these measures in the area of feasible solutions shows strong variations in the restrictiveness of equality constraints. The measures are tested for a three-component model problem and experimental data sets from the hydroformylation process and a catalyst cluster formation.

    Updated: 2020-01-26
  • mdatools – R package for chemometrics
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2020-01-24
    Sergey Kucheryavskiy

    The paper describes mdatools, an R package that implements mainly basic but also some advanced chemometric methods and provides a unified interface and user experience. The package was created to give a low entry level for beginners, so they can start using the implemented methods without writing much code. As they progress, though, users also have direct access to all computed results and can extend the package functionality by writing their own code on top.

    Updated: 2020-01-24
  • An adaptive mode convolutional neural network based on bar-shaped structures and its operation modeling to complex industrial processes
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2020-01-23
    Yongjian Wang; Hongguang Li; Chu Qi

    Optimal operation modeling plays an important role in complex industrial processes; however, with the increasing complexity and high nonlinearity in industrial processes, it becomes more and more difficult to establish an accurate operation model using first-principles methods. In this paper, an adaptive mode convolutional neural network framework based on bar-shaped structures (BS-AMCNN) is proposed, which is a data-driven model. First, a bar-shaped structure is designed to deal with the industrial process data specifically. The bar-shaped structure can transfer the advantages of CNN on processing image data to processing industrial process data. Meanwhile, the convolution windows and pooling windows in the proposed BS-AMCNN algorithm are replaced by translation-only sliding bar-shaped windows. Therefore, the algorithm can adjust the CNN structure adaptively among three different modes depending on different process statuses. The optimal operation model can be obtained with the proposed BS-AMCNN method accordingly. An experiment on a real complex industrial process, methanol production, is carried out, which validates the effectiveness of the proposed method. The proposed method is further compared with the traditional CNN method and the back propagation (BP) method. The results demonstrate the effectiveness of the proposed method.

    Updated: 2020-01-23
  • Honey exposed to laser-induced breakdown spectroscopy for chaos-based botanical classification and fraud assessment
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2020-01-22
    Miguel Lastra-Mejías; Manuel Izquierdo; Ester González-Flores; John C. Cancilla; Jesús G. Izquierdo; José S. Torrecilla

    Given that honey is among the top ten foods with the highest adulteration rate in the European Union, in this research, a tool has been developed to tackle this malpractice. The combination of laser-induced breakdown spectroscopy (LIBS) and chaotic parameters has been employed to classify six European honeys of different botanical origins as well as detect samples containing the usually elusive rice syrup adulteration in weight concentrations as low as 2 %. The profiles of the LIBS emission spectra can be used to faithfully classify honey in terms of botanical origin by combining information extracted directly from the spectra with simple linear modeling. In contrast, the detection of low amounts of rice syrup in honey is not as straightforward, which is why algorithms based on chaotic parameters such as shifted (lag-k) autocorrelation coefficients were employed to extract underlying information representative of adulterated samples. Since these algorithms are capable of detecting slight changes in the composition of honeys, it has been possible to identify these adulterations with a success rate greater than 90 % when samples from honeys of different botanical origins are combined into the same model, and over 95 % when individual honey types are analyzed.

    Updated: 2020-01-23
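
    The honey abstract above names shifted (lag-k) autocorrelation coefficients as one of its chaotic parameters. As a minimal sketch, assuming the standard sample autocorrelation estimator (the abstract does not say which estimator the authors used, and the function name is hypothetical), the coefficient of a spectrum treated as a one-dimensional signal could be computed as follows:

```python
def lag_autocorrelation(signal, k):
    """Sample autocorrelation coefficient of `signal` at lag k.

    Assumption: the common estimator that divides the lagged
    cross-product sum by the total sum of squared deviations.
    """
    n = len(signal)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(signal)")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("signal is constant; autocorrelation is undefined")
    # Products of each centered point with the point k steps later.
    num = sum(centered[i] * centered[i + k] for i in range(n - k))
    return num / denom
```

    A small shift in composition changes how neighbouring intensities co-vary, which is why coefficients at several lags can serve as features for a classifier.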
  • Acid–base equilibrium of guttiferone-A in ethanol–water mixtures: Modeling and bootstrap-based evaluation of uncertainties
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2020-01-20
    Éderson D’M. Costa; Eric B. Ferreira; Dayana A. Rodrigues; Marcelo H. dos Santos

    Acid dissociation constants are important parameters for indicating the extent of dissociation at different pH values, which is directly reflected in the absorption and elimination of drugs. In this present study, the acid–base equilibrium of guttiferone-A — an important natural benzophenone with innumerable biological activities — was investigated. The pKa values were determined, via the spectrophotometric method, in six different ethanol–water mixtures, and a significant linear correlation for pKa2 as a function of the alcohol percentage was found. A proposition was made for the order of the dissociation sites of the guttiferone-A, based on the value of pKa for the 7-epiclusianone. The modeling and the study of sensitivity enabled the molar absorptivity spectra as well as the regions of highest sensitivity for inversion of the constants to be obtained. Complementing this study, the bootstrap of residuals technique was evaluated for assessment of the confidence intervals for the fitted constants — the results were equivalent to those encountered via the parametric method.

    Updated: 2020-01-21
  • On the potential and limitations of multivariate curve resolution in Mössbauer spectroscopic studies
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2020-01-18
    Bruno Debus; Vitaly Panchuk; Boris Gusev; Sergey Savinov; Vadim Popkov; Andrey Legin; Valentin Semenov; Dmitry Kirsanov

    Traditional processing of Mössbauer spectroscopy measurements assumes a decomposition of the spectra into separate multiplets corresponding to particular non-equivalent states of the resonance atom. When the number of spectra is large (e.g. in kinetic, corrosion and phase transition studies), this procedure becomes time-consuming. Moreover, traditional processing assumes some hypotheses on the number of non-equivalent states and initial multiplet parameters. The results of the processing strongly depend on these hypotheses and may be quite subjective. In an attempt to circumvent this issue, we studied the potential of multivariate curve resolution (MCR) to unravel mixed multiplet spectra into their individual contributions. The application of MCR to Mössbauer studies was found to be quite challenging due to 1) long acquisition times limiting the number of available samples, 2) presence of critical spectral overlaps and 3) occasional deviations from the ideal bilinear assumption. In this report, we show how these limitations can be circumvented under certain conditions.

    Updated: 2020-01-21
  • Performance comparison of sampling designs for quality and safety control of raw materials in bulk: a simulation study based on NIR spectral data and geostatistical analysis
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2020-01-16
    J.A. Adame-Siles; J.E. Guerrero-Ginel; T. Fearn; A. Garrido-Varo; D. Pérez-Marín

    This study exploits the potential of near infrared (NIR) spectroscopy to deliver a measurement for each sampling point. Furthermore, it provides a protocol for the modelling of the spatial pattern of analytical constituents. On the basis of these two aspects, the methodology proposed in this work offers an opportunity to provide a real-time monitoring system to evaluate raw materials, easing and optimising the existing procedures for sampling and analysing products transported in bulk. In this paper, Processed Animal Proteins (PAPs) were selected as a case study, and two types of quality/safety issues were tested in PAP lots, induced by moisture and cross-contamination. A simulation study, based on geostatistical analysis and the use of a set of sampling protocols, made it possible to compare qualitatively the representation of the spatial surfaces produced by each design. Moreover, the Root Mean Square Error of Prediction (RMSEP), calculated from the differences between the analytical values and the geostatistical predictions at unsampled locations, was used to measure the performance in each case. Results show the high sensitivity of the process to the sampling plan used (understood as the sampling design plus the sampling intensity). In general, a gradual decrease in the performance can be observed as the sampling intensity decreases, so that, unlike higher intensities, intensities that were too low resulted in oversmoothed surfaces that failed to represent the actual distribution. Overall, Stratified and Simple Random samplings achieved the best results in most cases. This indicated that an optimal balance between the design and the intensity of the sampling plan is imperative to perform this methodology.

    Updated: 2020-01-16
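
    The sampling-design abstract above scores each design by the Root Mean Square Error of Prediction between analytical values and geostatistical predictions at unsampled locations. A minimal sketch of that metric, using its standard definition (the function and argument names are illustrative, not taken from the paper):

```python
import math


def rmsep(observed, predicted):
    """Root mean square error of prediction over paired values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean of squared differences, then the square root.
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

    Lower values mean the interpolated surface tracks the reference values more closely at the locations left out of the sampling plan.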
  • Constructing response surface designs with orthogonal quadratic effects using cyclic generators
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2020-01-15
    Tung-Dinh Pham; Nam-Ky Nguyen; Cuong-Manh Tran; Mai Phuong Vuong

    The central composite designs (CCDs) [1] and small composite designs (SCDs) [2,3] are designs for sequential experimentation for response surface optimization. The CCDs for fitting the second-order response surface require a 2-level factorial or a resolution V fraction at the first stage (screening stage). The SCDs developed for fitting the same model require many fewer runs at the first stage as they only require a resolution III* fraction. This paper introduces an algorithm which can augment a 2-level first-order design with additional 3-level runs to form a second-order design. This algorithm does not require the 2-level first-order design in stage I to be a resolution V or resolution III* fraction. These augmented runs are made up of circulant matrices. Since CCDs and SCDs are special cases of the designs constructed this way, we call the new designs generalized composite designs or GCDs. Like CCDs and SCDs, GCDs have orthogonal quadratic effects. GCDs can often be found with numbers of runs between those of SCDs and CCDs. This is useful because SCDs often have poorly estimated parameters and CCDs often require substantially more runs than required to fit a full quadratic model.

    Updated: 2020-01-15
  • Essential processing methods of hyperspectral images of agricultural and food products
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2020-01-11
    Beibei Jia; Wei Wang; Xinzhi Ni; Kurt C. Lawrence; Hong Zhuang; Seung-Chul Yoon; Zhixian Gao

    Hyperspectral images integrate spatial and spectral details together. They can provide valuable information about both external physical and internal chemical characteristics of agricultural and food products rapidly and non-destructively. Despite rapid improvements in instruments and acquisition techniques, the collected high-quality hyperspectral images still contain much useless information, like uneven illumination, background, specular reflection, and bad pixels that need to be removed. That is, hyperspectral image preprocessing is necessary for almost every hyperspectral image to get pure images or pixels, or to reduce negative influences on the subsequent detection, classification, and prediction analysis. This manuscript enumerates some possible solutions to the issues mentioned above before further image analysis. The advantages and disadvantages of different methods when dealing with a specific problem are also discussed. The clean images or pure signals obtained can be used for further data analysis. Finally, post-processing of hyperspectral images can be carried out to enhance the classification result of images or to generate chemical images/distribution maps to show spatial component concentration distributions of non-homogeneous samples.

    Updated: 2020-01-13
  • A partition-based variable selection in partial least squares regression
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2020-01-10
    Chuan-Quan Li; Zhaoyu Fang; Qing-Song Xu

    Partial least squares regression is one of the most popular modelling approaches for predicting spectral data and identifying key wavelengths when combined with variable selection methods. But some traditional variable selection approaches often overlook the local or group information between the covariates. In this paper, a partition-based variable selection in partial least squares (PARPLS) method is proposed. It first uses the k-means algorithm to partition the variable space and then estimates the coefficients in each group. Finally, these coefficients are sorted to select the important variables. The results on three near-infrared (NIR) spectroscopy datasets show that PARPLS is able to obtain better prediction performance and select more effective variables than its competitors.

    Updated: 2020-01-11
  • Rapid discrimination of Salvia miltiorrhiza according to their geographical regions by laser induced breakdown spectroscopy (LIBS) and particle swarm optimization-kernel extreme learning machine (PSO-KELM)
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2020-01-07
    Jing Liang; Chunhua Yan; Ying Zhang; Tianlong Zhang; Xiaohui Zheng; Hua Li

    Laser-induced breakdown spectroscopy (LIBS) coupled with a particle swarm optimization-kernel extreme learning machine (PSO-KELM) method was developed for classification and identification of six types of Salvia miltiorrhiza samples from different regions. The spectral data of 15 Salvia miltiorrhiza samples were collected by a LIBS spectrometer. An unsupervised classification model based on principal components analysis (PCA) was employed first for the classification of Salvia miltiorrhiza from different regions. The results showed that only Salvia miltiorrhiza samples from Gansu and Sichuan Province can be easily distinguished, and the samples from other regions present a bigger challenge in classification based on PCA. A supervised classification model based on KELM was then developed for the classification of Salvia miltiorrhiza, and two methods, random forest (RF) and PSO, were used for variable selection to eliminate useless information and improve the classification ability of the KELM model. The results showed that the PSO-KELM model has a better classification result with a classification accuracy of 94.87%. Compared with the results obtained by the particle swarm optimization-least squares support vector machines (PSO-LSSVM) and PSO-RF models, the PSO-KELM model possesses the best classification performance. The overall results demonstrate that the LIBS technique combined with the PSO-KELM method would be a promising method for classification and identification of Salvia miltiorrhiza samples from different regions.

    Updated: 2020-01-07
  • A spatial-temporal LWPLS for adaptive soft sensor modeling and its application for an industrial hydrocracking process
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2020-01-03
    Xiaofeng Yuan; Jiao Zhou; Yalin Wang

    Locally weighted partial least squares (LWPLS) is a widely used just-in-time learning (JITL) modeling algorithm for adaptive soft sensor development. In LWPLS, spatial variable distance is used to measure similarity and assign weights for historical samples, which is very effective for handling time-varying process problems with abrupt changes. However, gradual process changes are not effectively handled in traditional LWPLS. To cope with this problem, a novel similarity is proposed for temporal distance measurement by introducing a temporal variable of sampling instant, in which the most recently sampled data receive large weights since they represent the more recent process state. Then, both spatial and temporal similarities are considered to construct a spatial-temporal adaptive LWPLS modeling framework in this paper. The effectiveness of the proposed algorithm is validated on an industrial hydrocracking process.

    Updated: 2020-01-04
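
    The LWPLS abstract above combines a spatial similarity (distance in variable space) with a temporal similarity (recency of the sampling instant). The abstract gives no formula, so the following is only an illustrative sketch under loud assumptions: exponential kernels for both terms, a simple product to combine them, and hypothetical bandwidth parameters `phi_space` and `phi_time`. It is not the authors' weighting scheme.

```python
import math


def spatial_temporal_weights(query, query_time, samples, sample_times,
                             phi_space=1.0, phi_time=1.0):
    """Weight historical samples by closeness in space and in time.

    Assumed form (not from the paper):
        w = exp(-distance / phi_space) * exp(-age / phi_time)
    """
    weights = []
    for x, t in zip(samples, sample_times):
        # Euclidean distance between the historical sample and the query.
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, query)))
        # Age of the historical sample relative to the query instant.
        age = query_time - t
        weights.append(math.exp(-distance / phi_space) * math.exp(-age / phi_time))
    return weights
```

    With this form, two historical samples at the same distance from the query are ranked by recency, which is the behaviour the abstract describes for gradual process changes.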
  • A Deep Learning Just-in-Time Modeling Approach for Soft Sensor Based on Variational Autoencoder
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2020-01-03
    Fan Guo; Ruimin Xie; Biao Huang

    This paper presents a variational autoencoder-based just-in-time (JIT) learning framework for soft sensor modeling. Just-in-Time learning is often applied for soft sensor modeling in industrial processes. However, traditional just-in-time learning methods measure the similarity based on Euclidean distance, which has not taken into consideration the uncertainty in variables. To improve traditional just-in-time learning methods, in the proposed approach, the variational autoencoder is employed to extract features from input data set containing noise. Each feature variable is expressed by a Gaussian distribution. Then, by using the distribution of each feature variable, Kullback-Leibler divergence is employed to evaluate the similarity between the historical samples and a query sample. Furthermore, historical samples that are most similar to the query samples based on the values of the Kullback-Leibler divergence are selected for modeling. Finally, Gaussian process regression as a nonlinear regression model, is used to model the relationship between the selected input samples and the corresponding output samples, and then make a prediction. A numerical example as well as application on a practical debutanizer industrial process demonstrates the effectiveness of the proposed method.

    Updated: 2020-01-04
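
    The variational-autoencoder abstract above represents each feature by a Gaussian distribution and ranks historical samples by Kullback-Leibler divergence to the query. A minimal sketch using the closed-form KL divergence between univariate Gaussians follows. The assumptions are mine: features are treated as independent so per-feature divergences add, the direction is KL(historical || query), and all function names are hypothetical.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL( N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2) ), closed form."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)


def kl_diagonal(p, q):
    """Sum of per-feature KL terms; p and q are lists of (mean, std)."""
    return sum(kl_gaussian(mp, sp, mq, sq) for (mp, sp), (mq, sq) in zip(p, q))


def most_similar(query, history, n_select):
    """Indices of the n_select historical samples with the smallest KL."""
    ranked = sorted(range(len(history)),
                    key=lambda i: kl_diagonal(history[i], query))
    return ranked[:n_select]
```

    The selected indices would then define the local training set for the regression model fitted at query time.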
  • Revealing informative metabolites with random variable combination based on model population analysis for metabolomics data
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-12-30
    Yong-Huan Yun; Jiachao Zhang; Haiming Chen; Wenxue Chen; Qiuping Zhong; Weimin Zhang; Weijun Chen

    The discovery of biomarkers is a critical and essential step in metabolomics research. With the increasing complexity of metabolomics data generated by high-resolution instruments, there is an urgent need for chemometricians and statisticians to develop methods that efficiently reveal informative metabolites (variables). Based on the framework of model population analysis, a strategy coupled with partial least squares discriminant analysis (PLS-DA), called revealing informative metabolites iteratively (RIMI), was proposed in this study. To account for the synergetic effect of multiple variables, a vast population of random variable combinations is generated. Only the variable combinations with higher model accuracy are used to make paired models in order to statistically assess the importance of each variable in accordance with its beneficial contribution to classification model performance. Four types of variables (strongly informative, weakly informative, noise and interfering variables) are then identified based on the difference, and its significance, between the area under the receiver operating characteristic curve (AUROC) values obtained with and without each variable. With this definition, unbeneficial variables, including noise and interfering variables, were eliminated iteratively in a mild way. Strongly and weakly informative variables, regarded as beneficial variables, are retained, and their P values from the t-test are used to reveal the best variable subset. Because it explores useful information from a vast number of well-performing variable combinations, RIMI greatly improved the accuracy of the classification model compared to other methods when applied to two metabolomics datasets. The results indicate that RIMI efficiently reveals informative metabolites and can be regarded as a good alternative for biomarker discovery.

    Updated: 2019-12-31
  • RBPro-RF: Use Chou's 5-steps rule to predicting RNA-binding proteins via random forest with elastic net
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-12-29
    Xiaomeng Sun; Tingyu Jin; Cheng Chen; Xiaowen Cui; Qin Ma; Bin Yu

    RNA-proteins interaction is essential for the regulation of gene expression, cell defense and developmental regulation and other life activities, so applying machine learning to predict RNA-binding proteins (RBPs) has become a research hotspot in bioinformatics. We propose a new method to predict RNA-binding proteins called RBPro-RF. First, the feature vectors of the protein sequence are extracted by fusing composition-transition-distribution (C-T-D), pseudo-amino acid composition (PseAAC) and position-specific scoring matrix-400 (PSSM-400). Secondly, the synthetic minority oversampling technique (SMOTE) and the edited nearest neighbor (ENN) are employed to balance samples. Then, elastic net (EN) is used to eliminate redundant features and retain the important features to represent RBPs. Finally, the optimal feature vectors are input into random forest classifier to predict RBPs. Ten-fold cross-validation indicates the ACC and MCC of the training set are 97.43% and 0.933, respectively. In addition, the accuracies of three independent test sets Human, S. cerevisiae and A. thaliana are 95.55%, 88.82%, and 92.20%, respectively, which are superior to the state-of-the-art prediction methods. In summary, experimental results show that our method can significantly improve the accuracy of RNA-binding proteins prediction. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/RBPro-RF/.

    Updated: 2019-12-29
  • Transfer learning based on incorporating source knowledge using Gaussian process models for quick modeling of dynamic target processes
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-12-27
    Kuanglei Wang; Junghui Chen; Lei Xie; Hongye Su

    To maintain optimum economic process performance, a good process model is the cornerstone of an optimal scheduling strategy and controller design. Up to now, approaches to dynamic modeling have already been studied, but the models they constructed are only valid in their corresponding operating conditions. As operating conditions switch fast during the production, the constructed model may lack the extrapolating capability and may not describe the process behaviors in the new operating condition properly. Only a small number of data can be collected from the new operating condition for the construction of the model; the performance of the model may not be guaranteed for online new data. In this paper, a dynamic transfer modeling approach based on the Gaussian process model (GPM) is proposed. It can quickly model the target process and get correct predictions, by transferring source model knowledge trained with a sufficient number of historical data to a target model with a small number of available target data. This can significantly reduce the amount of time waiting for getting the target process data and quickly achieve a good process model. The statistical approach leverages GPM to transfer the knowledge. GPM is introduced to capture the uncertainty that propagates from the source process to the target process. Thus, the multi-step ahead prediction of the target model can provide the mean prediction as well as probabilistic information for its prediction in the form of a predictive variance. Finally, CSTR and the real furnace system are used to demonstrate the features of the proposed method and the applicability to a real plant process.

    Updated: 2019-12-29
  • Chagas disease vectors identification using visible and near-infrared spectroscopy
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-12-27
    Stéphanie Depickère; Antonio G. Ravelo-García; Frédéric Lardeux

    Chagas disease, caused by the parasite Trypanosoma cruzi, is widespread in Latin America, where the disease remains one of the major public health problems. This condition is mostly transmitted by triatomines, which are haematophagous insects throughout their lives. With 154 species described in the world, the correct determination of the species involved in the transmission is crucial to develop efficient control strategies. This can be achieved by taxonomic keys (available only for adult stages, so nymphal instars must be reared), or by molecular techniques. Both are time and/or money consuming, showing the need for new identification tools, especially for nymphal instars, which are the most frequently found in the field. Visible and near-infrared spectroscopy (VIS-NIR), used successfully in recent years to identify various organisms, was applied on a sample of three species from Bolivia: Triatoma infestans, Triatoma sordida and Triatoma guasayana. The spectrum of the dorsal part of the head from nymphal instars and adult stages was taken for each specimen of each species. Different methods of pre-processing and selection of variables (wavelengths) were tested to find the best model of classification for the three species. Each model was evaluated by different indices: accuracy, specificity, and F1 score. The comparison of the performance of each model showed that the best results were obtained when using a short spectrum (400–2000 nm) without pre-processing. A total of 32 components were retained by tuning, and 933 wavelengths were kept by the backward feature selection algorithm. When applied to a new sample of insects, this model showed a global accuracy of 97.2% (95.0–98.6). The F1 score was greater than 0.95, and the specificity greater than 0.94 for all the species. For the first time, a tool is available to identify nymphal instars and adults of triatomines quickly and with high accuracy.

    Updated: 2019-12-27
  • Bilinear and trilinear modelling of three-way data obtained in two factor designed metabolomics studies
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-12-27
    Jamile Mohammad Jafari; Hamid Abdollahi; Romà Tauler

    Metabolomic studies of biological samples using experimentally designed experiments at different levels produce large multivariate datasets which can be arranged in three-way datasets and modelled using bilinear and trilinear factor decomposition methods. The goal of these studies is the discovery of the hidden sources of data variability to facilitate their biochemical interpretation. In this paper, the relationship between the effects of the experimental design factors, the structure of the generated three-way datasets and their more appropriate modelling (bilinear or trilinear) are investigated. As example of study, the effects of the dose of a chemical drug on the changes over time in the concentration of lipids in multiple samples of a biological organism are investigated in detail. Different scenarios are considered depending on the type of effects and interactions between the experimental factors. The optimal data modelling results are obtained in case of having reproducible multiplicative effects between the experimental design factors, because in this case the data decomposition can be performed using a trilinear model and the correct lipid profiles are recovered. In the other data scenarios, even in the presence of only additive effects and no interaction between design factors, the correct recovery of the different lipid profiles describing the behavior of the system is not guaranteed and the subsequent rotation ambiguities associated to the bilinear model decompositions are still present.

    Updated: 2019-12-27
  • A selective ensemble preprocessing strategy for near-infrared spectral quantitative analysis of complex samples
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-12-25
    Xihui Bian; Kaiyi Wang; Erxuan Tan; Pengyao Diwu; Fei Zhang; Yugao Guo

    Preprocessing of raw near-infrared (NIR) spectra is typically required prior to multivariate calibration since the measured spectra of complex samples are often subject to overwhelming background, light scattering, varying noise and other unexpected factors. Various preprocessing methods have been developed aimed at removing or reducing the interference of these effects. However, it is usually difficult to determine the best preprocessing method for a given dataset. Instead of selecting the best one, a selective ensemble preprocessing strategy is proposed for NIR spectral quantitative analysis. Firstly, numerous preprocessing methods and their combinations are obtained by full factorial design in the order of baseline correction, scattering correction, smoothing and scaling. Then a partial least squares (PLS) model is built for each preprocessing method. The models which have better predictions than PLS are selected and their predictions are averaged as the final prediction. The performance of the proposed method was tested with corn, blood and edible blend oil samples. Results demonstrate that the selective ensemble preprocessing method can give comparable or even better results than the traditional selected best preprocessing method. Therefore, in the framework of selective ensemble preprocessing, more accurate calibration can be obtained without searching for the best preprocessing method.

    Updated: 2019-12-26
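
    The selective ensemble abstract above keeps only the preprocessed models that predict better than plain PLS and averages their predictions. The selection-and-averaging step can be sketched as below. This assumes "better" means a lower validation error than the baseline and uses hypothetical names; building the PLS models themselves is outside the sketch.

```python
def selective_ensemble(baseline_error, candidates):
    """Average the predictions of candidates that beat the baseline.

    candidates: list of (validation_error, predictions) pairs, one per
    preprocessing pipeline; every predictions list has the same length.
    """
    # Keep only the models whose validation error is below the baseline.
    selected = [preds for err, preds in candidates if err < baseline_error]
    if not selected:
        raise ValueError("no candidate model outperforms the baseline")
    n = len(selected)
    # Element-wise mean across the selected models' predictions.
    return [sum(values) / n for values in zip(*selected)]
```

    In this sketch the second of three candidate pipelines would be dropped if its error exceeded the baseline, and the final prediction would be the mean of the other two.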
  • Colourgrams GUI: a graphical user-friendly interface for the analysis of large datasets of RGB images
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-12-24
    Rosalba Calvini; Giorgia Orlandi; Giorgia Foca; Alessandro Ulrici

    Colourgrams GUI is a graphical user-friendly interface developed in order to facilitate the analysis of large datasets of RGB images through the colourgrams approach. Briefly, the colourgrams approach consists in converting a dataset of RGB images into a matrix of one-dimensional signals, the colourgrams, each one codifying the colour content of the corresponding original image. This matrix of signals can be in turn analysed by means of common multivariate statistical methods, such as Principal Component Analysis (PCA) for exploratory analysis of the image dataset, or Partial Least Squares (PLS) regression for the quantification of colour-related properties of interest. Colourgrams GUI allows to easily convert the dataset of RGB images into the colourgrams matrix, to interactively visualize the signals coloured according to qualitative and/or quantitative properties of the corresponding samples and to visualize the colour features corresponding to selected colourgram regions into the image domain. In addition, the software also allows to analyse the colourgrams matrix by means of PCA and PLS.

    Updated: 2019-12-25
  • Deep learning for geographical discrimination of Panax notoginseng with directly near-infrared spectra image
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-12-24
    Jian-E. Dong; Ye Wang; Zhi-Tian Zuo; Yuan-Zhong Wang

    Herbal materials have been widely used as functional food by a certain group of people for a potentially positive effect on body health regulation. Panax notoginseng as a crude material of functional food has long medical and cultivation history for more than 400 years in China and other countries. However, the quality was fluctuated with their geographical origins and Wenshan Autonomous Prefecture was regarded as the geo-authentic location with high properties. Therefore, rapid detection method is necessary for consumer to discriminate their authentic origins. In our study, 258 near infrared spectra of root powder of P. notoginseng from five main cultivation areas were used for discrimination analysis. A deep learning strategy (residual convolutional neural network) was established with 80% spectra images. Therein, the discrimination of geographical origins of the herb was first to be reported using directly spectra images instead of data matric from these spectra. The results indicated that these samples could be correctly classified as their respective categories with 100% accuracy in training set and 91% accuracy in test set. Finally, 22 samples were accurately discriminated in 25 samples of prediction set. In general, residual convolutional neural network using direct spectra image would be a feasible strategy for geographical traceability in further discrimination research.

    Updated: 2019-12-25
  • cACP: Classifying anticancer peptides using discriminative intelligent model via Chou's 5-step rules and general pseudo components
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-12-20
    Shahid Akbar; Ateeq Ur Rahman; Maqsood Hayat; Mohammad Sohail

    Worldwide, cancer is considered a fatal disease and remains a major cause of death. Conventional treatments using therapies and anticancer drugs are deemed ineffective due to their high cost and harmful impact on normal cells. However, the discovery of anticancer peptides (ACPs) provides an effective way to deal with cancer-affected cells. Owing to the rapid increase in peptide sequences, accurate characterization of ACPs has become a challenging task for investigators. In this paper, an effort has been made to develop a reliable and intelligent computational method for the accurate discrimination of anticancer peptides. Three statistical feature representation schemes, namely quasi-sequence order (QSO), conjoint triad feature, and Geary autocorrelation descriptor, are applied to express the motifs of the target class. Principal component analysis is employed to eliminate irrelevant and noisy features while selecting salient, high-variance features. Furthermore, several diverse learning algorithms are evaluated in order to select the best operational engine for the proposed model. After examining the empirical outcomes, the support vector machine obtained encouraging results in combination with the QSO feature space. It achieved an accuracy of 96.91% and 89.54% on the main dataset and the alternative dataset, respectively. The proposed model shows an outstanding improvement compared to methods in the literature. It is expected that the developed model may play a useful role in academic research as well as in proteomics and drug development.

    Updated: 2019-12-21
  • Continuous statistical modelling in characterisation of complex hydrocolloid mixtures using near infrared spectroscopy
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-12-20
    Konstantia Georgouli; Beatriz Carrasco; Damien Vincke; Jesus Martinez Del Rincon; Anastasios Koidis; Vincent Baeten; Juan Antonio Fernández Pierna

    Hydrocolloids such as natural gums and carrageenans are used extensively in the food industry in various mixtures that are difficult to characterise due to their similar chemical structure. The aim of this study was to develop an analytical framework for the identification and quantification of these compounds in complex mixtures using near-infrared (NIR) spectroscopy and chemometrics. Partial Least Squares (PLS) regression accompanied by the Continuous Locality Preserving Projections (CLPP) dimensionality reduction technique is proposed as the chemometric framework. Four different analytical models based on this framework are developed and compared for the analysis of spectral fingerprints of food hydrocolloid mixtures. Classification results showed that this method allowed the discrimination of hydrocolloids in blends with 100% correct classification. The same scheme also allows the quantitative determination of the different types of food hydrocolloids (3 types) and/or their individual compounds (8 different compounds) with a relatively low root mean square error of prediction (RMSEP) of 0.028 and 0.038, respectively.
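
    The RMSEP figure quoted above follows the usual definition, the square root of the mean squared difference between reference and predicted values on a test set. A minimal Python sketch of that definition is given here; the function name and the sample values are hypothetical and are not taken from the paper.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up reference and predicted concentrations, for illustration only.
reference = [0.10, 0.25, 0.40, 0.55]
predicted = [0.12, 0.22, 0.43, 0.53]
print(rmsep(reference, predicted))
```

    RMSEP is expressed in the units of the predicted property, so the two values in the abstract are only comparable if both models predict on the same scale.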

    Updated: 2019-12-20
  • All sparse PCA models are wrong, but some are useful. Part I: Computation of scores, residuals and explained variance
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-12-19
    J. Camacho; A.K. Smilde; E. Saccenti; J.A. Westerhuis

    Sparse Principal Component Analysis (sPCA) is a popular matrix factorization approach based on Principal Component Analysis (PCA) that combines variance maximization and sparsity with the ultimate goal of improving data interpretation. When moving from PCA to sPCA, there are a number of implications that the practitioner needs to be aware of. A relevant one is that scores and loadings in sPCA may not be orthogonal. For this reason, the traditional way of computing scores, residuals and variance explained that is used in the classical PCA can lead to unexpected properties and therefore incorrect interpretations in sPCA. This also affects how sPCA components should be visualized. In this paper we illustrate this problem both theoretically and numerically using simulations for several state-of-the-art sPCA algorithms, and provide proper computation of the different elements mentioned. We show that sPCA approaches present disparate and limited performance when modeling noise-free, sparse data. In a follow-up paper, we discuss the theoretical properties that lead to this undesired behavior. We title this series of papers after the famous phrase of George Box “All models are wrong, but some are useful” with the same original meaning: sPCA models are only approximations of reality and have structural limitations that should be taken into account by the practitioner, but properly applied they can be useful tools to understand data.
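
    The non-orthogonality issue can be illustrated with a small sketch. It contrasts PCA-style scores (X P) with least-squares scores (X P (P^T P)^-1) for two sparse, non-orthogonal loading vectors, and compares the fraction of the sum of squares each reconstruction captures. The least-squares projection is one common correction and is an assumption here, not necessarily the computation proposed in the paper; all names and numbers are hypothetical, and the toy data are not mean-centred.

```python
def gram_inverse(P):
    """Inverse of the 2 x 2 Gram matrix P^T P of a two-column loading matrix."""
    g11 = sum(p[0] * p[0] for p in P)
    g12 = sum(p[0] * p[1] for p in P)  # nonzero when loadings are not orthogonal
    g22 = sum(p[1] * p[1] for p in P)
    det = g11 * g22 - g12 * g12
    return [[g22 / det, -g12 / det], [-g12 / det, g11 / det]]

def scores(X, P, least_squares=True):
    """Scores X P (PCA-style) or X P (P^T P)^-1 (least-squares projection)."""
    inv = gram_inverse(P) if least_squares else [[1.0, 0.0], [0.0, 1.0]]
    T = []
    for x in X:
        a = sum(xj * p[0] for xj, p in zip(x, P))
        b = sum(xj * p[1] for xj, p in zip(x, P))
        T.append([a * inv[0][0] + b * inv[1][0], a * inv[0][1] + b * inv[1][1]])
    return T

def explained_fraction(X, T, P):
    """1 - ||X - T P^T||^2 / ||X||^2 for the reconstruction T P^T."""
    sse = ssx = 0.0
    for x, t in zip(X, T):
        for xj, p in zip(x, P):
            sse += (xj - (t[0] * p[0] + t[1] * p[1])) ** 2
            ssx += xj ** 2
    return 1.0 - sse / ssx

# Two sparse unit-norm loadings that overlap on the second variable.
P = [[0.8, 0.0], [0.6, 0.6], [0.0, 0.8]]
# Noise-free rows built as 2*p1 + 1*p2 and 1*p1 - 1*p2.
X = [[1.6, 1.8, 0.8], [0.8, 0.0, -0.8]]

print(explained_fraction(X, scores(X, P, least_squares=False), P))
print(explained_fraction(X, scores(X, P, least_squares=True), P))
```

    Because the toy rows lie exactly in the span of the two loadings, the least-squares scores reconstruct them completely, whereas the PCA-style scores do not.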

    Updated: 2019-12-20
  • Feature selection and classification by minimizing overlap degree for class-imbalanced data in metabolomics
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-12-19
    Guang-Hui Fu; Yuan-Jiao Wu; Min-Jie Zong; Lun-Zhao Yi

    Learning from class-imbalanced data has gained increasing attention in recent years due to the massive growth of skewed data across many scientific fields such as metabolomics. Some studies show that it is not the imbalance itself that hinders classification performance; rather, class overlap plays an important role in the performance degradation when it is combined with class imbalance. Alleviating the overlap in imbalanced data might therefore be an effective way to improve performance in class-imbalance learning. In this study, we propose two feature selection algorithms that aim to minimize the overlap degree between the majority and the minority classes, based on the simple assumption that decreasing the overlap degree of a data set makes it more separable. The proposed MOSNS and MOSS methods are built via sparse regularization techniques. Simulation results indicate that our algorithms are effective in recognizing key features and controlling false discoveries for class-imbalance learning. Four class-imbalanced metabolomics data sets are also employed to test the performance of our algorithms, and a comparison with accuracy (ACC)-based and ROC-based selection procedures is performed. The results show that our algorithms are highly competitive and can be an alternative feature selection strategy in class-imbalance learning.

    Updated: 2019-12-19
  • Weighted incremental minimax probability machine-based method for quality prediction in gasoline blending process
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-12-19
    Kaixun He; Maiying Zhong; Wenli Du

    Near-infrared (NIR) spectroscopy is frequently used to predict quality-relevant variables that are difficult to measure online. This technology can be applied by developing the NIR model in advance. Obtaining a high-accuracy NIR model is difficult using traditional modeling methods because process data inherently contain uncertainties and present strong non-Gaussian characteristics. Considering the difficulty in obtaining precise prediction results, biased estimation is important in producing qualified products when NIR spectroscopy is used in a feedback quality control system. The present work proposes a biased estimation model based on probabilistic representation to address the aforementioned issues. Additionally, a novel weighted incremental strategy with “just-in-time” learning is proposed to improve model adaptiveness. In this way, the NIR model could be established and maintained without imposing any distribution hypothesis on process data, and biased estimation could be obtained in the form of probability. The performance of the proposed method is demonstrated on an actual data set from a gasoline blending process.

    Updated: 2019-12-19
  • Improved linear profiling methods under classical and Bayesian setups: An application to chemical gas sensors
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-12-12
    Tahir Abbas; Tahir Mahmood; Muhammad Riaz; Muhammad Abid

    A profile is a functional relationship between two or more variables, used to monitor process performance and quality. The relationship may be linear or nonlinear depending upon the situation. Linear profiling methods with a fixed-effect model are commonly used under simple random sampling (SRS). In this article, we propose linear profile monitoring methods under a new ranked set sampling (RSS) scheme named Neoteric RSS (NRSS). The new profiling methods are proposed under all three popular structures, namely Shewhart, cumulative sum (CUSUM) and exponentially weighted moving average (EWMA). The proposal considers both classical and Bayesian setups. We have investigated the detection ability of the newly proposed classical charts (i.e., Shewhart_NRSS(C), CUSUM_NRSS(C) and EWMA_NRSS(C) charts) and Bayesian charts (i.e., Shewhart_NRSS(B), CUSUM_NRSS(B) and EWMA_NRSS(B) charts). An extensive simulation study showed that the proposed charts have better detection ability under the perfect NRSS scheme, while the Bayesian control charts showed superiority over their classical counterparts under both perfect and imperfect NRSS. The significance of the proposed study is further highlighted using a real data study of chemical gas sensors from the chemical industry.
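
    Of the three chart structures named above, the EWMA recursion is the easiest to show compactly. The sketch below applies the textbook EWMA statistic with time-varying control limits to a generic monitored quantity under simple random sampling; it does not implement the NRSS-based or Bayesian charts of the paper, and the smoothing constant, limit width and data are assumed values chosen for illustration.

```python
import math

def ewma_chart(values, target, sigma, lam=0.2, width=3.0):
    """Return (statistic, lower limit, upper limit, signal) for each observation.

    lam is the smoothing constant and width the limit multiplier; both are
    illustrative defaults, not settings taken from the paper.
    """
    z = target  # the EWMA statistic starts at the in-control target
    rows = []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1.0 - lam) * z
        # Exact (time-varying) standard deviation of the EWMA statistic
        half = width * sigma * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        rows.append((z, target - half, target + half, abs(z - target) > half))
    return rows

# Hypothetical monitored statistic: in control for five samples, then shifted.
data = [0.0] * 5 + [3.0] * 10
for z, lower, upper, signal in ewma_chart(data, target=0.0, sigma=1.0):
    print(f"{z:6.3f}  [{lower:6.3f}, {upper:6.3f}]  {'signal' if signal else ''}")
```

    In profile monitoring the same recursion would typically be applied to estimated profile parameters, such as intercept, slope and error variance, rather than to raw observations.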

    Updated: 2019-12-13
  • Multi-block SO-PLS approach based on infrared spectroscopy for anaerobic digestion process monitoring
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-12-09
    L. Awhangbo, R. Bendoula, J.M. Roger, F. Béline

    Near infrared spectroscopy combined with multivariate calibration such as partial least squares regression is a promising technique for on-line monitoring of anaerobic digesters. Different substrates are used in digesters, depending on their availability and their methanogenic potential, to optimize the process. In Europe, the feedstock for anaerobic digesters is dominated by slurry and food waste, which are respectively highly biodegradable and fat-containing substrates. Monitoring the anaerobic digestion process based on digestates coming from these substrates presents some difficulties. The digestion of highly biodegradable substrates comes with the presence of water, which hinders spectroscopic calibration. Fat-containing substrates, in turn, can lead to the accumulation of long-chain fatty acids, which are quite difficult to detect in the infrared region. While all existing studies have explored adapted spectroscopic measurements to improve process monitoring, this study investigated the use of NIRS combined with multi-block analysis to track important anaerobic digestion stability parameters. Infrared measurements can come from several sources in process monitoring. In addition, sequential and orthogonalized partial least squares have proven their ability to exploit the underlying relation between several data blocks. These multi-block methods are powerful chemometric tools which can be applied in the monitoring of anaerobic digestion. Polarized light spectroscopy, which is also known to improve the understanding of scattering media like the digestate, was also studied.

    Updated: 2019-12-09
  • Probabilistic just-in-time approach for nonlinear modeling with Bayesian nonlinear feature extraction
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-11-25
    Feifeng Shen, Nabil Magbool Jan, Biao Huang, Huizhong Yang

    In this work, we propose a probabilistic just-in-time (PJIT) modeling methodology with nonlinear feature extraction for estimating quality variables of interest. In the literature, deterministic nonlinear feature extraction methods have been employed to deal with high-dimensional input data. However, these methods require prespecifying the latent dimensions, which often results in overfitting. To circumvent this issue, we employ the Bayesian Gaussian process latent variable model (BGPLVM) to extract nonlinear latent variables and determine their dimensions automatically. Owing to the probabilistic framework, the proposed approach involves computing the variational distribution of latent variables for the query sample as well as historical samples, and selecting relevant samples based on a distribution measure for building a local Gaussian process model to predict the quality variable. Furthermore, the applicability of the proposed approach to missing data and multi-rate data is discussed. Two case studies are presented to demonstrate the efficacy of the proposed PJIT model.

    Updated: 2019-11-26
  • Linear programming applied to polarized Raman spectroscopy for elucidating molecular structure at surfaces
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-11-25
    Fei Chen, Kuo-Kai Hung, Dennis K. Hore, Ulrike Stege

    We present a framework for using linear programming to solve a challenging problem in surface science, the elucidation of the structure and composition of adsorbed molecules from a mixture, using simulated data from polarized Raman experiments. In the past, the methods applied to interpret such spectroscopic information were combinatorial approaches that are limited in scalability or accuracy. Quantum mechanical electronic structure calculations yield the optical response of a single molecule, from which spectra of a mixture can be determined by appropriate weighting. Furthermore, spectra obtained in different beam polarizations provide projections of the signal in the laboratory frame. We demonstrate that linear programming is an ideal tool for utilizing all of this information in order to provide the sought structural picture.

    Updated: 2019-11-26
  • Determination of the effect of red blood cell parameters in the discrimination of iron deficiency anemia and beta thalassemia via Neighborhood Component Analysis Feature Selection-Based machine learning
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-11-18
    Hakan Ayyildiz, Seda Arslan Tuncer

    Differential diagnosis of iron deficiency anemia (IDA) and β-thalassemia is a time-consuming and costly procedure. Complete blood count (CBC) is a quick, inexpensive, and easily accessible test which is used as the primary test for the diagnosis of anemia. However, as CBC cannot successfully discriminate between IDA and β-thalassemia, advanced techniques are needed. To date, numerous red blood cell (RBC) indices have been investigated and various parameters have been proposed for each index. In the present study, a differential diagnosis of IDA and β-thalassemia was performed by using RBC indices and machine learning techniques including Support Vector Machine (SVM) and K-Nearest Neighbor (KNN). The RBC indices were used as input parameters for the classifier and the performances of SVM and KNN were evaluated separately, in order to determine the effectiveness of both techniques. Fewer parameters were given as inputs to the machine learning algorithms, and higher performance was achieved. In addition, a feature selection technique, the Neighborhood Component Analysis Feature Selection (NCA) algorithm, was used for selecting features from the datasets, and the parameters selected via NCA provided high performance (97% Area Under the ROC curve [AUC]). Taken together, the results indicated that the RBC indices used in the study showed higher performance compared to those reported in the literature. By using these indices, not only was the individual effect of each index parameter on the machine learning model investigated, but a different subset of features from those employed in the literature was also established. In addition, in contrast to the literature, the study revealed that different CBC parameters were effective in distinguishing between IDA and β-thalassemia in male and female patients. Accordingly, the RBC indices employed in the study can be easily and inexpensively used in clinical and daily practice for the discrimination of IDA and β-thalassemia.

    Updated: 2019-11-18
  • Does the signal contribution function attain its extrema on the boundary of the area of feasible solutions?
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-11-16
    Klaus Neymeyr, Azadeh Golshan, Konrad Engel, Romà Tauler, Mathias Sawall

    The signal contribution function (SCF) was introduced by Gemperline in 1999 and Tauler in 2001 in order to study band boundaries of multivariate curve resolution (MCR) methods. In 2010 Rajkó pointed out that the extremal profiles of the SCF reproduce the limiting profiles of the Lawton-Sylvestre plots for the case of noise-free two-component systems. This paper mathematically investigates two-component systems and includes a self-contained proof of the SCF-boundary property for two-component systems. It also answers the question of whether a comparable behavior of the SCF still holds for chemical systems with three or more components with respect to their area of feasible solutions. A negative answer is given by presenting a noise-free three-component system for which one of the profiles maximizing the SCF is represented by a point in the interior of the associated area of feasible solutions.

    Updated: 2019-11-18
  • Variable contribution identification and visualization in multivariate statistical process monitoring
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-11-15
    R.F. Rossouw, R.L.J. Coetzer, N.J. Le Roux

    Multivariate statistical process monitoring (MSPM) has received book-length treatments and widespread application in industry. In MSPM, multivariate data analysis techniques such as principal component analysis (PCA) are commonly employed to project the (possibly many) process variables onto a lower-dimensional space where they are jointly monitored given a historical or specified reference set that is within statistical control. In this paper, PCA and biplots are employed together in an innovative way to develop an efficient multivariate process monitoring methodology for variable contribution identification and visualization. The methodology is applied to a commercial coal gasification production facility with multiple parallel production processes. More specifically, it is shown how the methodology is used to specify the optimal principal component combinations and biplot axes for visualization and interpretation of process performance, and for the identification of the critical variables responsible for performance deviations, which yielded direct benefits for the commercial production facility.

    Updated: 2019-11-18
  • Wavelet functional principal component analysis for batch process monitoring
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-11-15
    Jingxiang Liu, Junghui Chen, Dan Wang

    To facilitate the understanding and analysis of process conditions, a novel wavelet functional principal component analysis is proposed for monitoring batch processes from the functional perspective. In the proposed method, the variables' trajectories are taken as smooth functions instead of discrete vectors. To this end, the original discrete variables are actively converted into continuous functions using wavelet basis functions. This not only highlights the subtle shape differences between normal and faulty variable trajectories but also easily addresses the uneven-length issue in practical batch processes. Additionally, the 3D matrix is converted into the functional matrix directly, without an unfolding operation. The functional principal component analysis method is then performed on the functional space to establish monitoring models. Thanks to the compact-support characteristics of the wavelet functions, the proposed method can be directly applied to within-batch detection without data pre-treatment. A numerical case, a simulated penicillin fermentation process, and a laboratory injection molding process are used to demonstrate the effectiveness of the proposed method.

    Updated: 2019-11-15
  • Ridge regression combined with model complexity analysis for near infrared (NIR) spectroscopic model updating
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-11-13
    Feiyu Zhang, Ruoqiu Zhang, Wenming Wang, Wuye Yang, Long Li, Yinran Xiong, Qidi Kang, Yiping Du

    Near infrared (NIR) calibration models can be used to predict samples that fall within the calibration domain. However, unmodeled sources of variance within new samples, such as instrumental drift and sample variations, result in unreliable predictions of product properties. In this case, model updating becomes very important. It involves recalculating the model coefficients after adding a few new samples to the original calibration samples. Considering the cost of collecting new samples and their reference measurements, normally only a few samples are used for model updating. Therefore, it is necessary to balance the relative importance of old and new samples by weighting the new samples. Compared with the weight of new samples, the model parameter in the regression method has much more influence on the performance of an updated model. The bias/variance tradeoff (L curve) has been applied to the selection of the model parameter. However, this approach contains a degree of subjectivity and does not always yield satisfactory models. To solve the model selection problem, a new method named model complexity analysis (MCA) is proposed in this work. According to MCA, the 2-norm of the regression coefficient vector of an updated model should be smaller than that of the original model. The ratio of the former to the latter should lie in the range 0 to 1. For a given value of this ratio, the model parameter can be uniquely determined. The influence of the number of new samples and their representativity on this selection was studied. In this work, ridge regression (RR) was used for model updating, because it is a regression method based on a 2-norm constraint. Results show that the proposed method based on MCA could select a reliable RR parameter. RR-MCA shows excellent performance on the three NIR datasets used in this work.
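
    The idea of fixing the ridge parameter through a norm ratio can be sketched as follows. Because the 2-norm of the ridge coefficient vector decreases monotonically as the ridge parameter grows, a target ratio between 0 and 1 maps to a single parameter value, which bisection can locate. This is a minimal sketch under stated assumptions: the paper's own determining equation is not reproduced in the abstract, the weighting of new samples is omitted, and every function name and data value below is hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def ridge(X, y, lam):
    """Ridge coefficients from the normal equations (X^T X + lam I) b = X^T y."""
    m = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(m)]
         for i in range(m)]
    rhs = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(m)]
    return solve(A, rhs)

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def lambda_for_ratio(X, y, ref_norm, ratio, lo=1e-8, hi=1e8, iters=200):
    """Bisect on a log scale for the lam giving ||b(lam)|| = ratio * ref_norm."""
    target = ratio * ref_norm
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if norm(ridge(X, y, mid)) > target:
            lo = mid  # coefficients still too large: more shrinkage needed
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Made-up, nearly collinear calibration data (old plus new samples pooled).
X = [[1.0, 0.9], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2]]
y = [1.0, 2.0, 3.1, 3.9]
ref = norm(ridge(X, y, 1e-8))  # stand-in for the original model's coefficient norm
lam = lambda_for_ratio(X, y, ref, ratio=0.5)
print(lam, norm(ridge(X, y, lam)) / ref)
```

    The search assumes the target norm lies between the norms obtained at the two ends of the bracket; if the original model's norm exceeds anything the updated data can produce, the bisection simply returns a value near the lower bound.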

    Updated: 2019-11-13
  • Sparse PARAFAC2 decomposition: Application to fault detection and diagnosis in batch processes
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-11-12
    Lijia Luo, Yonggui Chen, Shiyi Bao, Chudong Tong

    The PARAFAC2 decomposition is often used to model a set of matrices that have the same number of columns but different numbers of rows. However, the PARAFAC2 model lacks interpretability because most elements in the factor vectors are nonzero. To overcome this deficiency, a sparse PARAFAC2 (SPARAFAC2) decomposition is developed. SPARAFAC2 yields sparse factor vectors (SFVs) with only a few nonzero elements. Because of the sparsity in the factor vectors, the SPARAFAC2 model has much better interpretability than the ordinary PARAFAC2 model. SPARAFAC2 is attractive for applications in batch processes, because it not only can directly handle the three-way structure of batch data and naturally solve the unequal batch length problem, but also can reveal meaningful connections between process variables. Therefore, based on the SPARAFAC2 decomposition, fault detection and diagnosis methods are proposed for batch processes. To improve the fault detection capability, a cumulative percent contribution criterion is used to adaptively select SFVs for each sample from the fault detection point of view. Two fault detection indices are then defined using the selected SFVs. A contribution-based fault diagnosis method is also proposed. This method identifies faulty variables by evaluating the contributions of SFVs and of the active variables (those with nonzero elements) in each SFV to the detection of faults. The effectiveness of the proposed methods is demonstrated with a case study in an industrial-scale fermentation process.

    Updated: 2019-11-13
  • Data mining assisted prediction of liquidus temperature for primary crystallization of different electrolyte systems
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-11-11
    Hui Lu, Xiaojun Hu, Bin Cao, Liang Ma, Wanqiu Chai, Yunchuan Yang

    Liquidus temperature for primary crystallization is an important physical and chemical property of an electrolyte system. It plays a crucial role in the stability of the electric cell in the electrolysis production process. Accurately predicting the liquidus temperature for primary crystallization from the electrolyte composition is therefore a meaningful research subject. In this work, data mining assisted prediction of liquidus temperature for primary crystallization of electrolyte systems is proposed. The essential differences between the complex industrial electrolyte system and the electrolyte system prepared in the laboratory were revealed by comparing micro-morphology, phase composition and thermal analysis. To some extent, it was verified that the empirical formula does not transfer between the two different electrolyte systems. Prediction models of liquidus temperature for primary crystallization of different electrolyte systems were constructed using the SVM (support vector machine), BPANN (back-propagation artificial neural network), RFR (random forest regression) and GBR (gradient boosting regression) algorithms, respectively. The electrolyte systems include Na3AlF6(CR)-Al2O3-AlF3-CaF2, Na3AlF6(CR)-Al2O3-MgF2-CaF2-LiF, Na3AlF6(CR)-Al2O3-MgF2-CaF2-KF-LiF, and Na3AlF6(CR)-Al2O3-AlF3-CaF2-MgF2-LiF-KF-NaF. For the different electrolyte systems, the ANN, SVM, RFR and other models all perform well and can effectively predict the liquidus temperature for primary crystallization of each electrolyte system. For some electrolyte systems, the ANN, SVM and RFR models are clearly superior to the empirical formula described in the literature. Data mining thus has good application prospects in the prediction of the liquidus temperature for primary crystallization of electrolyte systems. In this work, we provide a new method for predicting the liquidus temperature for primary crystallization of different electrolyte systems based on the electrolyte composition dataset.

    Updated: 2019-11-13
  • Identification and visualization of cell subgroups in uncompensated flow cytometry data
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-11-09
    Başak Esin Köktürk Güzel, Bilge Karaçali

    We propose a new method for identification and visualization of cell subgroups in uncompensated multi-color flow cytometry data. The method combines annealing-based model-free expectation-maximization to identify cell subgroups and joint diagonalization on clustered data for better visualization. The proposed method was evaluated on a real, publicly available 8-color flow cytometry dataset manually gated beforehand for lymphocytes. The results obtained in three separable scenarios indicate that the method accurately identifies cell subgroups while properly adjusting the visualization of identified cell groups by reducing the spectral overlap between the different fluorochrome channels.

    Updated: 2019-11-11
  • iPredCNC: Computational prediction model for cancerlectins and non-cancerlectins using novel cascade features subset selection
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-11-02
    Zaheer Ullah Khan, Farman Ali, Irfan Ahmad, Maqsood Hayat, Dechang Pi

    Lectins are special types of protein that play a crucial role in tumor cell differentiation due to their significant binding affinity to certain types of saccharide (carbohydrate) groups. They are also closely related to certain types of proteins that initiate tumor cell survival, growth, metastasis, carcinoma, and different stages of tumor. Differentiating the specific functions of proteins remains challenging in the post-genomic era. This endeavor is vital in therapeutic cancer studies, but wet-lab experiments related to this issue are expensive and time-consuming. To cope with this situation, several computational sequence-based methods have been proposed to differentiate the specific functions of proteins. In the current study, we have developed a fast and accurate cascade feature selection-based machine learning model for cancer lectins using different sequence-based feature descriptive techniques. This model yielded 85.21% accuracy, 87.84% sensitivity, 81.92% specificity, and 0.922 AUC with a multilayer perceptron over k-fold and stratified k-fold cross-validation tests. These empirical results show the validity and robustness of the proposed study compared to all existing approaches. The proposed methodology would be a handy tool in cancer therapeutics research, drug design, and academic studies. All the source codes and data regarding this manuscript are freely available via http://www.github.com/zaheeerkhancs/iPredCNC.
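
    The accuracy, sensitivity and specificity quoted above are standard functions of a binary confusion matrix. A minimal sketch of those definitions is given here; the counts in the example are made up and do not come from the paper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: positives (e.g. cancerlectins) predicted correctly/incorrectly;
    tn/fp: negatives predicted correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }

# Hypothetical counts from one cross-validation fold.
print(binary_metrics(tp=8, fn=2, tn=9, fp=1))
```

    Under cross-validation these quantities are usually computed per fold and averaged, or computed once on the pooled out-of-fold predictions; the abstract does not say which convention was used.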

    Updated: 2019-11-04
  • Response surface methodology for optimizing LIBS testing parameters: A case to conduct the elemental contents analysis in soil
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-11-01
    Keqiang Yu, Yanru Zhao, Yong He, Dongjian He

    Optimization of testing parameters is a prerequisite for further data analysis in laser induced breakdown spectroscopy (LIBS) and can offer an important reference for soil detection in the field. This work investigated the influence of the main testing parameters of a LIBS system: laser energy (LE), delay time (DT), and lens-to-sample distance (LTSD). Based on the spectral characteristics of the main elements in soils, the testing parameters of LIBS for soil detection were obtained and verified. The optimization analysis of the three testing parameters LE (50-160 mJ), DT (0.5-4.5 μs), and LTSD (94-102 mm) was conducted by response surface methodology (RSM). Central composite design (CCD) in RSM was introduced to carry out the experimental runs. The combined signal-to-background ratio (SBR) of characteristic spectral lines from the main elements (Si, Fe, Mg, Ca, Al, Na, K, etc.) in soil was defined as the objective function (named YSBR). The interaction influences among the three independent variables (LE, DT, and LTSD) on soil plasma characteristics were explored and the optimized testing parameters of LIBS were summarized. The results were as follows: the factor LE showed a marked linear effect on YSBR, whereas the factors DT and LTSD exhibited the opposite behaviour. The interaction terms of the three factors displayed a non-significant relationship. Meanwhile, the quadratic terms LE², DT² and LTSD² showed significant surface relationships. Through the RSM analysis, the optimized testing parameters for LIBS soil detection were LE: 103.09 mJ; DT: 2.92 μs; LTSD: 97.69 mm; with a peak YSBR value of 198.60. After that, the LIBS data of 21 representative soil samples were collected under the optimized LIBS testing parameters. Partial least squares regression (PLSR) was introduced to predict the main elemental contents. Results indicated that the PLSR models offered promising outputs for predicting the contents of Al, Ca, Fe, K, Mg, and Na in the sampled soil, which showed that the LIBS testing parameters optimized by RSM were usable. This work provides a theoretical basis for accurate LIBS data analysis and technical support for selecting LIBS testing parameters for field soil.
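
    A central composite design for three factors combines the 2^3 factorial corners, six axial points and replicated centre points. The sketch below generates such a design in coded units and maps it to the stated ranges of LE, DT and LTSD. Several choices are assumptions not given in the abstract: the rotatable axial distance, the number of centre runs, and the convention that the axial points coincide with the ends of the stated ranges.

```python
from itertools import product

def central_composite(k, alpha=None, n_center=1):
    """Coded design points: 2^k factorial corners, 2k axial points, centre runs."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25  # rotatable design; an assumed choice
    corners = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for s in (-alpha, alpha):
            point = [0.0] * k
            point[i] = s
            axial.append(point)
    centre = [[0.0] * k for _ in range(n_center)]
    return corners + axial + centre

def decode(point, ranges, alpha):
    """Map coded levels to physical units, placing +/-alpha at the range ends."""
    return [(lo + hi) / 2 + (c / alpha) * (hi - lo) / 2
            for c, (lo, hi) in zip(point, ranges)]

alpha = (2 ** 3) ** 0.25
# Ranges quoted in the abstract: LE in mJ, DT in microseconds, LTSD in mm.
ranges = [(50.0, 160.0), (0.5, 4.5), (94.0, 102.0)]
design = central_composite(3, alpha=alpha, n_center=6)  # 6 centre runs assumed
for coded in design:
    print([round(v, 2) for v in decode(coded, ranges, alpha)])
```

    A full second-order model in LE, DT and LTSD would then be fitted to the measured YSBR values at these runs, and its stationary point taken as the optimum.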

    Updated: 2019-11-01
  • Quantitative structure-activity relationship (QSAR) models and their applicability domain analysis on HIV-1 protease inhibitors by machine learning methods
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-11-01
    Yujia Tian, Shengde Zhang, Hongyan Yin, Aixia Yan

    HIV-1 protease inhibitors (PIs) make a vital contribution to highly active antiretroviral therapy (HAART) for human immunodeficiency virus (HIV). In this study, 14 quantitative structure-activity relationship (QSAR) models for 1238 PIs were built using four machine learning methods: multiple linear regression (MLR), support vector machine (SVM), random forest (RF), and deep neural networks (DNN). For the best model, Model2G, constructed with the DNN algorithm, the coefficient of determination (R²) was 0.88 on the training set and 0.79 on the test set, and the root mean squared error (RMSE) was 0.39 and 0.51, respectively. For Model2G, an applicability domain threshold (ADT) of 1.765 was obtained from the training set. A compound whose similarity distance (d) is less than the ADT is considered to be inside the applicability domain and could be predicted accurately; by this criterion, 65.37% of the compounds in the test set were predicted reliably. In addition, the 1238 PIs were manually divided into eight subsets containing different scaffolds. We also built QSAR models with the DNN method on two subsets: 417 cyclic urea derivatives and 184 pyrone derivatives.
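
The quantities reported above can be sketched in a few lines. This is a minimal illustration, not the authors' code: R² and RMSE use their standard definitions, and the applicability-domain check only applies the stated rule (inside if d is less than the ADT). How the similarity distance d is computed is not given in the abstract, so the distances are taken as inputs, and all function names are hypothetical.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def fraction_inside_domain(distances, adt):
    """Share of compounds whose similarity distance d is strictly below the ADT.

    The distances themselves are assumed to be precomputed; the abstract
    does not say how d is defined.
    """
    return sum(1 for d in distances if d < adt) / len(distances)
```

With the threshold from the abstract, a call would look like `fraction_inside_domain(test_distances, adt=1.765)`, where `test_distances` is a placeholder for whatever distances the chosen similarity measure produces.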

    Updated: 2019-11-01
  • Automatic segmentation method for CFU counting in single plate-serial dilution
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-11-01
    Dimitria T. Boukouvalas, Renato Araújo Prates, Cintia Raquel Lima Leal, Sidnei Alves de Araújo

    Quantification of colony forming units (CFU) in microbial cultures prepared according to the standard spread plate technique is a daily laboratory routine that requires significant resources. SP-SDS (Single Plate Serial Dilution Spotting), on the other hand, is a widely used technique that greatly reduces the material resources and time required. However, previous approaches to automatic quantification are based on images of standard spread plate Petri dishes with low variation of CFU features, captured under controlled lighting conditions. In this paper, we propose a novel approach that automatically separates each dilution in images of Petri dishes prepared with the SP-SDS technique and counts the total CFU per dilution, which most approaches are unable to do. The proposed approach employs region-based shape descriptors for quantification of isolated CFU and cross-correlation granulometry for the quantification of CFU in agglomerates. For the experiments, we composed two image datasets and used images from two publicly available datasets. The images from our datasets were acquired under real laboratory ambient conditions and show variation in lighting, background noise, low contrast between bacterial colonies and background, and high variation in CFU features. Overall, the results obtained by our approach in terms of accuracy, precision, and sensitivity were superior to those of two other recently proposed approaches used for comparison in this study, especially for high-definition images. In addition, our results show accuracy greater than or similar to that of various approaches found in the literature, most of which cannot count CFU in images of Petri dishes prepared with the SP-SDS technique and acquired under poorly controlled ambient conditions. Our composed datasets are publicly available for download as a contribution to further research.

    Updated: 2019-11-01
  • Model for estimation of total nitrogen content in sandalwood leaves based on nonlinear mixed effects and dummy variables using multispectral images
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-10-31
    Zhulin Chen, Xuefeng Wang

    Fertilizer overuse is a common phenomenon in global agroforestry production, and it causes ecological destruction. The ability to accurately estimate the nutrient content of plant leaves in real time would help reduce the degree of environmental damage. In recent years, remote sensing technology has been widely used in the diagnosis of crop nutrition in many countries. Most studies focus on optimal band selection or create new vegetation indices, but these studies have ignored the random impact of natural environmental factors on the estimated results. This paper proposes an estimation model of total nitrogen content (TNC) in sandalwood leaves that takes sampling season as a dummy variable and site conditions as a random effect. Three forestry farms with different locations and site conditions were selected as study areas to enhance the universality of this model. Multispectral images of leaves were obtained using a low-cost five-band camera (RedEdge3, MicaSense, USA), and the experimental results indicate the following: (1) the growth of the tree height, crown width and stem effectively increased under the medium gradient level (N2), whereas a high gradient level (N3) significantly promoted all aspects except tree height; (2) the mean and variance of some image texture features of the G, RE and NIR bands were significantly correlated with TNC at the 0.05 and 0.01 levels, and the texture mean value index (TMVI) proposed in this paper can improve the correlation with TNC; and (3) the nonlinear mixed-effects model with dummy variables improved the fitting degree and estimation accuracy compared with SVR and BPNN. This study demonstrates the advantages of using the nonlinear mixed-effects model with dummy variables to obtain a more reliable estimation model for the nutritional diagnosis of rare tree species.

    Updated: 2019-11-01
  • The use and misuse of p values and related concepts
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-10-29
    Richard G. Brereton

    The paper describes the historical origins of p values via the work of Fisher, and the competing approach of Neyman and Pearson. The concepts of type 1 and type 2 errors, false positive rates, power, and prevalence are also defined, as is the merger of the two approaches via the Null Hypothesis Significance Test. The relationship between p values and false detection rate is discussed. The reproducibility of p values is described. The current controversy over the use of p values and significance tests is introduced.
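
A short sketch of how the quantities named above combine. It assumes that the abstract's "false detection rate" is what is usually called the false discovery rate, and it uses the textbook relationship between the significance level, power, and prevalence of true effects; it is not taken from the paper, and the function name is hypothetical.

```python
def false_discovery_rate(alpha, power, prevalence):
    """Expected share of 'significant' results that are false positives.

    alpha      -- type 1 error rate (significance threshold)
    power      -- 1 minus the type 2 error rate
    prevalence -- fraction of tested hypotheses where a real effect exists
    """
    false_positives = alpha * (1.0 - prevalence)  # true nulls flagged anyway
    true_positives = power * prevalence           # real effects detected
    return false_positives / (false_positives + true_positives)
```

For example, with alpha = 0.05, power = 0.8 and a prevalence of 0.1, the function returns 0.36: roughly a third of the significant results would be false positives even though the threshold is 5%.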

    Updated: 2019-10-29
  • Weighted sparse principal component analysis
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-10-28
    Katrijn Van Deun, Lieven Thorrez, Margherita Coccia, Dicle Hasdemir, Johan A. Westerhuis, Age K. Smilde, Iven Van Mechelen

    Sparse principal component analysis (SPCA) has been shown to be a fruitful method for the analysis of high-dimensional data. So far, however, no method has been proposed that allows elementwise weights to be assigned to the matrix of residuals, although this may have several useful applications. We propose a novel SPCA method that includes the flexibility to weight at the level of the elements of the data matrix. The superior performance of the weighted SPCA approach compared to unweighted SPCA is shown for data simulated according to the prevailing multiplicative-additive error model. In addition, applying weighted SPCA to genomewide transcription rates obtained soon after vaccination resulted in a biologically meaningful selection of variables, with components that are associated with the measured vaccine efficacy. The MATLAB implementation of the weighted sparse PCA method is freely available from https://github.com/katrijnvandeun/WSPCA.

    Updated: 2019-10-28
  • Prediction intervals based soft sensor development using fuzzy information granulation and an improved recurrent ELM
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-10-25
    Yuan Xu, Han Jiang, Wei Zhang, Abbas Rajabifard, Nengcheng Chen, Yiqun Chen, Yanlin He, Qunxiong Zhu

    With the increasing complexity of large-scale industrial production processes, the number of variable factors is increasing. As a result, it is challenging to predict key process variables accurately. Currently, most soft sensor models using support vector regression and artificial neural networks are based on point prediction. Soft measurement models based on point prediction can only track or fit set values, which makes it difficult to deal with system uncertainty and to perform reliability analysis. To address this problem, this paper proposes a method for developing soft sensors using prediction intervals. In this approach, prediction intervals rather than point predictions are used for the stable operation of the industrial process system. The interval boundaries of the trend change can be utilized to quantify and estimate the associated uncertainty. The proposed prediction interval based soft sensor combines fuzzy information granulation with an improved recurrent extreme learning machine. First, fuzzy information granulation is adopted to obtain the lower bound, trend, and upper bound of the interval. Second, an improved recurrent extreme learning machine is built to further enhance the prediction interval capability. In the improved extreme learning machine model, a feedback layer is adopted to store the hidden layer output, calculate the data trend change, and dynamically update the outputs of the feedback layer. Third, a comprehensive interval evaluation function is used to evaluate the rationality of the interval results. In case studies using a University of California Irvine dataset and the purified terephthalic acid solvent system, the proposed prediction interval method directly generates the upper and lower bounds for key process variables with high accuracy.
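
The abstract does not define its comprehensive interval evaluation function. As a hedged illustration only, the sketch below computes two metrics that are commonly used to judge prediction intervals: coverage probability (how often the true value falls inside the interval) and normalized average width (how wide the intervals are relative to the target range). Whether the paper combines these or other quantities is an assumption, and the function names are hypothetical.

```python
def coverage_probability(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return covered / len(y_true)


def normalized_average_width(y_true, lower, upper):
    """Mean interval width divided by the range of the observed target."""
    target_range = max(y_true) - min(y_true)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)
    return mean_width / target_range
```

A useful interval has high coverage and small width at the same time; very wide intervals reach full coverage trivially, which is why the two numbers are normally read together.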

    Updated: 2019-10-25
  • Simultaneous quantitative analysis of four metal elements in oily sludge by laser induced breakdown spectroscopy coupled with wavelet transform-random forest (WT-RF)
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-09-20
    Tian Wang, Long Jiao, Chunhua Yan, Yao He, Maogang Li, Tianlong Zhang, Hua Li

    Determination of toxic metal elements in oily sludge is important for the treatment, migration, improvement, monitoring, and remediation of oily sludge, and an accurate and rapid analytical technology is urgently needed to quantitatively detect the toxic metal elements in oily sludge. In this study, a novel method based on laser-induced breakdown spectroscopy (LIBS) coupled with wavelet transform-random forest (WT-RF) was proposed for the quantitative analysis of four toxic metal elements (Cu, Zn, Cr and Ni) in 16 oily sludge samples. To facilitate LIBS measurement, the 16 initial oily sludge samples, which were in a mixed water-oil state, were dried at 150 °C for 5 h, then ground and sieved through a 100-mesh sieve. The 16 oily sludge samples were sliced and their LIBS spectra were collected; 11 samples were selected as the calibration set and the remaining samples formed the test set. The raw spectra were first preprocessed by the wavelet transform (WT) method, and the input variables for the RF calibration model were then selected and optimized based on variable importance. Finally, the WT-RF model with the optimal input variables was constructed to quantitatively analyze the concentrations of the four toxic metal elements in the oily sludge. The predictive performance of the WT-RF model was compared with that of the RF, partial least squares (PLS) and WT-PLS models. The results indicate that the WT-RF model shows better predictive ability than the other three models for the prediction of potentially toxic metal concentrations in oily sludge. The best determination coefficients (R²) for the four elements (Cu, Zn, Cr and Ni) were 0.9756, 0.9758, 0.9772 and 0.9768, the root mean square errors (RMSE) were 0.0358%, 0.0365%, 0.0446% and 0.0344%, and the relative standard deviations (RSD) were 0.0908, 0.0929, 0.0797 and 0.0628. Therefore, LIBS combined with the WT-RF method is a promising approach for the rapid prediction of toxic metal elements in oily sludge.
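
To make the wavelet preprocessing step concrete, here is a deliberately small sketch of one-level wavelet denoising with the Haar wavelet and soft thresholding. The abstract does not state which wavelet basis, decomposition level, or threshold rule was used, so every one of those choices here is an assumption made for illustration, and the function names are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # Haar normalization factor


def haar_forward(signal):
    """One-level Haar transform; the signal length must be even."""
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * _S for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * _S for i in pairs]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the signal from approximation and detail coefficients."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * _S, (a - d) * _S])
    return out


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by the threshold."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(signal, threshold):
    """Suppress small detail coefficients, then reconstruct the spectrum."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, threshold))
```

With a threshold of zero the spectrum is reconstructed unchanged; larger thresholds remove more of the fine, noise-like structure before the spectrum is passed on to a regression model.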

    Updated: 2019-10-25
  • Artificial intelligence facilitates drug design in the big data era
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-09-18
    Liangliang Wang, Junjie Ding, Li Pan, Dongsheng Cao, Hui Jiang, Xiaoqin Ding
    Updated: 2019-10-25
  • New tools for the design and manufacturing of new products based on Latent Variable Model Inversion
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-09-17
    Daniel Palací-López, Pierantonio Facco, Massimiliano Barolo, Alberto Ferrer

    Latent Variable Regression Model (LVRM) inversion can be an efficient tool to find the so-called Design Space (DS), i.e. the different combinations of inputs (e.g. process conditions, raw materials properties …) that lead to the desired outputs (e.g. product quality, benefits …). This is especially critical when first-principles models cannot be resorted to, running experimental designs is unfeasible and only data from daily production (i.e. historical data) are available. Since data-driven methods are not free of uncertainty, different approaches have been proposed in the literature to delimit a subspace that is expected to contain the DS of a product. However, some of these methods are computationally costly or depend on the existence of at least one combination of inputs that provides, according to the model, the desired values for all output variables simultaneously. Furthermore, no approach to date offers an analytical expression for the confidence region limits for this subspace. In this paper a new way to find the DS is proposed, so the above limitations are overcome. To this end, the analytical definition of the estimation of the DS, and its confidence region limits, as well as a way to transfer restrictions on the original space to the latent space are suggested. An extension of these methods to quality attributes defined as linear combinations of outputs is also provided. The proposed methodology is illustrated using three simulated case studies.

    Updated: 2019-10-25
  • Likelihood Maximization Inverse Regression: A novel non-linear multivariate model
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-09-26
    Francis B. Lavoie, Alyssa Langlet, Koji Muteki, Ryan Gosselin

    Common multivariate regression models are calculated with the objective of directly predicting calibration y data from X observations. The methodology proposed in this paper inverts the problem: we propose a regression model that predicts y by maximizing the likelihood of the expected errors in X. We named our parameter-free algorithm Likelihood Maximization Inverse Regression (LMIR). Using 4 different datasets, we compared LMIR performance with Partial Least Squares-1 (PLS1), a non-linear PLS variant, and another inverse regression method, Sliced Inverse Regression (SIR). LMIR yielded better validation performance in almost all study cases. We also demonstrated that LMIR was able to account for any known additional noise present in validation X observations without creating a new model, as is required in PLS1 and SIR. An LMIR model built from one instrument could then be easily transferred to another.

    Updated: 2019-10-25
  • Molecular image-based convolutional neural network for the prediction of ADMET properties
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-09-21
    Tingting Shi, Yingwu Yang, Shuheng Huang, Linxin Chen, Zuyin Kuang, Yu Heng, Hu Mei

    The convolutional neural network (CNN) is one of the most representative architectures in deep learning and is widely adopted in many fields, especially image classification and object detection. In the last few years, CNNs have attracted increasing attention in the drug discovery domain. In this work, a molecular 2-D image-based CNN method was used to establish prediction models of ADMET properties, including CYP1A2 inhibitory potency, P-glycoprotein (P-gp) inhibitory activity, Blood-Brain Barrier (BBB) penetrating activity, and Ames mutagenicity. The results showed that the predictive power of the established CNN models is comparable to that of the available machine learning models based on manual structural description and feature selection. It can be inferred that CNNs can efficiently extract the key image features related to molecular ADMET properties and offer a useful tool for virtual screening and drug design research.

    Updated: 2019-10-25
  • A user-friendly excel spreadsheet for dealing with spectroscopic and chromatographic data
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-09-25
    Alan Lima Vieira, Maurílio Gustavo Nespeca, Weslei Diego Pavini, Edilene Cristina Ferreira, José Anchieta Gomes Neto

    A user-friendly interface was developed in Excel (Microsoft Office) to deal with spectroscopic and chromatographic data. The Excel spreadsheet includes baseline correction, area and height determination, and identification of variables that represent the maximum height. In addition, the user can quickly evaluate standardization methods and create databases by adding information to the library, where up to 300 peaks can be saved. In the present work, the application of the Excel spreadsheet is exemplified by analytical curves developed with gas chromatography, Raman spectroscopy and laser-induced breakdown spectroscopy (LIBS) data. The analytical parameters obtained with the Excel spreadsheet were compared with the results generated by other software, such as MATLAB (routine), Origin and ChromQuest. The coefficient of determination (R²) and root-mean-square error (RMSE) obtained from the Excel spreadsheet were very similar to those from the other software. The analytical curves were also submitted to analysis of variance (ANOVA) and, regarding the F-test, all presented values of Fcalculated greater than Fcritical. The relationship between the analytical responses from the Excel spreadsheet and the other software showed a coefficient of correlation close to 1. Therefore, the proposed Excel interface is an attractive alternative to expensive software and provides easy handling for LIBS data. The spreadsheet and a short tutorial video are available in the electronic supplementary material (or in the link https://1drv.ms/f/s!AjTI1LyRsk35lYtYOvj6EGyXvFgWlQ).
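
The three operations listed above (baseline correction, height, and area) can be sketched as follows. This is an illustration under simple assumptions, not a description of how the spreadsheet works internally: the baseline is taken as a straight line between the first and last points of the selected peak window, and the area is integrated with the trapezoidal rule. The function names are hypothetical.

```python
def linear_baseline(x, y):
    """Straight line through the first and last points of the window."""
    slope = (y[-1] - y[0]) / (x[-1] - x[0])
    return [y[0] + slope * (xi - x[0]) for xi in x]


def peak_metrics(x, y):
    """Baseline-corrected height, apex position, and trapezoidal area."""
    baseline = linear_baseline(x, y)
    corrected = [yi - bi for yi, bi in zip(y, baseline)]
    height = max(corrected)
    apex_x = x[corrected.index(height)]  # variable with the maximum height
    area = sum(
        (x[i + 1] - x[i]) * (corrected[i] + corrected[i + 1]) / 2.0
        for i in range(len(x) - 1)
    )
    return {"height": height, "apex_x": apex_x, "area": area}
```

The window passed in is assumed to contain a single peak; overlapping peaks or curved baselines would need a more careful treatment than this sketch provides.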

    Updated: 2019-10-25
  • Unsupervised multiblock data analysis: A unified approach and extensions
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-09-26
    Essomanda Tchandao Mangamana, Véronique Cariou, Evelyne Vigneau, Romain Lucas Glèlè Kakaï, El Mostafa Qannari

    For the analysis of multiblock data, a unified approach to several strategies such as Generalized Canonical Correlation Analysis (GCCA), Multiblock Principal Components Analysis (MB-PCA), Hierarchical Principal Components Analysis (H-PCA) and ComDim is outlined. These methods are based on the determination of global and block components. The unified approach postulates, on the one hand, two link functions that relate the block components to their associated global components and, on the other hand, two summing-up expressions to compute the global components from their associated block components. Not only are several well-known methods recovered, but we also introduce a variant of GCCA. More generally, we hint at other possible extensions, thus emphasizing the fact that the unified approach, besides being simple, is versatile. We also show how this approach, although basically unsupervised, could be adapted to yield a supervised method for prediction purposes. Illustrations based on simulated and real case studies are discussed.

    Updated: 2019-10-25
  • A MATLAB toolbox for data pre-processing and multivariate statistical process control
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-10-04
    Gang Yi, Craig Herdsman, Julian Morris

    A Multivariate Statistical Data Pre-screening/Data Pre-processing Toolbox (Pre-Screen) has been designed and developed for use by practising process engineers and researchers who wish to pre-process process data prior to multivariate data analysis, process data modelling or building predictive and inferential models. Many commercial data analysis packages do not fully address the initial data cleaning and data conditioning tasks, which can consume up to 80% of the modelling time. The software toolkit has been developed specifically to address the industrial need for initial pre-screening of large industrial data sets. The core feature of Pre-Screen is that it makes the analysis of large data sets as fast and visual as possible, and accessible to process and control engineers, analytical scientists and academic R&D, without taking away the need for engineering science understanding. The toolbox builds on top of the MATLAB numerical computing environment, with powerful user interface procedures providing user-friendly, mouse/menu-driven software. The toolbox has been compiled to allow use by those who do not have access to MATLAB.

    Updated: 2019-10-25
  • Supervised projection pursuit – A dimensionality reduction technique optimized for probabilistic classification
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-10-09
    Andrei Barcaru

    An important step in multivariate analysis is dimensionality reduction, which allows for better classification and easier visualization of the class structures in the data. Techniques like PCA, PLS-DA and LDA are most often used to explore the patterns in the data and to reduce the dimensions. Yet the data do not always properly reveal their structures when these techniques are applied. To this end, a supervised projection pursuit (SuPP) based on the Jensen-Shannon divergence is proposed in this article. The combination of this metric with a powerful Monte Carlo based optimization algorithm yielded a versatile dimensionality reduction technique capable of working with high-dimensional data and missing observations. Combined with a Naïve Bayes (NB) classifier, SuPP proved to be a powerful preprocessing tool for classification. Namely, on the Iris data set, the prediction accuracy of SuPP-NB is significantly higher than the prediction accuracy of PCA-NB (p-value ≤ 4.02E-05 in a 2D latent space, p-value ≤ 3.00E-03 in a 3D latent space) and significantly higher than the prediction accuracy of PLS-DA (p-value ≤ 1.17E-05 in a 2D latent space and p-value ≤ 3.08E-03 in a 3D latent space). The significantly higher accuracy for this particular data set is strong evidence of a better class separation in the latent spaces obtained with SuPP.
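
For reference, the Jensen-Shannon divergence that the method is built on can be computed for two discrete distributions as shown below. This is only the standard definition (base-2 logarithm, so the value lies between 0 and 1); how the paper estimates the class distributions in the projected space, and how the Monte Carlo search uses the divergence, is not covered by this sketch.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by 1 bit."""
    mixture = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)
```

Identical distributions give 0 and distributions with no overlap give 1, which is why a projection that maximizes the divergence between class distributions tends to separate the classes.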

    Updated: 2019-10-25
  • A sequence-based approach for identifying recombination spots in Saccharomyces cerevisiae by using hyper-parameter optimization in FastText and support vector machine
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-09-25
    Duyen Thi Do, Nguyen Quoc Khanh Le

    Meiotic recombination is a biological process that plays a crucial role in genetic evolution. The ability of machine learning models to extract the desired information embedded in DNA sequences has therefore drawn a great deal of attention among biologists. Several attempts have recently been made to address this problem; however, the performance still needs to be improved. The current study aims to investigate the combination of a natural language processing model and supervised learning for classifying DNA sequences. The idea is to process DNA sequences with the FastText model, including sub-word information, and then use the results as features in a suitable supervised learning algorithm. In the end, this hybrid approach classifies DNA recombination spots with a sensitivity of 90%, a specificity of 94.76%, an accuracy of 92.6%, and an MCC of 0.851. These results suggest that our newly proposed method is superior to other methods on the same benchmark dataset. This study could therefore shed light on the development of prediction models for recombination spots in particular, and for DNA sequences in general.
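
The four performance figures quoted above all derive from a binary confusion matrix. The sketch below uses the standard definitions; the counts in any call are placeholders to be supplied by the reader, since the abstract does not report the confusion matrix, and the function name is hypothetical.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are of unequal size.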

    Updated: 2019-10-25
  • Chemometrics-assisted calibration transfer strategy for determination of three agrochemicals in environmental samples: Solving signal variation and maintaining second-order advantage
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-10-14
    Xiao-Dong Sun, Hai-Long Wu, Yue Chen, Jun-Chen Chen, Ru-Qin Yu

    This manuscript proposes a chemometrics-assisted calibration transfer strategy to determine three agrochemicals (indole-3-acetic acid, 1-naphthylacetic acid and thiabendazole) in environmental samples by excitation-emission matrix (EEM) fluorescence detection. The fluorescence landscapes of the target compounds overlap heavily with each other and with the extracts of real samples, making quantification by conventional spectroscopic methods impractical. Second-order calibration methods based on “mathematical separation” are used to handle these problems because they can achieve successful resolution and quantification even in the presence of overlapped peaks and unknown interferences, a property known as the “second-order advantage”. Piecewise direct standardization (PDS), a calibration transfer method, is applied to update the chemometric multivariate calibration model and compensate for signal instability and variation in responses recorded on different instruments, avoiding the overhead of a full recalibration, which usually involves considerable effort, cost, and time. Both the root-mean-square error of prediction (RMSEP) and the average recoveries calculated from simulated and experimental data are analyzed to validate the feasibility and applicability of this strategy. The results reveal that the number of calibration standards was reduced from 10 to 3 for later determinations while producing analytical results comparable, in terms of RMSEP and recovery values, to those obtained by a full recalibration strategy. The developed strategy requires less experimental effort and cost and can be an alternative for fast screening of agrochemicals and long-term process analysis in environmental samples or related food matrices.

    Updated: 2019-10-25
  • Using polarized Total Synchronous Fluorescence Spectroscopy (pTSFS) with PARAFAC analysis for characterizing intrinsic protein emission
    Chemometr. Intell. Lab. Systems (IF 2.786) Pub Date : 2019-10-15
    Marina Steiner-Browne, Saioa Elcoroaristizabal, Alan G. Ryder

    Using polarized Excitation Emission Matrix (pEEM) spectroscopy to measure the intrinsic emission of proteins offers a potentially useful methodology for a wide variety of applications. However, the presence of Rayleigh light scatter causes significant problems for Parallel Factor (PARAFAC) analysis and for anisotropy calculations. The use of polarized Total Synchronous Fluorescence Spectroscopy (pTSFS) can minimize Rayleigh scatter and avoid the use of complex data correction methods. Here, we investigated for the first time the use of pTSFS and PARAFAC to analyze the intrinsic emission of an Immunoglobulin (IgG) type protein in its native state. To enable PARAFAC analysis, however, TSFS data (which are not trilinear) must first be transformed into an EEM-like layout (t-EEM), and this generates a region with no experimentally acquired information (<8%). We critically evaluated several data handling methods and determined that interpolation was the best solution for dealing with the spectral regions with no experimentally acquired data at the blue edge of the emission. Only subtle structural changes were measured over the temperature range analyzed (15–35 °C), and PARAFAC resolved only two emitting components: a Trp emission component (average signal from all Trp present), which represented >92% of the explained variance, and a much weaker, mostly Tyr-related emission with ~3% of the explained variance. The recovery of this Tyr component was only possible because pTSFS measurements were less contaminated by Rayleigh scattering. Changes in Tyr-to-Trp energy transfer rates caused by thermal motion were detected as an increase in the Tyr contribution, which could not be resolved with the equivalent pEEM measurements due to light scatter contamination. The increased selectivity, sensitivity, and reproducibility of pTSFS measurements show that this is a better option than pEEM for fluorescence emission based monitoring of protein structural change or lot-to-lot variance of IgG type proteins.
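
The rearrangement of TSFS data into an EEM-like layout can be sketched as below. It relies on the defining relation of synchronous scanning, emission wavelength = excitation wavelength + offset, and marks every excitation/emission cell that was never measured as `None`. The grid sizes, names, and the choice to leave the gaps unfilled are illustrative assumptions; the paper's own procedure, including the interpolation it recommends for those gaps, is not reproduced here.

```python
def tsfs_to_eem(ex_wavelengths, offsets, tsfs):
    """Rearrange TSFS intensities into an excitation x emission grid.

    tsfs[i][j] is the intensity at excitation ex_wavelengths[i] and
    wavelength offset offsets[j]. Cells without a measurement stay None.
    """
    em_wavelengths = sorted({ex + d for ex in ex_wavelengths for d in offsets})
    column = {em: k for k, em in enumerate(em_wavelengths)}
    eem = [[None] * len(em_wavelengths) for _ in ex_wavelengths]
    for i, ex in enumerate(ex_wavelengths):
        for j, d in enumerate(offsets):
            eem[i][column[ex + d]] = tsfs[i][j]
    return em_wavelengths, eem


def missing_fraction(eem):
    """Share of cells in the EEM-like layout with no acquired data."""
    cells = [value for row in eem for value in row]
    return sum(value is None for value in cells) / len(cells)
```

On a tiny toy grid the unmeasured share is much larger than the <8% reported above, because the empty corners shrink relative to the whole as the excitation and offset ranges grow.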

    Updated: 2019-10-25
Contents have been reproduced by permission of the publishers.