A Novel Compression Method of Spectral Data Matrix Based on the Low-Rank Approximation and the Fast Fourier Transform of the Singular Vectors
Applied Spectroscopy (IF 3.5), Pub Date: 2021-10-01, DOI: 10.1177/00037028211044759
Joseph Dubrovkin

Storing, processing, and transferring huge matrices are becoming challenging tasks in process analytical technology and scientific research. Matrix compression can address these problems. We developed a novel method for compressing a spectral data matrix, based on its low-rank approximation and the fast Fourier transform of the singular vectors. Unlike known methods, it does not require restoring the low-rank approximated matrix before further Fourier processing, and the compression ratio increases as a result. A compromise between the loss of accuracy in restoring the data matrix and the compression ratio was achieved by selecting the processing parameters. The method was applied to multivariate chemometric analysis of cow milk to determine fat and protein content, using two data matrices (file sizes 5.7 and 12.0 MB) restored from their compressed form. The corresponding compression ratios were about 52 and 114, while the loss of accuracy of the analysis was less than 1% compared with processing the non-compressed matrix. A huge simulated matrix, compressed from 400 MB to 1.9 MB, was successfully used for multivariate calibration and segment cross-validation. The data set simulated a large matrix of 10 000 low-noise infrared spectra, measured in the range 4000–400 cm⁻¹ with a resolution of 0.5 cm⁻¹. The corresponding file was compressed from 262.8 MB to 19.8 MB. The discrepancies between the original and restored spectra were less than the standard deviation of the noise. The method demonstrated its potential for future applications to chemometrics-enhanced spectrometric analysis where memory size and data transfer rate are limited. The algorithm used the standard routines of Matlab software.
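
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, written in Python with NumPy rather than the Matlab routines the article mentions. It assumes that spectra are stored as rows, that the right singular vectors are the ones transformed, and that compression comes from keeping only the first few (low-frequency) Fourier coefficients of each vector. The function names `compress`, `restore`, and `stored_ratio`, the parameters `rank` and `n_coef`, and the synthetic Gaussian-band data are all hypothetical and are not taken from the article.

```python
import numpy as np


def compress(X, rank, n_coef):
    """Truncate the SVD to `rank` components, then keep only the first
    `n_coef` real-FFT coefficients of each retained right singular vector.
    (Assumed scheme; the article's exact procedure may differ.)"""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :rank] * s[:rank]                      # shape (n_spectra, rank)
    coef = np.fft.rfft(Vt[:rank], axis=1)[:, :n_coef]    # low-frequency part only
    return scores, coef, X.shape[1]


def restore(scores, coef, n_points):
    """Rebuild the matrix directly from the stored factors."""
    # irfft pads the missing high-frequency coefficients with zeros.
    Vt = np.fft.irfft(coef, n=n_points, axis=1)
    return scores @ Vt


def stored_ratio(X, scores, coef):
    """Ratio of stored real numbers (a complex value counts as two).
    This is a count ratio, not the file-size ratio quoted in the abstract."""
    return X.size / (scores.size + 2 * coef.size)


# Synthetic demo data (an assumption): 200 spectra built from 3 Gaussian bands.
rng = np.random.default_rng(0)
x = np.arange(1024)
bands = np.stack([np.exp(-0.5 * ((x - c) / 30.0) ** 2) for c in (300, 500, 700)])
conc = rng.uniform(0.5, 1.5, size=(200, 3))
X = conc @ bands + rng.normal(0.0, 1e-3, size=(200, 1024))

scores, coef, n_points = compress(X, rank=3, n_coef=64)
X_restored = restore(scores, coef, n_points)
rel_err = np.linalg.norm(X - X_restored) / np.linalg.norm(X)
print(f"stored-number ratio: {stored_ratio(X, scores, coef):.1f}, relative error: {rel_err:.2e}")
```

In this sketch the low-rank matrix is never formed during compression: only the scores and the truncated Fourier coefficients are stored, and the full matrix is rebuilt once, at restore time. Raising `rank` or `n_coef` lowers the restoration error at the cost of a smaller ratio, which mirrors the accuracy/compression compromise described above.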




Updated: 2021-10-01