SUN Yun, CHEN Yi-bing, CHU Mei-juan, JIANG Xue-hui, WANG Yan, GUO Bing-qing. Review of Data Pre-processing Techniques and Machine Learning in PTR-MS[J]. Journal of Chinese Mass Spectrometry Society, 2018, 39(5): 513-523. DOI: 10.7538/zpxb.2017.0181

Review of Data Pre-processing Techniques and Machine Learning in PTR-MS

Proton transfer reaction mass spectrometry (PTR-MS) is an analytical technique developed for the detection of volatile organic compounds (VOCs). It offers many advantages for VOC analysis, such as ultra-low detection limits, very short response times, no sample preparation, and real-time analysis. It has been applied in atmospheric chemistry, environmental chemistry, food science, and biomedicine. As the applications of PTR-MS expand and the range of sample types grows, extracting features from complex data and uncovering their underlying patterns places higher demands on the processing ability of the algorithms used. This paper therefore reviews data pre-processing techniques and machine learning methods for PTR-MS. First, we summarize the data pre-processing methods suited to the characteristics of PTR-MS data. The data generated by the instrument cannot be used directly for statistical analysis without introducing large errors, so pre-processing is an essential step. It includes several operations, such as denoising, normalization, and concentration calculation, and its purpose is to produce a data matrix for subsequent analysis. Next, we focus on the use of machine learning methods for data analysis in PTR-MS and discuss both the advantages and the drawbacks of these techniques. Machine learning methods can be divided into two categories: unsupervised and supervised. Unsupervised methods are the common choice for initial data analysis. When a priori knowledge is available, a supervised analysis is the better choice for further analysis. Supervised methods use this knowledge to learn rules and patterns related to the classes in the data, and then use these rules and patterns to predict classes in newly acquired data sets. The main goal of all supervised techniques is to find the relationship between the predictor (VOC) matrix and the response vector. In general, combining unsupervised and supervised methods is recommended.
PTR-MS is a soft ionization technique; however, even a small number of fragments still causes great difficulty in spectral analysis, especially for unknown mixtures. This is the main reason why spectral analysis in PTR-MS differs from that of other mass spectrometry methods. Data fusion across different instrument platforms and different samples may be a good way to address this problem.
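The workflow described in the abstract (pre-process the raw signals into a data matrix, then learn a mapping from the VOC predictor matrix to a response vector) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the method of the paper: the function names are hypothetical, the numbers are made up, normalization to 10^6 primary-ion counts per second is one common PTR-MS convention, and a nearest-centroid classifier stands in for whichever supervised method is actually used. Denoising and concentration calculation are omitted.

```python
from statistics import mean, pstdev


def normalize_to_primary(raw_cps, primary_cps, reference=1e6):
    """Scale raw counts per second to a fixed primary-ion signal (ncps).

    Assumption: signals are normalized to 1e6 cps of primary ions.
    """
    if primary_cps <= 0:
        raise ValueError("primary ion signal must be positive")
    return [c * reference / primary_cps for c in raw_cps]


def autoscale(matrix):
    """Mean-centre each column (mass channel) and scale it to unit variance."""
    columns = list(zip(*matrix))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 instead of 0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in matrix]


def fit_centroids(X, y):
    """Supervised step: compute the mean spectrum of each class."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(centroids, row):
    """Assign a spectrum to the class with the nearest centroid."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], row))


# Made-up example: 4 samples x 3 mass channels, raw counts per second.
raw = [
    [120.0, 30.0, 5.0],
    [130.0, 28.0, 6.0],
    [40.0, 90.0, 20.0],
    [35.0, 95.0, 22.0],
]
primary = [2.0e6, 2.1e6, 1.9e6, 2.0e6]  # primary-ion signal per sample
labels = ["A", "A", "B", "B"]           # response vector (known classes)

ncps = [normalize_to_primary(r, p) for r, p in zip(raw, primary)]
X = autoscale(ncps)                      # predictor (VOC) matrix
centroids = fit_centroids(X, labels)
predictions = [predict(centroids, row) for row in X]
```

Predicting on the training samples, as done here, only checks that the sketch runs; a real study would evaluate on held-out data or by cross-validation.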