Parametric Approach for Regression Quantification and Quality Control of Proteins from Multiple Mass Spectrometry Platforms

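  • Summary: Mass spectrometry vendors typically use different software for protein identification and quantification, so the resulting data and results are not readily comparable across platforms. The accuracy of protein quantification also leaves room for improvement. Developing a standardized, automated protein quantification workflow with higher accuracy is therefore of practical value. Peptide retention time alignment and regression effectively reduce the impact of missing identifications for medium- and low-abundance peptides and improve the quantification of medium- and low-abundance proteins. A peptide filter that jointly considers the distributions of signal peak width, retention time, and peptide isotope pattern effectively removes peptide signals unsuitable for quantification, making the computed area of the peptide extracted ion chromatogram (XIC) peak more accurate. The workflow consists of modules for open-format data conversion, retention time alignment and regression-based quantification, and peptide filtering. It accurately quantifies mass spectrometry data from different platforms and markedly improves label-free quantification of low-abundance proteins. In a comparison, the workflow quantified the Proteomics Dynamic Range Standard protein set (UPS2) more accurately than MaxQuant and Proteome Discoverer.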

     

    Abstract: Quantitative analysis of high-resolution mass spectrometry (MS) data, an increasingly popular high-throughput method over the past decade, is fundamental to proteomics research. Different MS manufacturers typically use different software suites for protein identification and quantification, which makes it inconvenient to compare results across MS platforms. Furthermore, the accuracy of protein quantification can still be improved. A standardized and automated protein quantification process is therefore urgently needed in proteomics research. Here, a parametric approach was established for quantification based on mass spectra. First, basic information on candidate peptides was extracted from the output of mainstream spectrum search programs, mainly m/z, retention time (tR), and the corresponding protein IDs, peptides, modifications, charge states, and scan numbers. The intensities of the peptides in the first-stage mass spectra (MS1) were then retrieved from the decoded raw file on the basis of m/z and tR, each within a suitable tolerance. The intensity values extracted around the retention time of a given peptide form an extracted ion chromatogram (XIC) peak. After smoothing with a Savitzky-Golay filter, the XIC peak area was calculated by the trapezoidal rule. For peptides that produced MS1 signals but, because of random sampling in MS, no MS2 signals, regression was used to improve the consistency between pairwise experiments. A second-degree polynomial was fitted to the retention times of peptides identified in both runs to obtain a regression function for retention time between runs. With this retention time alignment and regression approach, the quantification of low-abundance proteins can be improved greatly. A parametric model based on peak width, retention time, and other parameters was built within a Bayesian framework by combining the parameters through a joint likelihood ratio. This model effectively filters out peptides that are poorly suited to quantification.
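The smoothing and integration step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names (`savgol5`, `trapezoid_area`, `xic_peak_area`) and the example data are hypothetical, and the 5-point quadratic Savitzky-Golay window is an assumption, since the abstract does not state the window length or polynomial order.

```python
# Standard 5-point quadratic Savitzky-Golay smoothing coefficients.
# ASSUMPTION: the window length and order are chosen for illustration only.
SG5 = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0


def savgol5(values):
    """Smooth a sequence with a 5-point quadratic Savitzky-Golay filter.

    The two points at each end are returned unsmoothed, which is a
    simplification made for this sketch.
    """
    out = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(SG5, window)) / SG5_NORM
    return out


def trapezoid_area(times, intensities):
    """Integrate intensity over time with the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(times)):
        width = times[k] - times[k - 1]
        area += 0.5 * (intensities[k - 1] + intensities[k]) * width
    return area


def xic_peak_area(times, intensities):
    """Smooth an XIC trace, then integrate it to obtain the peak area."""
    return trapezoid_area(times, savgol5(intensities))


# Hypothetical XIC trace: retention times in minutes, MS1 intensities.
times = [30.0, 30.1, 30.2, 30.3, 30.4, 30.5, 30.6]
intensities = [0.0, 1.2e5, 4.8e5, 9.5e5, 5.1e5, 1.0e5, 0.0]
print(xic_peak_area(times, intensities))
```

In practice a library routine such as `scipy.signal.savgol_filter` would likely replace the hand-written filter; the fixed coefficients are used here only to keep the sketch free of dependencies.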
By evaluating, on the basis of theoretical isotope pattern models, how the ratio of the nth to the 1st isotope peak intensity varies with m/z, functions relating m/z to this ratio were determined and built for every charge state. The theoretical isotope pattern intensities of a peptide can then be calculated simply from these functions. Discarding peptides whose observed isotope pattern falls outside the theoretical range is an effective way to improve quantification accuracy. A comprehensive filter that combines all of these factors can refine the XIC-based quantification. The standardized and automated process contains modules for data format conversion, retention time alignment and regression, and a multi-parametric peptide filter. It works with the majority of mainstream high-resolution tandem mass spectrometry data and yields more precise quantification results, especially for low-abundance proteins. A comparison of this process with MaxQuant and Proteome Discoverer demonstrated the accuracy of the parametric method for protein quantification. The method offers a more accurate and intuitive way to use MS data and a more convenient approach to protein quantification research.
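The retention time alignment and regression module can be illustrated with a short Python sketch. It fits a second-degree polynomial to the retention times of peptides identified in both of two runs, then uses the fitted function to predict where a peptide identified in only one run should elute in the other. The function names (`fit_quadratic`, `build_rt_mapper`), the peptide labels, and the retention times are hypothetical, and the plain least-squares solution via normal equations is an assumption about how the fit might be carried out.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Power sums that make up the 3x3 normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix [A | t] with A[i][j] = s[i + j].
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("need at least three distinct x values")
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def build_rt_mapper(rt_run_a, rt_run_b):
    """Return a function mapping a retention time in run A to run B.

    Both arguments are dicts of peptide -> retention time; only peptides
    identified in both runs contribute to the fit.
    """
    common = sorted(set(rt_run_a) & set(rt_run_b))
    if len(common) < 3:
        raise ValueError("need at least three shared peptides")
    a, b, c = fit_quadratic([rt_run_a[p] for p in common],
                            [rt_run_b[p] for p in common])

    def predict(t):
        return a + b * t + c * t * t

    return predict


# Hypothetical retention times (minutes) for two runs.
run_a = {"PEP1": 12.1, "PEP2": 25.4, "PEP3": 41.0, "PEP4": 58.3, "PEP5": 33.7}
run_b = {"PEP1": 12.9, "PEP2": 26.0, "PEP3": 41.2, "PEP4": 57.8}
to_run_b = build_rt_mapper(run_a, run_b)
# PEP5 has no identification in run B; predict where its MS1 signal should be.
print(to_run_b(run_a["PEP5"]))
```

The predicted retention time would then serve as the center of the window from which the MS1 intensities of the unidentified peptide are extracted in the second run.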

     
