A De Novo Algorithm for Automated Glycan Structure Elucidation from Tandem Mass Spectra

De Novo Algorithm for Automated Glycan Structure Assignment by MS/MS

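  • Abstract (Chinese version, translated): De novo glycan structure elucidation, that is, automatically determining a glycan's structure (monosaccharide composition, sequence topology, and the linkages between monosaccharides) from mass spectra without relying on a database, has been studied for many years, yet obtaining results quickly and accurately remains challenging. To reduce time complexity, existing methods take one of two routes. Some adopt greedy or heuristic algorithms, which are inexact by nature and cannot guarantee accurate results. Others adopt exact algorithms such as pruning or dynamic programming, which not only have higher time complexity but also rely heavily on assumptions and idealized models that ignore many experimental details affecting the result. For example, earlier exact algorithms often sidestep or ignore the problem that the scoring function reuses the same spectral peaks when scoring different candidate structures; such neglected details ultimately make the results inaccurate. This work proposes an algorithm that elucidates glycan structure from the spectrum "bottom-up" by iterative growth. Unlike earlier iterative methods, the unit of growth is no longer a single monosaccharide but a substructure produced during the algorithm, which greatly speeds up execution. With these experimental details incorporated into the workflow, the algorithm was applied to the MS/MS spectra of 20 glycans and compared with earlier algorithms, confirming its high accuracy (for 75% of the glycans, the correct structure was ranked first).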

     

    Abstract: De novo determination of glycan structure from MS/MS data (including monosaccharide composition, sequence topology, and the linkages between adjacent monosaccharides) has been studied for many years, but interpreting glycan structure from mass spectra quickly and accurately remains a great challenge. Existing methods fall into two broad classes. Greedy and heuristic methods reduce time complexity but are inexact by nature. Exact methods, such as dynamic programming or exhaustive search, are slower than inexact methods and share common problems when reconstructing candidate structures, such as repeated peak counting and crude scoring functions. These neglected details lead to inaccurate results. In this paper, we design a de novo algorithm that accurately reconstructs the glycan tree structure bottom-up from MS/MS data using only a few logical constraints, and that applies equally to N-glycans and O-glycans. Unlike previous iterative methods, the growth unit in this algorithm is not a single monosaccharide but a substructure produced during the iterative procedure, which significantly improves processing speed. With the previously neglected details taken into account, experiments were conducted on 20 complex glycan structures extracted from human sperm. The results show that the algorithm achieves high accuracy, ranking the true structure first for 15 of the 20 glycans.
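    The idea of growing candidates from substructures rather than from single monosaccharides can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`mass`, `matches`, `grow`), the tolerance, the limit of two children per node, and the simplification that a fragment peak equals a plain sum of residue masses (ignoring ion types, water loss, adducts, and linkage information) are all assumptions made for illustration.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da) of three common monosaccharides.
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "dHex": 146.0579}

# A substructure is a nested tuple: (root_residue, (child_substructure, ...)).


def mass(tree):
    """Sum of residue masses over the whole subtree (simplified model)."""
    root, children = tree
    return RESIDUE_MASS[root] + sum(mass(child) for child in children)


def matches(m, peaks, tol):
    """True if mass m lies within tol of any observed peak."""
    return any(abs(m - p) <= tol for p in peaks)


def grow(peaks, tol=0.01, max_children=2):
    """Iteratively build substructures whose masses are supported by peaks.

    Each round attaches whole, already-accepted substructures under a new
    root residue, so the growth unit is a substructure, not a monosaccharide.
    """
    limit = max(peaks) + tol
    # Seed the pool with single residues that explain a peak.
    pool = {(r, ()) for r, m in RESIDUE_MASS.items() if matches(m, peaks, tol)}
    while True:
        current = sorted(pool)
        new = set()
        for root in RESIDUE_MASS:
            for k in range(1, max_children + 1):
                for kids in combinations_with_replacement(current, k):
                    cand = (root, tuple(sorted(kids)))
                    if cand in pool:
                        continue
                    m = mass(cand)
                    # Keep only candidates that some peak supports.
                    if m <= limit and matches(m, peaks, tol):
                        new.add(cand)
        if not new:  # no growth in this round: stop
            return pool
        pool |= new
```

    With hypothetical peaks at 162.0528, 324.1056, and 527.1850, this sketch keeps both a linear HexNAc-Hex-Hex chain and a branched HexNAc carrying two Hex residues, since both explain the heaviest peak. Separating such candidates is the job of the scoring function, which is where issues such as repeated peak counting arise.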

     
