Abstract:
Mucin-type O-GalNAc glycosylation is a pivotal post-translational modification of proteins, characterized by high glycan structural heterogeneity and multisite modification. It regulates core biological processes, including cellular recognition, immune responses, and signal transduction, and its aberrations are linked to autoimmune diseases, diabetes, cardiovascular disorders, inflammation, viral infections, neurodegeneration, and cancers. Elucidating disease-specific O-glycosylation patterns will advance disease detection, help decode molecular pathogenesis, and inform the discovery of novel therapeutic targets. The bottom-up strategy is the gold standard for large-scale glycoproteomic analysis. However, comprehensive O-glycosylation profiling remains challenging for three reasons: O-glycosylation sites lack a conserved flanking sequence motif, glycan structures are complex, and O-glycopeptides are present at low abundance. High-efficiency enrichment techniques are therefore critical for capturing low-abundance O-glycopeptides from complex matrices, while advanced database search strategies enable accurate interpretation of tandem mass spectrometry (MS/MS) data. This review summarizes key progress in O-glycopeptide enrichment and MS-based database search methods over the past five years. Enrichment methods have seen significant innovations. Hybrid materials that integrate hydrophilic interaction liquid chromatography (HILIC) with complementary affinity techniques, such as immobilized metal ion affinity chromatography (IMAC) or boronic acid chemistry, greatly enhance enrichment efficiency. For example, Ti-IMAC materials capture sialylated O-glycopeptides from 0.1 μL of human serum, enabling the identification of ~200 O-glycopeptides. Boronic acid-functionalized mesoporous composites yield 724 N-glycopeptides and 152 O-glycopeptides from 1 μL of serum. Automated high-performance liquid chromatography (HPLC) workflows separate N- and O-glycopeptides simultaneously and have identified 181 N- and 17 O-glycopeptides that change significantly in gastric cancer serum.
O-Glycoprotease-based methods (OgpA or IMPa) combined with solid-phase chemoenzymatic approaches allow specific enrichment. MOTAI distinguishes Tn/sTn-modified O-glycopeptides from other O-glycopeptides in colon cancer tissues and identifies 32 upregulated Tn/sTn glycoproteins. Bioorthogonal strategies that couple GalNAz metabolic labeling with click chemistry, such as Click-iG, identify 262 O-glycosylation sites in mouse tissues. Database search tools have overcome traditional limitations. O-Search-Pattern uses Y-ion pattern matching to increase O-glycopeptide identifications by 15.4% to 199.0% compared with other tools. MSFragger-Glyco leverages open search and fragment ion indexing to increase identifications 4- to 6-fold and reduce analysis time to minutes. pGlyco3's glycan-first search strategy enables fast, precise intact glycopeptide analysis. Machine learning approaches also show promise:
- CandyCrunch predicts glycan structures from LC-MS/MS data with 90.3% accuracy within seconds.
- DeepGlyco uses tree-LSTM and graph neural networks to distinguish glycan isomers.
- GlyPep-Quant integrates random forests and DBSCAN clustering to improve quantitative performance.
- MarkerPredict identifies cancer biomarkers from disordered-protein and signaling-network features.

Together, these tools address bottlenecks such as low-abundance glycopeptide detection and isomer differentiation. Overall, this review provides a systematic overview of O-glycopeptide analysis methods that can guide technical innovation and deepen understanding of disease mechanisms, ultimately accelerating the translation of glycoproteomic insights into clinical applications.
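The Python sketch below illustrates two composition-level operations behind the database-search ideas above. It is not taken from O-Search-Pattern, MSFragger-Glyco, or pGlyco3.

The first operation treats the precursor-minus-peptide mass difference as an open-search mass offset and matches it to a candidate glycan composition. The second predicts the Y-ion ladder (intact peptide plus partial glycan) and scores a spectrum by the fraction of predicted Y ions it contains.

The example peptide, the candidate glycan list, the tolerances, and all function names are hypothetical. The sketch also makes these simplifying assumptions:
- Masses are monoisotopic.
- Fragment ions are singly charged.
- The precursor is given as a neutral mass.
- Glycan topology is ignored. For example, the sketch does not check whether NeuAc sits on GalNAc or on Gal.

```python
"""Minimal sketch (hypothetical helpers): open-search-style mass-offset glycan
assignment and Y-ion ladder scoring for O-glycopeptides. Composition-level only;
assumes monoisotopic masses and singly charged fragment ions."""
from itertools import product

# Monoisotopic amino acid residue masses (Da)
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565
PROTON = 1.007276

# Monoisotopic monosaccharide residue masses (Da)
GLYCAN_UNIT = {"HexNAc": 203.079373, "Hex": 162.052824,
               "NeuAc": 291.095417, "Fuc": 146.057909}

# Hypothetical candidate list: a few common mucin-type core-1 compositions
CANDIDATES = [
    {"HexNAc": 1},                         # Tn
    {"HexNAc": 1, "NeuAc": 1},             # sTn
    {"HexNAc": 1, "Hex": 1},               # T antigen (core 1)
    {"HexNAc": 1, "Hex": 1, "NeuAc": 1},   # monosialyl-T
    {"HexNAc": 1, "Hex": 1, "NeuAc": 2},   # disialyl-T
]


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def glycan_mass(comp):
    """Summed residue mass of a glycan composition such as {"HexNAc": 1}."""
    return sum(GLYCAN_UNIT[unit] * n for unit, n in comp.items())


def match_offset(precursor_mass, seq, tol_ppm=10.0):
    """Treat (precursor - peptide) as an unknown mass offset, as in open
    search, and return candidate compositions within tol_ppm of it."""
    offset = precursor_mass - peptide_mass(seq)
    hits = [c for c in CANDIDATES
            if abs(offset - glycan_mass(c)) / precursor_mass * 1e6 <= tol_ppm]
    return offset, hits


def y_ion_ladder(seq, comp):
    """Singly charged Y ions (intact peptide + partial glycan). Y0 is the bare
    peptide; every other sub-composition must keep the core HexNAc."""
    units = list(comp)
    pep = peptide_mass(seq)
    ions = set()
    for counts in product(*(range(comp[u] + 1) for u in units)):
        sub = dict(zip(units, counts))
        if sum(counts) > 0 and sub.get("HexNAc", 0) == 0:
            continue  # skip fragments lacking the reducing-end GalNAc
        ions.add(round(pep + glycan_mass(sub) + PROTON, 4))
    return sorted(ions)


def y_pattern_score(observed_mz, predicted, tol_da=0.02):
    """Fraction of predicted Y ions found among observed fragment m/z values."""
    matched = sum(1 for p in predicted
                  if any(abs(p - o) <= tol_da for o in observed_mz))
    return matched / len(predicted)


if __name__ == "__main__":
    seq = "STAPPAHGVTSAPDTR"  # arbitrary example peptide
    true_comp = {"HexNAc": 1, "Hex": 1, "NeuAc": 1}
    precursor = peptide_mass(seq) + glycan_mass(true_comp)  # simulated, noise-free
    offset, hits = match_offset(precursor, seq)
    print(f"offset = {offset:.4f} Da, candidates = {hits}")
    ladder = y_ion_ladder(seq, true_comp)
    print("Y ions:", ladder)
    print("score:", y_pattern_score(ladder[:3], ladder))
```

Production tools typically add more on top of this kind of matching:
- peptide backbone fragment ions
- multiple charge states
- fragment-ion indexing for speed
- statistical false discovery rate control

The composition-level Y-ladder here is only a rough stand-in for the pattern and glycan-first logic the named tools use.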