A PU learning-based approach for predicting N-, C- and O-linked glycosylation in the human proteome

Glycosylation is a ubiquitous type of protein post-translational modification (PTM) in eukaryotic cells, which plays vital roles in various biological processes such as cellular communication, ligand recognition, and subcellular recognition. It is estimated that >50% of the entire human proteome is glycosylated.

We present a novel bioinformatics tool called GlycoMine-PU, a positive-unlabelled (PU) learning-based tool for the systematic in silico identification of C-, N- and O-linked glycosylation sites in the human proteome. GlycoMine-PU was developed using the PA2DE(V2.0) algorithm and evaluated on a well-prepared, up-to-date benchmark dataset that encompasses all three types of glycosylation sites and was curated from the UniProt database.

Please input a sequence in the FASTA format (uncommon amino acids, including B, J, O, U, X and Z, are not accepted)


The datasets used to train the web server are available for download here

It is easy and straightforward to use the GlycoMine-PU server to make a prediction.

1. Fill in the text area with a protein sequence in the FASTA format and select the corresponding glycosylation type model.

Note: Since GlycoMine-PU runs on a shared server, the resources for large-scale or batch computations are limited. Therefore, at most 5 sequences are allowed per submission. Contact us if you need large-scale computations and we will be happy to help.

2. Please wait patiently for the prediction result to be returned by our server. Each sequence may take 5 or more minutes. The prediction results are shown as follows. The central residue in 'Adjacent residues' is the predicted glycosylation site.
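
Because each submission can take several minutes, it may help to check the input locally before pasting it into the form. The following is a minimal sketch in Python, not part of the GlycoMine-PU server: the function names (parse_fasta, check_submission, central_residue) are hypothetical, the check assumes that only the 20 standard amino acid letters are accepted, and central_residue assumes that the 'Adjacent residues' value is an odd-length window centred on the predicted site.

```python
# Hypothetical pre-submission check; the server's own validation may differ.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: only the 20 standard residues pass
MAX_SEQUENCES = 5                       # per-submission limit stated in the note above


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before a '>' header line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_submission(text):
    """Return a list of problems; an empty list means the input looks acceptable."""
    records = parse_fasta(text)
    problems = []
    if not records:
        problems.append("no FASTA records found")
    if len(records) > MAX_SEQUENCES:
        problems.append(
            f"{len(records)} sequences submitted; the limit is {MAX_SEQUENCES}"
        )
    for header, seq in records:
        if not seq:
            problems.append(f"{header}: empty sequence")
        # Flags B, J, O, U, X, Z and any other non-standard character.
        bad = sorted(set(seq) - STANDARD)
        if bad:
            problems.append(f"{header}: unsupported characters {''.join(bad)}")
    return problems


def central_residue(window):
    """Return the middle character of an odd-length residue window."""
    if len(window) % 2 == 0:
        raise ValueError("window must have odd length")
    return window[len(window) // 2]
```

Calling check_submission on the text you intend to paste returns a list of human-readable problems, so an empty list suggests the input satisfies the stated constraints. The helper central_residue only illustrates how to read the middle position of a window and makes no claim about the window length the server reports.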

If you have any questions, please do not hesitate to contact us.


If you are interested in our other works in the fields of bioinformatics and systems biology, please refer to the following websites for more information: