zyenz: Gly -- a protein glycosylation designer

This software finds locations on a protein surface where mutating the residues to the N-linked glycosylation consensus sequence has a good chance of giving a protein that is efficiently expressed and glycosylated.

The algorithm implements a simple heuristic: if a mutation doesn't disturb the molecule's hydrophobic core and the new residue is hydrophilic, there is a good chance the mutation will be successful. The N-linked glycosylation "sequon", which is the attachment point for a glycan, involves a pair of hydrophilic residues in the mutant. So the algorithm looks for pairs of residues that have solvent exposed sidechains as potential mutation sites. Prolines are known to be "deal killers" for N-linked glycosylation, so a proline or an unknown residue in the neighborhood of a potential glycosylation site causes that mutation candidate to be excluded.
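To make the heuristic concrete, here is a minimal Python sketch. It is not the actual Gly code. The function name, the one-letter code "X" for unknown residues and the size of the "neighborhood" (the `flank` parameter) are all assumptions made for illustration. It relies on the standard N-x-S/T form of the sequon, where the two mutated positions are i and i+2.

```python
# Hypothetical sketch: the names and the neighborhood window are assumptions,
# not taken from the Gly source code.

UNKNOWN = "X"  # one-letter code assumed here for an unknown residue


def candidate_sites(sequence, exposed, flank=1):
    """Return 0-based positions i where residues i and i+2 could be mutated
    to form an N-x-S/T sequon.

    sequence: one-letter amino acid string
    exposed:  set of 0-based indices whose sidechains are solvent exposed
    flank:    assumed number of extra residues checked on each side
    """
    sites = []
    for i in range(len(sequence) - 2):
        # Both mutated positions (future N and future S/T) must be exposed.
        if i not in exposed or (i + 2) not in exposed:
            continue
        lo = max(0, i - flank)
        hi = min(len(sequence), i + 3 + flank)
        window = sequence[lo:hi]
        # Reject candidates with a proline or unknown residue nearby.
        if "P" in window or UNKNOWN in window:
            continue
        sites.append(i)
    return sites
```

For example, `candidate_sites("AKTGESPLQA", {1, 3, 4, 6})` keeps position 1 and rejects position 4, because the proline at position 6 falls inside that candidate's window.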

The program takes PDB files as inputs: a file with the complete structure, a file with the solvent exposed atoms, a file with the beta sheet residues and a file with the alpha helix residues. I am currently using the PyMOL plugin findSurfaceResidues to get the solvent exposed atoms and PyMOL itself to get the secondary structure files.
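As an illustration of how a file of solvent exposed atoms could be reduced to a set of residues, here is a small Python sketch that reads the fixed-column ATOM records of the PDB format. This is not how Gly itself reads its inputs. The function name is hypothetical, and treating any exposed non-backbone atom as evidence of an exposed sidechain is an assumption.

```python
# Hypothetical sketch: not the parser used by Gly.

BACKBONE = {"N", "CA", "C", "O"}  # backbone atom names in a PDB file


def exposed_sidechain_residues(pdb_text):
    """Return the set of (chain, residue number) pairs that have at least
    one non-backbone atom listed in the given PDB text."""
    residues = set()
    for line in pdb_text.splitlines():
        # Only ATOM records long enough to hold the residue number are used.
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        atom_name = line[12:16].strip()  # columns 13-16
        chain = line[21]                 # column 22
        resseq = int(line[22:26])        # columns 23-26
        # Assumption: a residue counts as exposed only via a sidechain atom.
        if atom_name not in BACKBONE:
            residues.add((chain, resseq))
    return residues
```

The same kind of loop, without the backbone filter, would work for the beta sheet and alpha helix files, since those only need the residue numbers.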

Since the predictions are not 100% accurate, the best glycosylation sites should be worked out experimentally. Look for the sites that are efficiently expressed and efficiently glycosylated. In other words, when you express a singly glycosylated version of your protein and run a western blot or SDS-PAGE, look for "bright" bands where the majority of the protein is in the glycosylated form, with only small amounts of unglycosylated protein. When expressing protein with two or more glycans, look for the combinations that give the maximum number of attached glycans. The appearance of unglycosylated protein when expressing multiple glycans is undesirable and should be avoided.

This software has been tested against a number of data sets. See for example:

Erythropoietin -- 83% accurate.
Interferon alpha -- 71% accurate.
YFP -- 100% accurate.

An example with the analysis of a western blot may be found in the glycosylating interferon alpha data set.

The current version of the software may be found on GitHub (https://github.com/aequorea/gly).

zyenz

Monday, November 6, 2017

