Wednesday, February 6, 2019

High performance code in C/C++ using pointers

Here we show how "pointerizing" code in C can result in improved performance on some x86_64 platforms. In days of old this was a popular way to speed up C code, but for a considerable period of time, computer architectures and compilers were such that it was of little or no benefit. As an optimization technique it went "out of style." In a modest way, it looks like it's working again on some modern x86_64 machines.
In the earliest times, pointers in C had a reputation for being a great way to speed up your code. Early compilers generated code such that on early machines, in many cases, pointer code ran much more quickly than indexed code. If you look at the first edition of "The C Programming Language" by Kernighan and Ritchie (K&R) that came out in 1978, on page 93 you'll find they say:
"In C there is a strong relationship between pointers and arrays, strong enough that pointers and arrays should really be treated simultaneously. Any operation that can be achieved by array subscripting can also be done with pointers. The pointer version will in general be faster but, at least to the uninitiated, somewhat harder to grasp immediately."
The earliest benchmark I could find where pointer code was directly compared with indexed code was in the book "Writing Efficient Programs" by Jon Bentley, which came out in 1982. On page 95 we see tests of some insertion sorts compiled with optimizations where the pointer version ran more than twice as fast as the indexed version (fragment M6 vs. M5). These tests were done by John Majernik on a PDP 11/70, a 16-bit computer from 1975.

But as early as 1997 in a series of benchmarks in the October 1997 issue of Unix Review, Jon Bentley found that in the cases he was testing on three different machines from that time, pointer code was rarely if ever faster than the indexed code. In six cases on three different machines where he looked at optimized code for routines that accessed arrays and then had their inner loops converted to pointer code, only one case showed any speedup, one case ran the same, and four cases actually ran slower with pointers (Figure 1 is2sp vs. is2 and is4p vs. is4). He suggested:
"The displacement addressing modes of modern architectures make array indexing so fast that programmers (usually) need no longer bother with "pointerizing" array accesses."
This attitude has persisted. As recently as 2018 in "Computer Organization and Design RISC-V edition" by Patterson and Hennessy, beginning on page 141 we find a discussion of arrays and pointers. They start out by saying:
"The purpose of this section is to show how pointers map into RISC-V instructions, and not to endorse a dated programming style."
On page 144 we get a statement of a commonly held opinion:
"People used to be taught to use pointers in C to get greater efficiency than that available with arrays: 'Use pointers, even if you can’t understand the code.' Modern optimizing compilers can produce code for the array version that is just as good. Most programmers today prefer that the compiler do the heavy lifting."
To find out how fast pointer code is compared to indexed code on some x86_64 processors, we used a method similar to what was used in the past. We compared the performance of an indexed insertion sort routine:


with a pointer version of the same:


We tested these routines on Linux using the gcc compiler with -O3 optimization, using version 7.3.0 of the 64-bit compiler. We sorted arrays of 100,000 random integers. The percentages shown are the speed improvement of the pointer routine over the indexed one.

        Processor            Clock    Indexed  Pointer   Gain

    Intel Core i5-4690      3.50 GHz  1127 ms   901 ms  25.2 %
    Intel Pentium J2900     2.41 GHz  4170 ms  3685 ms  13.2 %
    Intel Core 2 Duo U9600  1.60 GHz  3884 ms  3169 ms  22.6 %
    AMD E-350               1.60 GHz  6399 ms  5636 ms  13.5 %
    AMD Athlon 64 3500+     2.20 GHz  3953 ms  3938 ms   0.4 %


Most of the processors are showing significant performance improvements with the pointer code. The Athlon processor is showing a very small and possibly insignificant improvement.

After seeing these results, one might wonder whether there is anything the compiler folks can do to improve indexed memory accesses. On at least one of the machines I tested, this is exactly what has happened. Here are some results using gcc version 8.2.0. As a matter of convenience, the Core 2 Duo was run on macOS instead of Linux, using Homebrew gcc 8.2.0.

        Processor            Clock    Indexed  Pointer   Gain

    Intel Core i5-4690      3.50 GHz   969 ms   903 ms   7.3 %
    Intel Pentium J2900     2.41 GHz  4277 ms  3689 ms  16.0 %
    Intel Core 2 Duo U9600  1.60 GHz  3869 ms  3206 ms  20.7 %
    AMD E-350               1.60 GHz  6634 ms  6260 ms   6.0 %
    AMD Athlon 64 3500+     2.20 GHz  4122 ms  3990 ms   3.3 %


As you can see, indexed accesses on the Core i5 have improved significantly; however, performance on other chips has declined. For example, indexed access on the Pentium J2900 is slower, and the AMD E-350 is slower all around. So different compilers do different things, but the notion that pointers can be faster than indexed memory access remains.

I sometimes see questions online where people learning C read about pointers in K&R and they ask if it's true that pointers are faster. I see all kinds of answers, but it looks to me like things are pretty much as described by Jon Bentley in "Writing Efficient Programs." He put the pointer optimizations in the chapter on system-dependent efficiency. That seems like a good place for it. What you get depends on the computer. Although some folks consider this technique to be "dated," maybe something old is new again because from time to time one can still find a performance reward for using pointers.

If you want to try the benchmark program, the source code may be found on GitHub (github.com/aequorea/pitest). GCC has many compiler flags one can play around with to change how it optimizes code, for example -march=native. You can see for yourself how they change things. So far I find that the numbers can change but the story doesn't.

References

If you're interested in benchmarking and optimization here are a couple of nice videos: one from Chandler Carruth and one from Andrei Alexandrescu.

The C Programming Language, Kernighan and Ritchie (1978).

Writing Efficient Programs, Jon Bentley (1982).

Working With the System, Jon Bentley, Unix Review, Oct. 1997.

Computer Organization and Design RISC-V edition, Patterson and Hennessy (2018).

Wednesday, July 18, 2018

HIV vaccine strategy -- targeting the CD4 binding site

Recently, an HIV vaccine paper described the vaccination of four cows with an Env-based vaccine. It suggests that if you have germline antibodies with long enough third heavy chain complementarity determining regions (HCDR3), regions which are shorter in humans, a very simple immunization scheme can result in some very nice broadly neutralizing antibodies. How do we encourage the immune system's processes of somatic hypermutation and affinity maturation to favor inserts in the heavy chain, or deletions in the light chain, which might also facilitate antibody binding, thereby promoting the creation of broadly neutralizing antibodies?

This strategy is inspired in part by the same piece of information that inspired the glycosylation strategy of our HIV microbicide design. It is different in concept from other strategies being pursued to target the CD4 binding site while utilizing a sequential immunization strategy.

Some of the strongest phenotypic properties of HIV's Env structure linked to sexual transmission are shorter hypervariable domains and fewer potential N-linked glycosylation sites. This deficit in glycosylation sites in pioneer strains of HIV may be allowing better access to the conserved part of the CD4 binding site that the heavy chain of broadly neutralizing antibodies such as N6 binds to. During the course of an infection, the glycan coat fills in. This may create stress on a bound antibody that may serve to encourage the insertion or deletion mutations that might lead to the broadly neutralizing antibodies that are sometimes observed.

First we would like the heavy chain to get the best possible grip on the conserved part of the CD4 binding site in the absence of glycans that may be obscuring it. This is to maximize the probability that binding will survive the stress of adding back the glycans later. We utilize the principle of complementary surfaces to create a heterodimeric immunogen designed to encourage somatic hypermutation to sharply focus and improve binding to the conserved part of the CD4 binding site. This heterodimer will likely require some additional designed-in glycans to prevent off-target immunogenicity. We expect that the heavy chain will be the part of the antibody that will be focused on that binding site in the majority of cases.

Now that the heavy chain has a good grip on things, we add back the glycans gradually so that the heavy chain doesn't lose its grip on the CD4 binding site. If things go well, one or another of the binding loops on the heavy chain will have evolved additional length, or maybe the light chain has elements that will experience deletions and become shorter, such that the antibody will then be able to efficiently bind the CD4 binding site in the presence of all of the glycans.

This is a fairly speculative plan. The insertion and deletion mutations are rare and could take a significant amount of time to observe. Something I find fascinating is the relationship between autoimmunity and the induction of broadly neutralizing antibodies in HIV patients. Elicitation of these kinds of antibodies may be facilitated by a compromised immune system, and it may be difficult to elicit them in a completely healthy person. An effective vaccine regimen may involve artificially inducing this kind of compromise.

So maybe one would like to do the whole thing in vitro, and once you have the B-cells that you want, put them back. Of course if you're going to pull out the B-cells and mess with them, you might as well just engineer the antibodies you want into them. Unfortunately, either way, this sort of defeats the purpose of a low cost solution.

HIV economics

The people who need HIV treatment or help with HIV prevention the most are the people who can afford it the least.

These people live in Sub-Saharan Africa. Although significant progress has been made in reducing AIDS-related deaths in this region over the years, there is much progress remaining to be made. We are still losing on the order of a million people a year. So if you happen to read a story about HIV in the press that gives the impression that the problem is essentially solved, and that people can live long happy lives as long as their condition is properly diagnosed and treated, you might like to consider that this story depends on where you live. There are still many people who do not know if they have HIV, and even if they do, there are places where the majority of people do not have access to treatment. In cases where people do have access to treatment, the rising rate of resistance to treatment is particularly concerning.

Sunday, June 17, 2018

Evolving sharply focused antibodies with complementary surfaces

There have been some papers describing heterotrimeric Env-like immunogens for HIV, which have shown increased breadth of neutralization, but which can in principle generate diverse responses. Here we describe a method based on a heterodimer designed to elicit a single focused response. Immunogens with complementary surfaces take advantage of the fact that to activate a B-cell, a key event in the immune system's affinity maturation feedback loop where the results of somatic hypermutation are selected, the immunogen is required to be at least a dimer. We arrange for the two surfaces of a dimeric immunogen to have the conserved binding site repeated on both elements of the dimer, and to have the periphery surrounding the conserved binding site on both surfaces be "complementary." What we mean by this is that if a mutation to the B-cell receptors improves the affinity to the conserved binding site, the affinity is improved on both surfaces, and this result is fed back to the system at full strength in the form of improved binding. If a mutation improves the affinity to a region in the periphery of the binding site on one element of the dimer, then if the surfaces are perfectly complementary, the affinity to the same region on the complementary surface will be reduced. The feedback signal from these mutations would thus be attenuated and more likely to be ignored. This should tend to focus binding on the conserved binding site, with binding to the periphery remaining relatively unimproved.

This is not a perfect idea. In practice, if the heavy chain gets a good grip on the binding site of interest, there is enough flexibility for the light chain to bind in multiple locations. Something that has been done previously to sort broadly neutralizing antibodies is to resurface the periphery of the binding site of interest. One might like to try residues that are less sticky, like alanines or amino acids with non-polar sidechains. The problem is, if you get too carried away with this kind of thing, you might end up with a protein that will no longer fold. One could try to arrange for polar or charged residues on the periphery of the binding site to be complementary in a rough kind of way, but it wouldn't be perfect. It might be better than nothing though. Simple sequence variation in the unconserved regions might also be better than nothing.

Our assumption that a dimer is enough to activate a B-cell is not always true. In the study that assumption is based on, the ligands were bivalent anti-IgD or bivalent anti-IgM antibodies (m.w. >100 kDa). I don't know why these bivalent antibodies can't always activate a B-cell, but our dimers will likely be made from smaller and much simpler proteins, and when they bind a B-cell receptor, they may be more likely to bind the variable regions, which could have an effect on activation. A recent paper is informative. It tested a small molecule hapten NIP (m.w. 323.04 Da) and hen egg lysozyme (m.w. 14.4 kDa) as antigens. In neither case could monomers activate either IgD or IgM B-cell receptors. An NIP dimer could activate an IgM receptor but not IgD. A hen egg lysozyme dimer could activate both. For NIP to activate IgD required a trimer. Hen egg lysozyme is a nice small protein with a size that is along the lines of the kinds of things we would likely be using.

Monday, November 6, 2017

A glycosylated HIV microbicide based on N6

Pegylated Griffithsin -- an "early model"  (5/10/16) 
Lectins like griffithsin have received some interest as potential HIV microbicides where binding the HIV particle’s glycan coat with lectins might act to prevent infection. Griffithsin binds mannose moieties. All female reproductive tract epithelia express the mucin MUC1, and MUC1 has N-linked glycosylation that results in a glycan coat which may have mannose moieties. So mannose binding lectins like griffithsin may just as easily bind mucus proteins as HIV and be prevented from reaching their target. We might want something that binds HIV more specifically.

One possible design is a glycosylated neutralizer based on the N6 antibody, an antibody that has been shown to effectively neutralize 98% of 181 different HIV isolates that it was tested against. The N6 antibody has the property that it binds the CD4 binding site on HIV's gp120 protein mostly with one chain -- the heavy chain. The light chain has evolved to stay out of the way. This allows the majority of the antibody to be cut away leaving a small molecule with the expectation that this molecule might still bind the CD4 binding site just fine.

Some of the strongest phenotypic properties of HIV's Env structure linked to sexual transmission are shorter hypervariable domains and fewer potential N-linked glycosylation sites. A reduction in glycosylation sites could be supporting viral entry for a variety of reasons, so to preserve the presentation of a dense glycan coat, it might be desirable for a protein-based neutralizer, such as the design proposed above, to have a dense glycan coat of its own. This should also improve the solubility of the neutralizer.

Here is an idea of what these molecules might be like.

Gly -- a protein glycosylation designer

This software finds locations on a protein surface such that if we mutate the sites to have the N-linked glycosylation consensus sequence, it has a good chance of being efficiently expressed and glycosylated.

The algorithm implements a simple heuristic. If we want to mutate a protein residue, as long as the mutation doesn't disturb the molecule's hydrophobic core, and the mutation is to a hydrophilic residue, there is a good chance the mutation will be successful. The N-linked glycosylation "sequon," which is the attachment point for a glycan, involves pairs of hydrophilic residues in the mutant. So the algorithm looks for pairs of residues that have solvent exposed sidechains as potential mutation sites. Prolines are known to be "deal killers" for N-linked glycosylation, so prolines or other unknown residues in the neighborhood of a potential glycosylation site cause that mutation candidate to be excluded.

The program takes PDB files as inputs. There is a file with the complete structure, a file with solvent exposed atoms, a file with beta sheet residues and a file with alpha helix residues. I am currently using the pymol plugin findSurfaceResidues to get the solvent exposed atoms and pymol itself to get the secondary structure files.

Since predictions are not 100% accurate in all cases, the best glycosylation sites should be worked out experimentally. Look for the sites that are efficiently expressed and efficiently glycosylated. In other words, when you express a singly glycosylated version of your protein and do a Western blot or SDS-PAGE, look for "bright" bands where the majority of the protein is seen to be in the glycosylated form with small amounts of unglycosylated protein. When expressing protein with two or more glycans, look for the combinations that give the maximum number of glycans. The appearance of unglycosylated protein when expressing multiple glycans is undesirable and should be avoided.

This software has been tested against a number of data sets. See for example:

Erythropoietin -- 83% accurate.
Interferon alpha -- 71% accurate.
YFP -- 100% accurate.

An example with the analysis of a Western blot may be found in the glycosylating interferon alpha data set.

The current version of the software may be found on GitHub (https://github.com/aequorea/gly).