![]() |
Image Credit |
Here we show how "pointerizing" code in C can result in improved performance on some x86_64 platforms. In days of old this was a popular way to speed up C code, but for a considerable period of time, computer architectures and compilers were such that it was of little or no benefit. As an optimization technique it went "out of style." In a modest way, it looks like it's working again on some modern x86_64 machines.In the earliest times, pointers in C had a reputation for being a great way to speed up your code. Early compilers generated code such that on early machines, in many cases, pointer code ran much more quickly than indexed code. If you look at the first edition of "The C Programming Language" by Kernighan and Ritchie (K&R) that came out in 1978, on page 93 you'll find they say:
"In C there is a strong relationship between pointers and arrays, strong enough that pointers and arrays should really be treated simultaneously. Any operation that can be achieved by array subscripting can also be done with pointers. The pointer version will in general be faster but, at least to the uninitiated, somewhat harder to grasp immediately."The earliest benchmark I could find where pointer code was directly compared with indexed code was in the book "Writing efficient programs" by Jon Bentley which came out in 1982. On page 95 we see tests of some insertion sorts compiled with optimizations where the pointer version ran more than twice as fast as the indexed version (fragment M6 vs. M5). These tests were done by John Majernik on a PDP 11/70, a 16-bit computer from 1975.
But as early as 1997 in a series of benchmarks in the October 1997 issue of Unix Review, Jon Bentley found that in the cases he was testing on three different machines from that time, pointer code was rarely if ever faster than the indexed code. In six cases on three different machines where he looked at optimized code for routines that accessed arrays and then had their inner loops converted to pointer code, only one case showed any speedup, one case ran the same, and four cases actually ran slower with pointers (Figure 1 is2sp vs. is2 and is4p vs. is4). He suggested:
"The displacement addressing modes of modern architectures make array indexing so fast that programmers (usually) need no longer bother with "pointerizing" array accesses."This attitude has persisted. As recently as 2018 in "Computer Organization and Design RISC-V edition" by Patterson and Hennessy, beginning on page 141 we find a discussion of arrays and pointers. They start out by saying:
"The purpose of this section is to show how pointers map into RISC-V instructions, and not to endorse a dated programming style."On page 144 we get a statement of a commonly held opinion:
"People used to be taught to use pointers in C to get greater efficiency than that available with arrays: 'Use pointers, even if you can’t understand the code.' Modern optimizing compilers can produce code for the array version that is just as good. Most programmers today prefer that the compiler do the heavy lifting."To find out how fast pointer code is compared to indexed code on some x86_64 processors, we used a method similar to what was used in the past. We compared the performance of an indexed insertion sort routine:
with a pointer version of the same:
We tested these routines on linux using the gcc compiler with -O3 optimization. We used version 7.3.0 of the 64-bit compiler for these tests. We sorted arrays of 100,000 random integers. The percentages shown are the speed improvement of the pointer routine over the indexed one.
Processor Clock Indexed Pointer Gain
Intel Core i5-4690 3.50 GHz 1127 ms 901 ms 25.2 %
Intel Pentium J2900 2.41 GHz 4170 ms 3685 ms 13.2 %
Intel Core 2 Duo U9600 1.60 GHz 3884 ms 3169 ms 22.6 %
AMD E-350 1.60 GHz 6399 ms 5636 ms 13.5 %
AMD Athlon 64 3500+ 2.20 GHz 3953 ms 3938 ms 0.4 %
Most of the processors are showing significant performance improvements with the pointer code. The Athlon processor is showing a very small and possibly insignificant improvement.
After seeing these results, one might wonder if there is anything the compiler folks can do to improve indexed memory accesses. In at least one of the machines I tested, this is exactly what has happened. Here are some results using gcc version 8.2.0. As a matter of convenience, the Core 2 duo was run on macOS instead of linux using Homebrew gcc 8.2.0.
Processor Clock Indexed Pointer Gain
Intel Core i5-4690 3.50 GHz 969 ms 903 ms 7.3 %
Intel Pentium J2900 2.41 GHz 4277 ms 3689 ms 16.0 %
Intel Core 2 Duo U9600 1.60 GHz 3869 ms 3206 ms 20.7 %
AMD E-350 1.60 GHz 6634 ms 6260 ms 6.0 %
AMD Athlon 64 3500+ 2.20 GHz 4122 ms 3990 ms 3.3 %
As you can see, indexed accesses on the Core i5 have significantly improved, however performance on other chips has declined. For example, indexed access on the Pentium J2900 is slower and the AMD E-350 is slower all around. So different compilers do different things, but the notion that pointers can be faster than indexed memory access remains.
I sometimes see questions online where people learning C read about pointers in K&R and they ask if it's true that pointers are faster. I see all kinds of answers, but it looks to me like things are pretty much as described by Jon Bentley in "Writing Efficient Programs." He put the pointer optimizations in the chapter on system-dependent efficiency. That seems like a good place for it. What you get depends on the computer. Although some folks consider this technique to be "dated," maybe something old is new again because from time to time one can still find a performance reward for using pointers.
If you want to try the benchmark program, source code may be found on github at (github.com/aequorea/pitest). GCC has many compiler flags one can play around with to change how it optimizes code, for example -march=native. You can see for yourself how they change things. So far I find that the numbers can change but the story doesn't.
References
If you're interested in benchmarking and optimization here are a couple of nice videos: one from Chandler Carruth and one from Andrei Alexandrescu.
The C Programming Language, Kernighan and Ritchie (1978).
Writing Efficient Programs, Jon Bentley (1982).
Working With the System, Jon Bentley, Unix Review, Oct. 1997.
Computer Organization and Design RISC-V edition, Patterson and Hennessy (2018)
No comments:
Post a Comment