« Getting reliable data | Main | Is audit logging useful? »

Monday, 19 January 2009

Algorithm Performance in Academic Papers and Practice

A few years ago I had the opportunity to be the peer in a peer-reviewed paper. The subject of the paper was a more efficient way to perform modular inverse (an operation used in DSA and other public key crypto).

But first ... the starting point in computing the modular inverse is the "Extended Euclidian Algorithm". Over the years, researchers have come up with improvements/variations that can find the result much quicker. Peter Montgomery (of Montgomery Method of Modular Multiplication fame) probably started it, but others have come up with algorithmic improvement.

In the paper I reviewed, a team of researchers had come up with an improvement on a technique previously published. Both papers described the performance in terms of O(n), the order of number of operations. They both claimed that the technique (the original and the improvement) would be faster than the existing methods based on this number.

I was skeptical, so I wrote code to test the performance. First, the technique described in the paper worked, it found the mod inverse. But performance was only marginally better than the Extended Euclidian, and much slower than the algorithm I tested against, a form of Montgomery inverse. (All my implementations used the same underlying multi-precision arithmetic code, so any assembly code or other optimizations used by one algorithm were used by both.)

I sent my results to the journal (I recommended they not publish) along with my comments and my code that tested the algorithm. I asked if the researchers wanted to look at my code to see if I had done something wrong. I never heard back from anyone. The journal never asked me to review again (I suspect they did not want to get a review back that recommended do not publish, so they never wanted to take a chance on me again).

Had the researchers ever written code to test their algorithm? Probably not. I suspect that there are many academic papers out there for which an algorithm is proposed, an O(n) is computed, and that's it. No one ever tests the paper by writing an actual program that implements the algorithm. I say this not based only on this "random sample" of 1, but I have heard other stories of people implementing algorithms based on papers that are not as fast as claimed.

Maybe the researchers got the O(n) wrong, but I don't think so. I just think there's a difference between theory and practice.

(At this point what probably jumps into your mind is the quote "In theory there's no difference between theory and practice, but in practice there is," attributed to Jan L.A. van de Snepscheut, Yogi Berra, and even Albert Einstein.)

What can cause the difference? Maybe one algorithm has fewer operations but more of them are multiplies and divides, and the savings of fewer operations is overwhelmed by the expense of divides and mutliplies. Or more likely, an operation (in number of operations) is not a single add or multiply, but a function, and some functions are more expensive. For example, in computing the mod inverse of a 1024-bit number, a multiply operation can actually consist of 1,024 individual multiplies and thousands more individual additions. A divide can similarly consist of 31 divides and thousands of mulitplies and adds.

In one case I heard of, an improvement reduced the number of operations by half, but each operation was 4 times longer. Think of it this way, suppose you had a loop that's 10 cycles. Do 1,000 of them. The cost is 10,000 cycles. Now suppose you have a way to get the same thing done using 500 loops, but each loop is now 40 cycles. That's 20,000 cycles.

At a previous job, we would look at only number of assembly code adds, multiplies, and divides (a subtract is equivalent to an add). That is, an algorithm's O(n) was order of number of assembly code instructions. That is probably the only true way to measure an algorithm's performance.

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/services/trackback/6a00e55375ef1c8833010536b1e2c5970b

Listed below are links to weblogs that reference Algorithm Performance in Academic Papers and Practice:

Comments

Rainer Gerhards

Steve, nice and inspiring post. It matches my experience when it comes to practical vs. theoretical performance. If have just blogged at http://blog.gerhards.net/2008/10/rsyslog-performance.html about some of my thoughts. -- Rainer

Post a comment

If you have a TypeKey or TypePad account, please Sign In.

Voltage Data Breach Index

  • Grab the Voltage Data Breach Index

February 2012

Sun Mon Tue Wed Thu Fri Sat
      1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29