Saturday, 4 July 2020

Montgomery Multiplication – a three-page paper

In early June, I learned recently that Peter L. Montgomery passed away in February 2020. I felt a loss immediately and couldn’t help but wrote this small essay in LinkedIn as a tribute to him. I revise the original write-up a bit and publish it here.

Although I never met him in person, his three-page paper, “Peter L. Montgomery (1985). "Modular multiplication without trial division". Mathematics of Computation. 44: 519–521”, had a very big impact on my career in cryptography.

After completing a full RSA library using the Classical method in 1988 (see https://cryptoandsecurity.blogspot.com/2018/02/rsa-and-me-30-years-after-my-first.html), I come across Jun Fang who just came back from Paris after his PhD study and he told me the Montgomery Multiplication. I managed to get a photocopy of this paper and studied it with great care.



I didn’t implement Montgomery Multiplication that time because the change of the underlying modular multiplication method would incur a re-development of the RSA library. In addition, my own RSA implementation was very efficient. But, I didn’t forget this method and kept this photocopy. From time to time, I made remarks on it, some about mathematics, some about programming. Amazingly, it’s been with me since.



Although this paper is of three pages, page 3 is actually a short reference list.



In 1996, when I went to bonnie Scotland to take a position as a Senior Cryptographer in Motorola SPS, I was expected to help the IC design team to implement a Montgomery engine. I didn’t write code for implementation, but generated numerical worked examples to show how each step the multi-precision Montgomery Multiplication worked to help the IC design engineers to microcode a hardware multiplication block.

Around 1997, the rise of Elliptic Curve Cryptography (ECC) called the need of an ECC library. I started to write an ECC library and had chance to implement Montgomery Multiplication properly. Combined with Burt Kaliski Jr’s “The Montgomery inverse and its applications, IEEE Transactions on Computers (Volume: 44 , Issue: 8 , Aug 1995 )”, a fully streamlined ECC library was developed. I had chance to verify the correctness of the library but didn’t have chance to benchmark it. Motorola SPS sold its Smartcard business to Atmel and I was later transferred to Motorola / Freescale NCSG.

That time, NCSG supplied Apple for the G4 (7457) processor which had a vector engine AltiVec® (https://en.wikipedia.org/wiki/AltiVec) as part of the PowerPC ISA. I got chance to develop RSA on the mini supercomputer for benchmark purpose. This time, the Montgomery Multiplication was implemented with the AltiVec® vector engine (Solving Sequential Problems in Parallel – An SIMD Solution to RSA Cryptography, Jan 2, 2007, AN3057, Freescale) with the performance of “3.72ms per signing, 1024 bit RSA @1GHz 7457”.

Montgomery Multiplication is fast because it does not need the correction steps as other methods (Classical, Sedlek, and Barret), an important performance feature for long pipeline super-scalar CPU architecture. It is just naturally 10 to 20 percent faster than any other methods.

In addition to Montgomery Multiplication, Montgomery Powering Ladder and Montgomery Elliptic Curves are often referred in the crypto and security community.

Peter L. Montgomery, R.I.P. Your legacy will be with us for a long time.

No comments:

Post a Comment

Post Quantum Cryptography

The consensus appears that quantum computers will be built, and RSA and ECC will be broken by  Shor's algorithm  at some point. Although...