1. Montgomery multiplication

1.1. Single-precision Montgomery multiplication

1.1.1. Description

Montgomery multiplication works out xyR^–1 mod N without division, where R > N and gcd(R, N) = 1 on the condition of xy < RN.

In practice, N is chosen to be odd and R in form of 2ⁿ with R > N.

Montgomery multiplication: montMUL(x, y, N, R) = xyR^–1 mod N

Input: x < N

y < N

N, the modulus, odd

R, R > N and R in the form of 2ⁿ for an n.

Output: xyR^-¹ mod N

Procedure montMUL(x, y, N, R):

1. J = -N^–1 mod R; // can be pre-calculated see Algorithm J.

2. T = x*y ;

3. q = (T mod R)*J mod R;

4. t = (T + q*N)/R;

5. if (t ³ N)

6. return t - N;

7. else

8. return t ;

End_of_procedure

Because R is chosen as R = 2ⁿ, the divisions by R, at Line 3 and Line 4, are simply value extractions.

1.1.2. Numerical example

Given x = 5, y = 10, N = 13, calculate xy mod N with Montgomery arithmetic.

Step 1: We represent all the numbers in hexadecimal as follows:

x = 5;

y = A;

N = D;

Step 2: The R can be chosen R = 0x10. (note: the 10 is 16 in decimal)

Step 3: Calculate constants J and H as follows:

J = -N^–1 mod R = -D^-1 mod 10 = B.

H = R² mod N = 10 mod D = 9.

Step 4: Applying single-precision montMUL(x, y, N, R) as follows:

1. J = B // pre-calculated

2. T = x*y = 5*A = 3C.

3. q = (T mod R)*J mod R = (3C mod 10)*B mod 10 = C*B mod 10 = 84 mod 10 = 4.

4. t = (T + q*N) /R = (3C + 4*D)/10 = 70/10= 7 (< N)

\ xyR^–1 mod N = 7

Step 5: Calculating montMUL(xyR^–1 mod N, H) = xy mod N:

1. J = B; H = 9; // pre-calculated

2. T = 7*9 = 3F;

3. q = F*B mod 10 = A5 mod 10 = 5;

4. t = (3F + 5*D)/10 = 80/10 = 8;

\ xy mod N = 8.

1.2. Multi-precision Montgomery multiplication

1.2.1. Description

In the multi-precision case of Montgomery multiplication, it is convenient to choose R = (2^w)ⁿ, where w is the number of bits of a CPU word and the n is the number of CPU words that the R occupies.

The following procedure works out xyR^-¹ mod N in multi-precision case for an odd N.

Multi-precision Montgomery multiplication: montMUL(x, y, N, R) = xyR^–1 mod N

Input: x[s-1, … , 0]

y[s-1, … , 0]

N[s-1, … , 0]

Output:

t[2s-1, … , s] = xyR^-¹ mod N

Local variables:

t[2s, 2s-1, … , s , s – 1, … , 0] // 2s + 1 words

c, d, and q, 1-word registers

Procedure: montMUL(x, y, N, R)

1. J = -N[0]^–1 mod 2^w // can be pre-calculated by Algorithm J. Note: here J is only one CPU word */

2. t[2s] = 0; t[2s – 1, … , 0] = x*y; // 2s + 1 words

3. for i = 0 to s –1 // from right to left

4. {

5. c = 0; // c is the carry

6. q = t[i]*J mod 2^w; // take the lower half

7. for j = 0 to s – 1

8. {

9. (c,d) = t[i +j ] + q*N[j] + c;

10. t[i + j] = d;

11. }

12. t[2s, … , i + s] = t[2s, … , i + s] + c;

13. }

14. if t[2s, 2s–1, … , s] > N // can be optimised

15. then t[2s, 2s–1, … , s] = t[2s, 2s–1, … , s] – N;

16. return t[2s–1, … , s]; // note the difference

End_of_procedure

1.2.2. Numerical example

Given x = 93, y = 167, N = 237 in decimals, calculate xy mod N with multi-precision Montgomery multiplication.

Pencil and paper: 93*126 mod 237 = 126

Step 1: We represent all the numbers in hexadecimal as follows:

x = 0x5D;

y = 0xA7;

N = 0xED; // N[1] = 0xE, N[0] = 0xD;

Step 2: The R is chosen R = 0x100. (note: 0x100 is 256 in decimal)

Step 3: Calculate constants J and H as follows:

J = -N[0]^–1 mod R = -D^-1 mod 10 = B;

H = R² mod N = 10000 mod ED = 7C;

Step 4: Applying multi-precision montMUL(x, y) as follows: (s = 2)

1. J = B; // pre-calculated

2. t[4,3,2,1,0] = (0, x*y) = (0, 5D*A7) = 03CAB

3. i = 0

4. { // the i = 0 loop

5. c = 0;

6. q = t[0]*J mod 10 = B*B mod 10 = 79 mod 10 = 9

7. j = 0

8. { the j = 0 loop

9. (c,d) = t[0 + 0] + q*N[0] + 0 = B + 9*D + 0 = 80

//c = 8 and d = 0 at this point;

10. t[0 + 0] = t[0] = d = 0;

11. } // j += 1; Line 8

8. {the j = 1 loop

9. (c,d) = t[0 + 1] + q*N[1] + 8 = A + 9*E + 8 = 90

// c = 9, d = 0

10. t[0 + 1] = t[1] = d = 0;

11. } // j+= 1; j = 2 now and exit of loop j

// t[4,3,2,1,0] = 03C00 now

12. t[4,3,2] = t[4,3,2] + c = 03C + 9 = 045;

// t[4,3,2,1,0] = 04500 now

13. } // i += 1 and go to Line 4

4. { i = 1

5. c = 0;

6. q = t[1]*J mod 10 = 0*J mod 10 = 0;

7. j = 0;

8. { the j = 0 loop

9. (c,d) = t[1] + 0 + 0 = 00;

// c = 0, d = 0;

10. t[1 + 0] = t[1] = d = 0;

11. } // j += 1 and go to Line 8

8. { j = 1 loop

9. (c,d) = t[2] + 0 + 0 = 05;

// c = 0, d = 5;

10. t[1 + 1] = t[2] = d = 5;

11. } // j += 1; j = 2, exits

// t[4,3,2,1,0] = 04500

12. t[4,3] = t[4,3] + c = 04 + 0 = 04;

13. } // i += 1. exit

// t[4,3,2,1,0] = 04500;

14. 045 < ED; // N = 0xED

15. // nothing

16. return t[3,2] = 45;

\5D*A7*R^–1 mod N = 0x45;

Step 5: By applying montMUL() again to achieve

montMUL(45,H) = montMUL(45,7C)= 0x7E. (126 in decimal)

1.3. J, the inverse of –N modulo 2^w

As we can see, an efficient algorithm is needed to work out the J in the above two algorithms. The J can be calculated either by a general Extended Euclidean Algorithm or by the Algorithm J below which takes advantage of R = 2^w.

Algorithm J: J = – N^–1 mod 2^w

Input: N

Output:

J = – N^–1 mod 2^w

Procedure: J = montJ(N)

1. A = 1; // A is a local variable

2. for i = 0 to w – 1 // lsb to msb

3. {

4. if (A is odd) then // the lsb == 0

5. J(bit i) = 1; // the i-th bit of J

6. else

7. J(bit i) = 0;

8. A = (A + J(bit i)*N)/2;

9. }

End_of_procedure

Correctness proof: use exists k, such that JN + 1 = k2^w .

In implementation, the bit i of J is right-shifted into J not by setting bits.

2. Montgomery arithmetic

2.1. Normal domain vs. Montgomery domain

We loosely say an integer ring, Z_N or a GF(p), which we normally deal with, as normal domains. For an odd N, as well as p, and an R in the form of 2ⁿ, greater than N, we can make a 1-to-1 mapping as follows:

Given N and R, we have

1. montMUL(x, H) = xR mod N

2. montMUL(xR, 1) = x mod N

3. montMUL(xR, yR) = xyR mod N (Montgomery domain is closed)

4. montEXP(xR, k) = x^kR mod N (Montgomery exponentiation)

To perform the above, we need to calculate:

1. J = –N^–1 mod R

2. H = R² mod N

Sometimes, in Elliptic Curves Cryptography, we may need

U = R³mod N

which can be achieved by U = montMUL(H, H) = R³ mod N and used for

montMUL(x^–1R^–1, U) = x^–1R mod N

to adjust an inverse, achieved by the Extended Euclidean Algorithm, back to the Montgomery domain.

3. Further readings and studies

The following papers are recommended:

1. Peter L. Montgomery, "Modular multiplication without trial division". Mathematics of Computation. 44: 519–521, 1985

2. C. K. Koc, T. Acar and B. Kaliski, “Analyzing and Comparing Montgomery Multiplication Algorithms”, IEEE Micro, 16(3):26-33, June 1996

3. B. Kaliski, “The Montgomery inverse and its applications”, IEEE Transactions on Computers (Volume: 44 , Issue: 8 , Aug 1995

The above #1 is classic, which provides the theory.

The #2 is a good refence for implementation considerations. When a designer is given a CPU architecture, it would be very helpful to study this paper before coding.

The #3 provides an alternative for obtaining inverse in Montgomery domain to the method mentioned in section 2.2. Commonly used arithmetic.

It is recommended to make a worked numerical example of Algorithm J with N = 0xD (1101) and w = 4 by completing the following table:

i	Bit i	*A + J(bit i)N**	A
-	-	-	0001
0	1	01110	0111
1	1	?	?
2	0	?	?
3	1	?	?

J = 1011 = 0xB

Bo Lin

Saturday, 4 July 2020

Montgomery Multiplication - worked numerical examples

1. Montgomery multiplication

1.1. Single-precision Montgomery multiplication

1.1.1. Description

1.1.2. Numerical example

1.2. Multi-precision Montgomery multiplication

1.2.1. Description

1.2.2. Numerical example

1.3. J, the inverse of –N modulo 2^w

2. Montgomery arithmetic

2.1. Normal domain vs. Montgomery domain

3. Further readings and studies

Post Quantum Cryptography

Report Abuse

Saturday, 4 July 2020

Montgomery Multiplication - worked numerical examples

1. Montgomery multiplication

1.1. Single-precision Montgomery multiplication

1.1.1. Description

1.1.2. Numerical example

1.2. Multi-precision Montgomery multiplication

1.2.1. Description

1.2.2. Numerical example

1.3. J, the inverse of –N modulo 2w

2. Montgomery arithmetic

2.1. Normal domain vs. Montgomery domain

3. Further readings and studies

Post Quantum Cryptography

1.3. J, the inverse of –N modulo 2^w