Thursday, 17 July 2025

Post Quantum Cryptography

The consensus appears that quantum computers will be built, and RSA and ECC will be broken by Shor's algorithm at some point. Although scepticism on quantum computing exists (see Some critical publications on Quantum Computing), the possibility cannot be ignored. As a result, NIST formally issued a "Call for Proposals" for post-quantum cryptographic algorithms on December 20, 2016. and submissions for the first round were then due by November 30, 2017, and 69 complete and proper submissions were announced as first-round candidates on December 21, 2017.

SIKE

Around 2020, hopefuls were clear. Among of them, SIKE (Supersingular Isogeny Key Exchange) looked a very good candidate for its small key size and small ciphertext size. In 2021, I coded SIKE in my spare time. During the coding, I found the SIKE specification's algorithm for computing and evaluating a isogeny was quite complicated. However, if this algorithm coded recursively, the recursion not only made the code simple, but also made the concept of isogeny evaluation easy to understand.

I summarised my work and submitted a paper to PQC'2022 conference (see A Divide-and-Conquer Implementation of SIDH and its Relative Performance), hoping to share the experience and with relative performance figures to justify the viability of SIKE - the point was that, although SIKE was not as fast as other PQC candidates, its performance was comparable with the current standardised Diffie-Hellman system. Unfortunately, the paper was rejected with the reason of lack of innovation and the implementation's performance in terms of absolute performance. (BTW, I was a performance benchmark expert when I worked in Freescale (formerly Motorola Semiconductors) to benchmark their high-performance PowerPC products for EEMBC.)

I read through the feedback from the reviewers and realised the discrepancy. I planned to update this paper for other conferences. Considering some same reviewers could review the updated paper again, I prepared my comments as Comments on PQC2022 Rejection to explain the purpose of the paper. However, SIKE was broken just before NIST's Round 4 Submissions (see NIST Post-Quantum Cryptography Candidate Cracked). The effort for further work on SIKE stopped.

Unexpectedly, SIKE was broken by a method existed since 1997. This shows the the rarity of deep understanding on isogeny and the lack of experts in this difficult mathematics area. On a positive side, this effort has further extended my expertise in modern algebra - I resorted to the last page of my textbook on modern algebra.

FIPS 204 (ML-DSA) / CRYSTAL-Dilithium

After CRYSTAL-Kyber and CRYSTAL-Dilithium were standardised as FIPS 203 (ML-KEM) and FIPS 204 (ML-DSA) respectively, the focus was turned to lattice-based post quantum cryptography. FIPS 204 (ML-DSA) was coded to have a better understanding the FIPS 204 specification and to have a platform for further optimisation.

It is a full C Language implementation coded in spare time applied Montgomery arithmetic to NTT innovatively (see Applying Montgomery Multiplication to Implementing FIPS204 (ML-DSA) and here). A worked example, An FIPS204_ML-DSA-44 Worked Example (and here) is available for those who would like to cross check their implementations.


Sunday, 14 January 2024

Some critical publications on Quantum Computing

There are many news and publications on the advance of quantum computers, but there are some critical viewpoints.

It's very difficult to judge the impact of these criticisms for those who are not familiar with quantum mechanisms. I made a list below for reference. I'll give a brief description on those publications.

Somehow, I feel that for those criticisms, not many quantum computing supporters come out to debate with these sceptics on specific points. For example, to discuss Professor Gerard 't Hooft's vievpoint.

It would be very nice that we can hear more opinions on both sides.

I'll keep updating this page if I see something interesting,





Sunday, 8 November 2020

A surprise rediscovery of a manuscript on the correctness proof of the Barrett algorithm

Life is a chance encounter, full of surprises. 😄

During my first RSA endeavor, I not only missed the Montgomery algorithm, but also missed the Barrett algorithm. What a shame! Later, I got a photocopy of the Montgomery algorithm and used it to implement elliptic curve cryptography, but I didn't have chance to implement the Barrett algorithm in full though I recall I did look into it in detail. Perhaps, the streamlined Montgomery algorithm became the mainstream implementation after the excellent paper by Burt Kaliski Jr. Most people just moved to efficient implementation of Montgomery multiplication.

Anyway, I'd totally forgotten what I did for the Barrett algorithm until a few days ago I incidentally found a photocopy of the paper sandwiched in the pages of my master degree thesis:


It was a photocopy from the Proceedings of Crypto'86 which was published in 1987. I think I got it in 1988 because it is with my old thesis. In page 318, it read as the proof of correctness of the algorithm, which made sure that at most two correcting subtractions were needed, was too complex to be provided in the paper (see that in the red rectangle):

Well, I found a proof by myself that time, written on a piece of dot matrix printer paper, very much yellowed with time having passed by:


With curiosity, I went through the manuscript written in 30 years ago and it is correct. It wasn't written down by chance, but I wasn't able to recall why started with y = [w/b^(n - 1)] * r. No clue at all.





Saturday, 4 July 2020

Montgomery Multiplication - worked numerical examples


1. Montgomery multiplication

1.1. Single-precision Montgomery multiplication

1.1.1. Description

Montgomery multiplication works out xyR–1 mod N without division, where R > N and gcd(R, N) = 1 on the condition of xy < RN.
In practice, N is chosen to be odd and R in form of 2n with R > N.
Montgomery multiplication: montMUL(x, y, N, R) = xyR–1 mod N
Input:  x < N
y < N
N, the modulus, odd
            R, R > N  and R in the form of 2n for an n.
Output: xyR-1 mod N
Procedure montMUL(x, y, N, R):
1.    J = -N–1 mod R; // can be pre-calculated see Algorithm J.
2.    T = x*y ;
3.    q = (T mod R)*J mod R;
4.    t = (T + q*N)/R;
5.    if (t ³ N)
6.        return t - N;
7.    else
8.        return t ;
End_of_procedure

Because R is chosen as R = 2n, the divisions by R, at Line 3 and Line 4, are simply value extractions.

1.1.2. Numerical example

Given x = 5, y = 10, N = 13, calculate xy mod N with Montgomery arithmetic.
Step 1: We represent all the numbers in hexadecimal as follows:
x = 5;
y = A;
N = D;
Step 2: The R can be chosen R = 0x10. (note: the 10 is 16 in decimal)
Step 3: Calculate constants J and H as follows:
J = -N–1 mod R = -D-1 mod 10 = B.
H = R2 mod N = 10 mod D = 9.
Step 4: Applying single-precision montMUL(x, y, N, R) as follows:
1.    J = B // pre-calculated
2.    T = x*y = 5*A = 3C.
3.    q = (T mod R)*J mod R = (3C mod 10)*B mod 10 = C*B mod 10 = 84 mod 10 = 4.
4.    t = (T + q*N) /R = (3C + 4*D)/10 = 70/10= 7 (< N)
\ xyR–1 mod N = 7
Step 5: Calculating montMUL(xyR–1 mod N, H) = xy mod N:
1.    J = B; H = 9; // pre-calculated
2.    T = 7*9 = 3F;
3.    q = F*B mod 10 = A5 mod 10 = 5;
4.    t = (3F + 5*D)/10 = 80/10 = 8;
\ xy mod N = 8.

1.2. Multi-precision Montgomery multiplication

1.2.1. Description

In the multi-precision case of Montgomery multiplication, it is convenient to choose R = (2w)n, where w is the number of bits of a CPU word and the n is the number of CPU words that the R occupies.
The following procedure works out xyR-1 mod N in multi-precision case for an odd N.
Multi-precision Montgomery multiplication: montMUL(x, y, N, R) = xyR–1 mod N
Input:   x[s-1, … , 0]
y[s-1, … , 0]
N[s-1, … , 0]
Output:
t[2s-1, … , s] = xyR-1 mod N
Local variables:
t[2s, 2s-1, … , s , s – 1, … , 0] // 2s + 1 words
c, d, and q, 1-word registers

Procedure: montMUL(x, y, N, R)
1.    J = -N[0]–1 mod 2w // can be pre-calculated by Algorithm J. Note: here J is only one CPU word */
2.    t[2s] = 0; t[2s – 1, … , 0] = x*y; // 2s + 1 words
3.    for i = 0 to s –1 // from right to left
4.    {
5.        c = 0;        // c is the carry
6.        q = t[i]*J mod 2w; // take the lower half
7.        for j = 0 to s – 1
8.        {
9.            (c,d) = t[i +j ] + q*N[j] + c;
10.           t[i + j] = d;
11.       }
12.       t[2s, … , i + s] = t[2s, … , i + s] + c;
13.   }
14.   if  t[2s, 2s–1, … , s] > N // can be optimised
15.       then  t[2s, 2s–1, … , s] = t[2s, 2s–1, … , s] – N;
16.   return t[2s–1, … , s]; // note the difference
End_of_procedure

1.2.2. Numerical example

Given x = 93, y = 167, N = 237 in decimals, calculate xy mod N with multi-precision Montgomery multiplication.
Pencil and paper: 93*126 mod 237 = 126
Step 1: We represent all the numbers in hexadecimal as follows:
x = 0x5D;
y = 0xA7;
N = 0xED; // N[1] = 0xE, N[0] = 0xD;
Step 2: The R is chosen R = 0x100. (note: 0x100 is 256 in decimal)
Step 3: Calculate constants J and H as follows:
J = -N[0]–1 mod R = -D-1 mod 10 = B;
H = R2 mod N = 10000 mod ED = 7C;
Step 4: Applying multi-precision montMUL(x, y) as follows: (s = 2)
1.    J = B; // pre-calculated
2.    t[4,3,2,1,0] = (0, x*y) = (0, 5D*A7) = 03CAB
3.    i = 0
4.    { // the i = 0 loop
5.        c = 0;
6.        q = t[0]*J mod 10 = B*B mod 10 = 79 mod 10 = 9
7.        j = 0
8.        { the j = 0 loop
9.            (c,d) = t[0 + 0] + q*N[0] + 0 = B + 9*D + 0 = 80
         //c = 8 and d = 0 at this point;
10.           t[0 + 0] = t[0] = d = 0;
11.       } // j += 1; Line 8
8.        {the j = 1 loop
9.            (c,d) = t[0 + 1] + q*N[1] + 8 = A + 9*E + 8 = 90
        // c = 9, d = 0
10.           t[0 + 1] = t[1] = d = 0;
11.       } // j+= 1; j = 2 now and exit of loop j
    // t[4,3,2,1,0] = 03C00 now
12.       t[4,3,2] = t[4,3,2] + c = 03C + 9 = 045;
    // t[4,3,2,1,0] = 04500 now
13.   } // i += 1 and go to Line 4
4.    {   i = 1
5.        c = 0;
6.        q = t[1]*J mod 10 = 0*J mod 10 = 0;
7.        j = 0;
8.        { the j = 0 loop
9.            (c,d) = t[1] + 0 + 0 = 00;
        // c = 0, d = 0;
10.           t[1 + 0] = t[1] = d = 0;
11.       } // j += 1 and go to Line 8
8.        { j = 1 loop
9.            (c,d) = t[2] + 0 + 0 = 05;
        // c = 0, d = 5;
10.           t[1 + 1] = t[2] = d = 5;
11.       } // j += 1; j = 2, exits
    // t[4,3,2,1,0] = 04500
12.       t[4,3] = t[4,3] + c = 04 + 0 = 04;
13.   } // i += 1. exit
// t[4,3,2,1,0] = 04500;
14.   045 < ED; // N = 0xED
15.   // nothing
16.   return t[3,2] = 45;
\5D*A7*R–1 mod N = 0x45;
Step 5: By applying montMUL() again to achieve
montMUL(45,H) = montMUL(45,7C)= 0x7E. (126 in decimal)

1.3. J, the inverse of N modulo 2w

As we can see, an efficient algorithm is needed to work out the J  in the above two algorithms. The J can be calculated either by a general Extended Euclidean Algorithm or by the Algorithm J below which takes advantage of R = 2w.
Algorithm J: J = – N–1 mod 2w
Input:  N
Output:          
J = – N–1 mod 2w
Procedure: J = montJ(N)
1.    A = 1; // A is a local variable
2.    for i = 0 to w – 1 // lsb to msb
3.    {
4.        if (A is odd) then // the lsb == 0
5.            J(bit i) = 1; // the i-th bit of J
6.        else
7.            J(bit i) = 0;
8.        A = (A + J(bit i)*N)/2;
9.    }
End_of_procedure

Correctness proof: use exists k, such that JN + 1 = k2w .
In implementation, the bit i of J is right-shifted into J not by setting bits.

2. Montgomery arithmetic

2.1. Normal domain vs. Montgomery domain

We loosely say an integer ring, ZN or a GF(p), which we normally deal with, as normal domains. For an odd N, as well as p, and an R in the form of 2n, greater than N, we can make a 1-to-1 mapping as follows:


Given N and R, we have

1.     montMUL(x, H) = xR mod N
2.     montMUL(xR, 1) = x mod N
3.     montMUL(xR, yR) = xyR mod N (Montgomery domain is closed)
4.     montEXP(xR, k) = xkR mod N (Montgomery exponentiation)

To perform the above, we need to calculate:

1.     J =  N–1 mod R
2.     H = R2 mod N

Sometimes, in Elliptic Curves Cryptography, we may need

            U = R3 mod N

which can be achieved by U = montMUL(H, H) = R3 mod N and used for

montMUL(x–1R–1, U) = x–1R mod N

to adjust an inverse, achieved by the Extended Euclidean Algorithm, back to the Montgomery domain.

3. Further readings and studies

The following papers are recommended:
1.     Peter L. Montgomery, "Modular multiplication without trial division". Mathematics of Computation. 44: 519–521, 1985
2.     C. K. Koc, T. Acar and B. Kaliski, “Analyzing and Comparing Montgomery Multiplication Algorithms”, IEEE Micro, 16(3):26-33, June 1996
3.     B. Kaliski, “The Montgomery inverse and its applications”, IEEE Transactions on Computers (Volume: 44 , Issue: 8 , Aug 1995

The above #1 is classic, which provides the theory.
The #2 is a good refence for implementation considerations. When a designer is given a CPU architecture, it would be very helpful to study this paper before coding.
The #3 provides an alternative for obtaining inverse in Montgomery domain to the method mentioned in section 2.2. Commonly used arithmetic.

It is recommended to make a worked numerical example of Algorithm J with N = 0xD (1101) and w = 4 by completing the following table:
i
Bit i
A + J(bit i)*N
A
-
-
-
0001
0
1
01110
0111
1
1
?
?
2
0
?
?
3
1
?
?
 J = 1011 = 0xB







Post Quantum Cryptography

The consensus appears that quantum computers will be built, and RSA and ECC will be broken by  Shor's algorithm  at some point. Although...