Thursday, 17 July 2025

Post Quantum Cryptography

The consensus appears that quantum computers will be built, and RSA and ECC will be broken by Shor's algorithm at some point. Although scepticism on quantum computing exists (see Some critical publications on Quantum Computing), the possibility cannot be ignored. As a result, NIST formally issued a "Call for Proposals" for post-quantum cryptographic algorithms on December 20, 2016. and submissions for the first round were then due by November 30, 2017, and 69 complete and proper submissions were announced as first-round candidates on December 21, 2017.

SIKE

Around 2020, hopefuls were clear. Among of them, SIKE (Supersingular Isogeny Key Exchange) looked a very good candidate for its small key size and small ciphertext size. In 2021, I coded SIKE in my spare time. During the coding, I found the SIKE specification's algorithm for computing and evaluating a isogeny was quite complicated. However, if this algorithm coded recursively, the recursion not only made the code simple, but also made the concept of isogeny evaluation easy to understand.

I summarised my work and submitted a paper to PQC'2022 conference (see A Divide-and-Conquer Implementation of SIDH and its Relative Performance), hoping to share the experience and with relative performance figures to justify the viability of SIKE - the point was that, although SIKE was not as fast as other PQC candidates, its performance was comparable with the current standardised Diffie-Hellman system. Unfortunately, the paper was rejected with the reason of lack of innovation and the implementation's performance in terms of absolute performance. (BTW, I was a performance benchmark expert when I worked in Freescale (formerly Motorola Semiconductors) to benchmark their high-performance PowerPC products for EEMBC.)

I read through the feedback from the reviewers and realised the discrepancy. I planned to update this paper for other conferences. Considering some same reviewers could review the updated paper again, I prepared my comments as Comments on PQC2022 Rejection to explain the purpose of the paper. However, SIKE was broken just before NIST's Round 4 Submissions (see NIST Post-Quantum Cryptography Candidate Cracked). The effort for further work on SIKE stopped.

Unexpectedly, SIKE was broken by a method existed since 1997. This shows the the rarity of deep understanding on isogeny and the lack of experts in this difficult mathematics area. On a positive side, this effort has further extended my expertise in modern algebra - I resorted to the last page of my textbook on modern algebra.

FIPS 204 (ML-DSA) / CRYSTAL-Dilithium

After CRYSTAL-Kyber and CRYSTAL-Dilithium were standardised as FIPS 203 (ML-KEM) and FIPS 204 (ML-DSA) respectively, the focus was turned to lattice-based post quantum cryptography. FIPS 204 (ML-DSA) was coded to have a better understanding the FIPS 204 specification and to have a platform for further optimisation.

It is a full C Language implementation coded in spare time applied Montgomery arithmetic to NTT innovatively (see Applying Montgomery Multiplication to Implementing FIPS204 (ML-DSA) and here). A worked example, An FIPS204_ML-DSA-44 Worked Example (and here) is available for those who would like to cross check their implementations.

Sunday, 14 January 2024

Some critical publications on Quantum Computing

There are many news and publications on the advance of quantum computers, but there are some critical viewpoints.

It's very difficult to judge the impact of these criticisms for those who are not familiar with quantum mechanisms. I made a list below for reference. I'll give a brief description on those publications.

Worried about quantum breaking encryption? Give it 30 years • The Register – Shamir’s point on 2023 RSA Conference
The Quantum House Of Cards – Published in Jan 2024 on Quantum Computing in general. This paper is easy to read and follow, and makes sense. However, it's pessimistic on Quantum Computing. There are some discussions on this paper:

[2306.10072] Shor's Algorithm Does Not Factor Large Integers in the Presence of Noise (arxiv.org) – on practicability of Shor’s algorithm.
2303.11317.pdf (arxiv.org) – on practicability of Grover’s algorithm
Quantum Computers cross 1000 Qubits Threshold! What does this mean?
Quantum Physics Is on the Wrong Track, Says Breakthrough Prize Winner Gerard 't Hooft - I came across this interview and was wondering whether quantum computing is really a hype? This interview basically says that quantum mechanics is merely an explanation but not real.
Replication of Quantum Factorisation Records with an 8-bit Home Computer, an Abacus, and a Dog - it's sarcastic.

Somehow, I feel that for those criticisms, not many quantum computing supporters come out to debate with these sceptics on specific points. For example, to discuss Professor Gerard 't Hooft's vievpoint.

It would be very nice that we can hear more opinions on both sides.

I'll keep updating this page if I see something interesting,

Thursday, 4 January 2024

On ZKP - Zero-Knowledge Proof with Worked Examples

Saturday, 4 March 2023

Understanding Side Channel Attacks - a Mind Process

https://drive.google.com/file/d/1McEW87iTr4-dIDaCABTWuZYve7-lhLxN/view?usp=sharing

Sunday, 8 November 2020

A surprise rediscovery of a manuscript on the correctness proof of the Barrett algorithm

Life is a chance encounter, full of surprises. 😄

During my first RSA endeavor, I not only missed the Montgomery algorithm, but also missed the Barrett algorithm. What a shame! Later, I got a photocopy of the Montgomery algorithm and used it to implement elliptic curve cryptography, but I didn't have chance to implement the Barrett algorithm in full though I recall I did look into it in detail. Perhaps, the streamlined Montgomery algorithm became the mainstream implementation after the excellent paper by Burt Kaliski Jr. Most people just moved to efficient implementation of Montgomery multiplication.

Anyway, I'd totally forgotten what I did for the Barrett algorithm until a few days ago I incidentally found a photocopy of the paper sandwiched in the pages of my master degree thesis:

It was a photocopy from the Proceedings of Crypto'86 which was published in 1987. I think I got it in 1988 because it is with my old thesis. In page 318, it read as the proof of correctness of the algorithm, which made sure that at most two correcting subtractions were needed, was too complex to be provided in the paper (see that in the red rectangle):

Well, I found a proof by myself that time, written on a piece of dot matrix printer paper, very much yellowed with time having passed by:

With curiosity, I went through the manuscript written in 30 years ago and it is correct. It wasn't written down by chance, but I wasn't able to recall why started with y = [w/b^(n - 1)] * r. No clue at all.

Saturday, 4 July 2020

Montgomery Multiplication - worked numerical examples

1. Montgomery multiplication

1.1. Single-precision Montgomery multiplication

1.1.1. Description

Montgomery multiplication works out xyR^–1 mod N without division, where R > N and gcd(R, N) = 1 on the condition of xy < RN.

In practice, N is chosen to be odd and R in form of 2ⁿ with R > N.

Montgomery multiplication: montMUL(x, y, N, R) = xyR^–1 mod N

Input: x < N

y < N

N, the modulus, odd

R, R > N and R in the form of 2ⁿ for an n.

Output: xyR^-¹ mod N

Procedure montMUL(x, y, N, R):

1. J = -N^–1 mod R; // can be pre-calculated see Algorithm J.

2. T = x*y ;

3. q = (T mod R)*J mod R;

4. t = (T + q*N)/R;

5. if (t ³ N)

6. return t - N;

7. else

8. return t ;

End_of_procedure

Because R is chosen as R = 2ⁿ, the divisions by R, at Line 3 and Line 4, are simply value extractions.

1.1.2. Numerical example

Given x = 5, y = 10, N = 13, calculate xy mod N with Montgomery arithmetic.

Step 1: We represent all the numbers in hexadecimal as follows:

x = 5;

y = A;

N = D;

Step 2: The R can be chosen R = 0x10. (note: the 10 is 16 in decimal)

Step 3: Calculate constants J and H as follows:

J = -N^–1 mod R = -D^-1 mod 10 = B.

H = R² mod N = 10 mod D = 9.

Step 4: Applying single-precision montMUL(x, y, N, R) as follows:

1. J = B // pre-calculated

2. T = x*y = 5*A = 3C.

3. q = (T mod R)*J mod R = (3C mod 10)*B mod 10 = C*B mod 10 = 84 mod 10 = 4.

4. t = (T + q*N) /R = (3C + 4*D)/10 = 70/10= 7 (< N)

\ xyR^–1 mod N = 7

Step 5: Calculating montMUL(xyR^–1 mod N, H) = xy mod N:

1. J = B; H = 9; // pre-calculated

2. T = 7*9 = 3F;

3. q = F*B mod 10 = A5 mod 10 = 5;

4. t = (3F + 5*D)/10 = 80/10 = 8;

\ xy mod N = 8.

1.2. Multi-precision Montgomery multiplication

1.2.1. Description

In the multi-precision case of Montgomery multiplication, it is convenient to choose R = (2^w)ⁿ, where w is the number of bits of a CPU word and the n is the number of CPU words that the R occupies.

The following procedure works out xyR^-¹ mod N in multi-precision case for an odd N.

Multi-precision Montgomery multiplication: montMUL(x, y, N, R) = xyR^–1 mod N

Input: x[s-1, … , 0]

y[s-1, … , 0]

N[s-1, … , 0]

Output:

t[2s-1, … , s] = xyR^-¹ mod N

Local variables:

t[2s, 2s-1, … , s , s – 1, … , 0] // 2s + 1 words

c, d, and q, 1-word registers

Procedure: montMUL(x, y, N, R)

1. J = -N[0]^–1 mod 2^w // can be pre-calculated by Algorithm J. Note: here J is only one CPU word */

2. t[2s] = 0; t[2s – 1, … , 0] = x*y; // 2s + 1 words

3. for i = 0 to s –1 // from right to left

4. {

5. c = 0; // c is the carry

6. q = t[i]*J mod 2^w; // take the lower half

7. for j = 0 to s – 1

8. {

9. (c,d) = t[i +j ] + q*N[j] + c;

10. t[i + j] = d;

11. }

12. t[2s, … , i + s] = t[2s, … , i + s] + c;

13. }

14. if t[2s, 2s–1, … , s] > N // can be optimised

15. then t[2s, 2s–1, … , s] = t[2s, 2s–1, … , s] – N;

16. return t[2s–1, … , s]; // note the difference

End_of_procedure

1.2.2. Numerical example

Given x = 93, y = 167, N = 237 in decimals, calculate xy mod N with multi-precision Montgomery multiplication.

Pencil and paper: 93*126 mod 237 = 126

Step 1: We represent all the numbers in hexadecimal as follows:

x = 0x5D;

y = 0xA7;

N = 0xED; // N[1] = 0xE, N[0] = 0xD;

Step 2: The R is chosen R = 0x100. (note: 0x100 is 256 in decimal)

Step 3: Calculate constants J and H as follows:

J = -N[0]^–1 mod R = -D^-1 mod 10 = B;

H = R² mod N = 10000 mod ED = 7C;

Step 4: Applying multi-precision montMUL(x, y) as follows: (s = 2)

1. J = B; // pre-calculated

2. t[4,3,2,1,0] = (0, x*y) = (0, 5D*A7) = 03CAB

3. i = 0

4. { // the i = 0 loop

5. c = 0;

6. q = t[0]*J mod 10 = B*B mod 10 = 79 mod 10 = 9

7. j = 0

8. { the j = 0 loop

9. (c,d) = t[0 + 0] + q*N[0] + 0 = B + 9*D + 0 = 80

//c = 8 and d = 0 at this point;

10. t[0 + 0] = t[0] = d = 0;

11. } // j += 1; Line 8

8. {the j = 1 loop

9. (c,d) = t[0 + 1] + q*N[1] + 8 = A + 9*E + 8 = 90

// c = 9, d = 0

10. t[0 + 1] = t[1] = d = 0;

11. } // j+= 1; j = 2 now and exit of loop j

// t[4,3,2,1,0] = 03C00 now

12. t[4,3,2] = t[4,3,2] + c = 03C + 9 = 045;

// t[4,3,2,1,0] = 04500 now

13. } // i += 1 and go to Line 4

4. { i = 1

5. c = 0;

6. q = t[1]*J mod 10 = 0*J mod 10 = 0;

7. j = 0;

8. { the j = 0 loop

9. (c,d) = t[1] + 0 + 0 = 00;

// c = 0, d = 0;

10. t[1 + 0] = t[1] = d = 0;

11. } // j += 1 and go to Line 8

8. { j = 1 loop

9. (c,d) = t[2] + 0 + 0 = 05;

// c = 0, d = 5;

10. t[1 + 1] = t[2] = d = 5;

11. } // j += 1; j = 2, exits

// t[4,3,2,1,0] = 04500

12. t[4,3] = t[4,3] + c = 04 + 0 = 04;

13. } // i += 1. exit

// t[4,3,2,1,0] = 04500;

14. 045 < ED; // N = 0xED

15. // nothing

16. return t[3,2] = 45;

\5D*A7*R^–1 mod N = 0x45;

Step 5: By applying montMUL() again to achieve

montMUL(45,H) = montMUL(45,7C)= 0x7E. (126 in decimal)

1.3. J, the inverse of –N modulo 2^w

As we can see, an efficient algorithm is needed to work out the J in the above two algorithms. The J can be calculated either by a general Extended Euclidean Algorithm or by the Algorithm J below which takes advantage of R = 2^w.

Algorithm J: J = – N^–1 mod 2^w

Input: N

Output:

J = – N^–1 mod 2^w

Procedure: J = montJ(N)

1. A = 1; // A is a local variable

2. for i = 0 to w – 1 // lsb to msb

3. {

4. if (A is odd) then // the lsb == 0

5. J(bit i) = 1; // the i-th bit of J

6. else

7. J(bit i) = 0;

8. A = (A + J(bit i)*N)/2;

9. }

End_of_procedure

Correctness proof: use exists k, such that JN + 1 = k2^w .

In implementation, the bit i of J is right-shifted into J not by setting bits.

2. Montgomery arithmetic

2.1. Normal domain vs. Montgomery domain

We loosely say an integer ring, Z_N or a GF(p), which we normally deal with, as normal domains. For an odd N, as well as p, and an R in the form of 2ⁿ, greater than N, we can make a 1-to-1 mapping as follows:

Given N and R, we have

1. montMUL(x, H) = xR mod N

2. montMUL(xR, 1) = x mod N

3. montMUL(xR, yR) = xyR mod N (Montgomery domain is closed)

4. montEXP(xR, k) = x^kR mod N (Montgomery exponentiation)

To perform the above, we need to calculate:

1. J = –N^–1 mod R

2. H = R² mod N

Sometimes, in Elliptic Curves Cryptography, we may need

U = R³mod N

which can be achieved by U = montMUL(H, H) = R³ mod N and used for

montMUL(x^–1R^–1, U) = x^–1R mod N

to adjust an inverse, achieved by the Extended Euclidean Algorithm, back to the Montgomery domain.

3. Further readings and studies

The following papers are recommended:

1. Peter L. Montgomery, "Modular multiplication without trial division". Mathematics of Computation. 44: 519–521, 1985

2. C. K. Koc, T. Acar and B. Kaliski, “Analyzing and Comparing Montgomery Multiplication Algorithms”, IEEE Micro, 16(3):26-33, June 1996

3. B. Kaliski, “The Montgomery inverse and its applications”, IEEE Transactions on Computers (Volume: 44 , Issue: 8 , Aug 1995

The above #1 is classic, which provides the theory.

The #2 is a good refence for implementation considerations. When a designer is given a CPU architecture, it would be very helpful to study this paper before coding.

The #3 provides an alternative for obtaining inverse in Montgomery domain to the method mentioned in section 2.2. Commonly used arithmetic.

It is recommended to make a worked numerical example of Algorithm J with N = 0xD (1101) and w = 4 by completing the following table:

i	Bit i	*A + J(bit i)N**	A
-	-	-	0001
0	1	01110	0111
1	1	?	?
2	0	?	?
3	1	?	?

J = 1011 = 0xB

Bo Lin

Thursday, 17 July 2025

Post Quantum Cryptography

Sunday, 14 January 2024

Some critical publications on Quantum Computing

Thursday, 4 January 2024

On ZKP - Zero-Knowledge Proof with Worked Examples

Saturday, 4 March 2023

Understanding Side Channel Attacks - a Mind Process

Sunday, 8 November 2020

A surprise rediscovery of a manuscript on the correctness proof of the Barrett algorithm

Saturday, 4 July 2020

Montgomery Multiplication - worked numerical examples

1. Montgomery multiplication

1.1. Single-precision Montgomery multiplication

1.1.1. Description

1.1.2. Numerical example

1.2. Multi-precision Montgomery multiplication

1.2.1. Description

1.2.2. Numerical example

1.3. J, the inverse of –N modulo 2^w

2. Montgomery arithmetic

2.1. Normal domain vs. Montgomery domain

3. Further readings and studies

Post Quantum Cryptography

Report Abuse

Thursday, 17 July 2025

Sunday, 14 January 2024

Thursday, 4 January 2024

Saturday, 4 March 2023

Sunday, 8 November 2020

Saturday, 4 July 2020

1. Montgomery multiplication

1.1. Single-precision Montgomery multiplication

1.1.1. Description

1.1.2. Numerical example

1.2. Multi-precision Montgomery multiplication

1.2.1. Description

1.2.2. Numerical example

1.3. J, the inverse of –N modulo 2w

2. Montgomery arithmetic

2.1. Normal domain vs. Montgomery domain

3. Further readings and studies

1.3. J, the inverse of –N modulo 2^w