||A Study of Fast Modular Exponentiation
||Department of Computer Science and Information Engineering
||This thesis investigates how to accelerate the performance of modular exponen-tiation. As the requirement for secure communication over the Internet, 3G/4G cel-lular systems, vehicular communication networks and wireless sensor networks is growing, connecting small, mobile devices such as smart phones, tablet PCs, and smart cards to these networks is also a growing trend. Many of these devices require secure connections to ensure that the exchanged information remains confidential and that only authorized persons can access the devices. Such secure communication is established on many modern cryptographic functions, based on the framework of modular exponentiation. Modular exponentiation is composed of modular multipli-cation, which is made of modular reduction, large integer multiplication. In order to speed up modular exponentiation, this thesis proposes some strategies, including speeding up Montgomery reduction, determining the optimal split method and the threshold value so as to speed up the large integer multiplication, and modular reduction. Furthermore, this thesis ponders on reducing the times of memory accesses so as to make the large integer multiplication even more efficient. The goal of all strategies proposed is intended to accelerate the performance of modular exponentiation as far as possible, and make it possible to implement modern cryptographic protocols on mobile devices.
||Tables of Contents
CHAPTER 1. Introduction 1
CHAPTER 2. Related Works 14
2.1. Reducing the Numbers of Bits of the Exponent with the Value of “1” 15
2.2. Accelerating Modular Reduction 16
2.3. Accelerating a Large Integer Multiplication 20
2.3.1. Classical Multiplication 20
2.3.2. Karatsuba-Ofman Method 22
CHAPTER 3. Efficient Montgomery Reduction for Large Integers 26
3.1. Discussions of Montgomery Reduction Using the Product Scanning Method 26
3.2. The Efficient Method to Perform Large-Integer Montgomery Reduction 35
CHAPTER 4. The Optimal Split Method of Large Integer Multiplication 44
4.1. The Proposed Method 45
4.2. Analysis of the Proposed Method 48
4.2.1. Two Basic Methods 48
4.2.2. The Optimal Method 55
CHAPTER 5. Experiment Results 61
5.1. Architecture of TI DSP C55x Family 61
5.2. Experiment Results for Efficient Montgomery Reduction 64
5.3. Experiment Results for the Optimal Split Method 70
5.4. Experiment Results for Integrated Proposed Methods 74
CHAPTER 6. Conclusions and Future Researches 79
6.1. Summary of Contributions 79
6.2. Future Researches 82
List of Figures
Figure 1. The diagram of the classical multiplication 21
Figure 2. The diagram of product scanning method 27
Figure 3. Balanced-t-way split (at split depth 1) 49
Figure 4. Unbalanced-t-way split (at split depth 1) 52
Figure 5. Plot of the number of base instructions of recursive-balanced-t-way split (at split depth n) 57
Figure 6. Graphical representation of TMS320C55x 63
List of Tables
Table 1. RSA-based key agreement and key transport key length transitions (n is the size of the modulus) 12
Table 2. Recommended algorithms and minimum key sizes(L is the size of the public key, N is the size of the private key; k and f is the key size) 12
Table 3. Comparable strengths(L is the size of the public key, N is the size of the private key; k and f is the key size) 13
Table 4. The number of base instructions for two s-word classical multiplication 23
Table 5. The number of base instructions for a large integer addition/subtraction 25
Table 6. The number of classical mul., large integer add./sub. for the Karatsuba-Ofman method 25
Table 7. The number of base instructions for the Karatsuba-Ofman method 25
Table 8. Complexity of the three reduction algorithms in reducing a 2s-word number T modulo a s-word modulus m. 66
Table 9. Numbers of CPU cycles for implementing three Montgomery reductions 67
Table 10. Speedup of our proposed method vs. other Montgomery reductions (the other modular reductions/our proposed method) 67
Table 11. Numbers of CPU cycles for modular exponentiation with different modular multiplications where the bit lengths of base, modulus, and exponent are the same length 68
Table 12. Speedup of modular exponentiation with our proposed method vs. other modular multiplications where the bit lengths of base, modulus, and exponent are the same length (the time cost of other modular reductions/ the time cost of our proposed method) 69
Table 13. The computational time for one modular exponentiation with our new Montgomery reduction, where the bit lengths of base and modulus are 1024 on 100MHz CPU frequency 69
Table 14. The computational time for one modular exponentiation with our new Montgomery reduction, where the bit lengths of base and modulus are 2048 on 100MHz CPU frequency 70
Table 15. The number of CPU cycles for the large integer multiplication 72
Table 16. The number of CPU cycles for the multiplication with unbalanced methods 73
Table 17. The number of CPU cycles for integrated Montgomery modular multiplication and the proposed recursive-balanced-2way method 73
Table 18. The number of CPU cycles for modular exponentiation with different modular multiplications where the bit lengths of base, modulus, and exponent are the same length 74
Table 19. Speedup of modular exponentiation with the proposed method vs. other modular multiplications where the bit lengths of base, modulus, and exponent are the same length (the CPU Cycles of other modular reductions/ the CPU Cycles of the proposed method) 74
Table 20. The number of CPU cycles for our integrated Montgomery modular multiplication and the other Montgomery multiplications 76
Table 21. The number of CPU cycles for modular exponentiation with different modular multiplications where the bit lengths of base, modulus, and exponent are the same length 77
Table 22. Speedup of modular exponentiation with the proposed integrated method vs. other modular multiplications where the bit lengths of base, modulus, and exponent are the same length (the CPU Cycles of other modular reductions/ the CPU Cycles of the proposed method) 78
|| J. Grobschadl, R. M. Avanzi, E. Savas, and S. Tillich, "Energy-efficient software implementation of large integer modular arithmetic," CHES 2005, LNCS 3659, pp. 75-90, 2005.
 T. Simuni, "Energy Efficient System Design and Utilization," Ph.D. Thesis, Stanford University, Stanford, CA, USA, 2001.
 S. Ravi and A. Raghunathan, "Security in embedded systems: Design challenges," ACM Transaction on Embedded Computing Systems, vol. 3, pp. 461-491, 2004.
 R. Rivest, A. Shamir, and L. Adleman, "A method for obtaining digital signature and public-key cryptosystems," Communications of ACM, vol. 21, pp. 120-126, 1978.
 K. Koyama, UM. Maurer, T. Okamoto, and SA. Vanstone, "New public-key schemes based on elliptic curves over the ring Zn," presented at the CRYPTO'91, Santa Barbara, 1991.
 NIST, "FIPS 186-3: Digital signature standard(DSS)," June 2009, Available: http://csrc.nist.gov/publications/fips/fips186-2/fips186-2-change1.pdf.
 W. Diffie and M.E. Hellman, "New directions in cryptography," IEEE Transaction on Information Theory, vol. IT-22, pp. 638-654, 1976.
 H. Y. Lin, "Security and authentication in PCs," Computers and Electrical Engineering, vol. 25, pp. 225-248, 1999.
 S. Suzuki and K. Nakada, "An authentication technique based on distributed security management for the global mobility network," IEEE Journal on Selected Areas Communications, vol. 15, pp. 1608-1617, 1997.
 W. Geiselmann and R. Steinwandt, "Special-purpose hardware in cryptanalysis: The case of 1024-Bit RSA," IEEE Security and Privacy, vol. 5, pp. 63-66, 2007.
 T. Guneysu, C. Paar, and J. Pelel, "Special-purpose hardware for solving the elliptic curve discrete logarithm problem," ACM Transactions on Reconfigurable Technology and Systems, vol. 1, article no. 8, 2008.
 NIST, "Special Publications SP 800-57 Part 1 : Recommendation for key management: Part 1: General (Revision 3)," July 2012, Available: http://csrc.nist.gov/publications/nistpubs/800-57/sp800-57_part1_rev3_general.pdf.
 NIST, "SP 800-57 Part 3 : Recommendation for key managemen: Part 3: Application-specific key management guidance," December 2009, Available: http://csrc.nist.gov/publications/nistpubs/800-57/sp800-57_PART3_key-management_Dec2009.pdf.
 D. Boneh and H. Shacham, "Fast variants of RSA," Cryptobytes, vol. 5, pp. 1-9, 2002.
 D. E. Knuth, The Art of computer programming, Addison-Wesley, 1998.
 M. Joye and S. M. Yen, "Optimal left-to-right binary signed-digit recoding," IEEE Transactions on Computers, vol. 49, pp. 740-748, 2000.
 W. C. Yang, D. J. Guan, and C. S. Laih, "Algorithm of asynchronous binary signed-digit recoding on fast multi-exponentiation," Applied Mathematics and Computation, vol. 167, pp. 108-117, 2005.
 B. Qin, M. Li, F. Kong, and D. Li, "New left-to-right minimal weight signed-digit radix-r representation," Computers and Electrical Engineering, vol. 35, pp. 150-158, 2009.
 S. M. Yen, "Improved common-multiplicand-multiplication and fast exponentiation by exponent decomposition," IEICE Transaction on Fundamentals, vol. E80-A, pp. 1160-1163, 1997.
 C. L. Wu, "An efficient common-multiplicand-multiplication method to the Montgomery algorithm for speeding up exponentiation," Information Sciences, vol. 179, pp. 410-421, 2009.
 M. E. Kaihara and N. Takagi, "Bipartite modular multiplication method," IEEE Transactions on Computers, vol. 57, pp. 157-164, 2008.
 M. Knezevic, F. Vercauteren, and I. Verbauwhede, "Speeding up bipartite modular multiplication," Arithmetic of Finite Fields, LNCS 6087, pp. 166-179, 2010.
 J. H. Yang and C. C. Chang, "Efficient residue number system iterative modular multiplication algorithm for fast modular exponentiation," IET Computers and Digital Techniques, vol. 2, pp. 1-5, 2008.
 B. J. Phillips, Y. Kong, and Z. Lim, "Highly parallel modular multiplication in the residue number system using sum of residues reduction," Applicable Algebra in Engineering, Communication and Computing, vol. 21, pp. 249-255, 2010.
 P. L. Montgomery, "Modular multiplication without trial division," Mathematics of Computation, vol. 44, pp. 519-521, 1985.
 P. Barrett, "Implementing the Rivest, Shamir and Adleman public key encryption algorithm on a standard digital signal processor," CRYPTO ’86, LNCS 263, pp. 311-323, 1987.
 A. Bosselaers, R. Govaerts, and J. Vandewalle, "Comparison of three modular reduction functions," CRYPTO ’93, LNCS 773, pp. 175-186, 1993.
 H. Eberle, S. Shantz, V. Gupta, N. Gura, L. Rarick, and L. Spracklen, "Accelerating next-generation public-key cryptosystems on general-purpose CPUs," IEEE Macro, vol. 25, pp. 52-59, 2005.
 A. Liu and P. Ning, "TinyECC: A configurable library for elliptic curve cryptography in wireless sensor networks," Proceedings of the 7th international conference on Information processing in sensor networks (IPSN'08), pp. 245-256, 2008.
 C. Lederer, R. Mader, M. Koschuch, J. Grobschadl, A. Szekely, and S. Tillich, "Energy-efficient implementation of ECDH key exchange for wireless sensor networks," WISTP 2009, LNCS 5746, pp. 112-127, 2009.
 J. Grosschadl, "TinySA: A security architecture for wireless sensor networks," Proceedings of the 2006 ACM CoNEXT conference (CoNEXT '06 ), article no. 55, 2006.
 Y. Zhang and J. Grossch‥adl, "Efficient prime-field arithmetic for elliptic curve cryptography on wireless sensor nodes," Proceedings of the 1st International Conference on Computer Science and Network Technology (ICCSNT 2011), vol. 1, pp. 459-466, 2011.
 T. ElGamal, "A public key cryptosystem and a signature scheme based on discrete logarithm," CRYPTO ’84, LNCS 196, pp. 10-18, 1985.
 A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of applied cryptography, CRC Press, 1996.
 A. Karatsuba and Y. Ofman, "Multiplication of multidigit numbers on automata," Soviet Physics Doklady, vol. 7, pp. 595-596, 1963.
 P. Comba, "Exponentiation cryptosystems on the IBM PC," IBM Systems Journal, vol. 29, pp. 526-538, 1990.
 Texas Instruments, "TMS320C5510 Ultra Low Power DSP," Available: http://processors.wiki.ti.com/index.php/C5510.
 Texas Instruments, "OMAP (Open Multimedia Applications Platform)," Available: http://en.wikipedia.org/wiki/Texas_Instruments_OMAP/.
 Texas Instruments, "C5000 DSPs: architecture & peripheral features," Available: http://www.ti.com.
 C. K. Koc, T. Acar, and B. Kaliski, "Analyzing and comparing Montgomery multiplication algorithms," IEEE Micro, vol. 16, pp. 26-33, 1996.