Elliptic Curve Digital Signatures in RSA Hardware
Institutionen för systemteknik (Department of Electrical Engineering), Division of Information Coding, Linköpings universitet, SE-581 83 Linköping, Sweden

Master's thesis (examensarbete) in cryptography, carried out at the Institute of Technology, Linköping University, by Martin Krisell.

Report number: LiTH-ISY-EX--12/4618--SE
Swedish title: Digitala signaturer över elliptiska kurvor på RSA-hårdvara
Supervisors: Jan-Åke Larsson (ISY, Linköpings universitet) and Pablo García (Realsec, Madrid, Spain)
Examiner: Jan-Åke Larsson (ISY, Linköpings universitet)
Date: Linköping, August 31, 2012
Electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-81084

Abstract

A digital signature is the electronic counterpart to the handwritten signature. It can prove the source and integrity of any digital data, and is a tool that is becoming increasingly important as more and more information is handled electronically.

Digital signature schemes use a pair of keys. One key is secret and allows the owner to sign some data, and the other is public and allows anyone to verify the signature. Assuming that the keys are large enough, and that a secure scheme is used, it is impossible to find the private key given only the public key. Since a signature is valid for the signed message only, this also means that it is impossible to forge a digital signature.

The most widely used scheme for constructing digital signatures today is RSA, which is based on the hard mathematical problem of integer factorization. There are, however, other mathematical problems that are considered even harder, which in practice means that the keys can be made shorter, resulting in a smaller memory footprint and faster computations. One such alternative approach is using elliptic curves. The underlying mathematical problem of elliptic curve cryptography is different from that of RSA, but the two share some structure.

The purpose of this thesis was to evaluate the performance of elliptic curves compared to RSA, on a system designed to efficiently perform the operations associated with RSA. The results are that the elliptic curve approach offers great advantages, even when running on RSA hardware, and that these advantages increase significantly if special hardware is used. Some usage cases of digital signatures may, for a few more years, still favor the RSA approach when it comes to speed. For most cases, however, an elliptic curve system is the clear winner, and will likely be dominant in the near future.

Keywords: Cryptography, Digital Signatures, Elliptic Curves, ECC, ECDSA, ECIES, RSA
Acknowledgments

First of all, I would like to thank Jesús Rodríguez Cabrero for allowing me to do my thesis at Realsec in Madrid. I would also like to thank my co-workers at Realsec, including my supervisor Pablo García, for the warm welcome to Spain and for giving me expert guidance. I specifically want to thank Luis Jesús Hernández for all our great debugging and discussion sessions. In addition, I would like to thank my examiner Jan-Åke Larsson for his interest in examining the thesis and for his valuable comments along the way. Finally, I would like to thank my friends and family who helped me proofread the thesis.

Linköping, August 2012
Martin Krisell

Contents

1 Background
  1.1 Introduction
  1.2 Realsec
  1.3 Purpose of Thesis
  1.4 Outline of Report

I Theory

2 Cryptography Overview
  2.1 Basic Concepts
  2.2 Historical Ciphers
    2.2.1 The Caesar Cipher
    2.2.2 Substitution Cipher
    2.2.3 The Vigenère Cipher
  2.3 Modern Cryptography
  2.4 Goals of Cryptography
  2.5 Attack Models
  2.6 Bits of Security
  2.7 Computer Security

3 Symmetric Cryptography
  3.1 A Symmetric Cipher
    3.1.1 The One Time Pad
  3.2 Security Definitions
    3.2.1 Perfect Secrecy
    3.2.2 Semantic Security
  3.3 Stream Ciphers
  3.4 Block Ciphers
    3.4.1 Pseudorandom Permutations
    3.4.2 DES
    3.4.3 AES
    3.4.4 Modes of Operation
  3.5 Hash Functions
    3.5.1 The Birthday Problem
  3.6 Message Authentication Codes
    3.6.1 CBC-MAC
    3.6.2 HMAC
    3.6.3 Authenticated Encryption

4 Asymmetric Cryptography
  4.1 The Key Distribution Problem
  4.2 Public and Private Keys
  4.3 Key Exchange
    4.3.1 Diffie-Hellman-Merkle Key Exchange
  4.4 Trapdoor Permutations
  4.5 Semantic Security
  4.6 ElGamal
  4.7 RSA
    4.7.1 RSA Encryption Standards
  4.8 Hybrid Systems
  4.9 Security of Public Key Algorithms
    4.9.1 RSA Security
    4.9.2 Solving the Discrete Logarithm Problem
    4.9.3 Shor's Algorithm
  4.10 Elliptic Curve Cryptography
    4.10.1 Elliptic Curves
    4.10.2 Elliptic Curves Over Finite Fields
    4.10.3 Projective Coordinate Representations
    4.10.4 The Elliptic Curve Discrete Logarithm Problem (ECDLP)
    4.10.5 Group Order
    4.10.6 Domain Parameters
    4.10.7 Elliptic Curve Key Pair
    4.10.8 Encryption Using Elliptic Curves (ECIES)
  4.11 Digital Signatures
    4.11.1 RSA Signatures
    4.11.2 Digital Signature Algorithm (DSA)
    4.11.3 Elliptic Curve DSA (ECDSA)
  4.12 Public Key Infrastructure
    4.12.1 Certificates

II Implementation and Performance Evaluation

5 Implementation
  5.1 Hardware Security Module
  5.2 Hardware
    5.2.1 Montgomery Multiplications
  5.3 Software
    5.3.1 Overall Code Structure
    5.3.2 Layer 1 - Finite Field Arithmetics
    5.3.3 Layer 2 - Point Addition and Doubling
    5.3.4 Layer 3 - Point Multiplication
    5.3.5 Layer 4 - Cryptographic Protocols
    5.3.6 Testing

6 Performance Evaluation of Implementation
  6.1 Performance of HSM Implementation
    6.1.1 Key Pair Generation
    6.1.2 Signature Generation
    6.1.3 Signature Verification
  6.2 Performance of Other ECDSA and RSA Implementations
  6.3 Conclusion
A Mathematical Prerequisites
  A.1 Complexity Theory
  A.2 Number Theory
  A.3 Modular Arithmetic
    A.3.1 The Chinese Remainder Theorem
    A.3.2 Modular Exponentiation
    A.3.3 Multiplicative Inverses
  A.4 Groups and Finite Fields
    A.4.1 Generators and Subgroups
    A.4.2 The Discrete Logarithm Problem
    A.4.3 Finite Fields

Bibliography

1 Background

This chapter gives an introduction to this thesis, defining the background of, and the goals of, the performed work. An outline of the report is provided, as well as definitions of commonly used abbreviations.

1.1 Introduction

Cryptography is an invaluable tool today and its applications are growing continuously. Even though most implementations are transparent to the user, it is almost impossible to use a computer today without having to rely on cryptographic constructions. The purpose of using cryptography is, of course, to provide security. However, when deciding which scheme to use, security is not the only concern. Often a decisive factor in the choice between two or more constructions is their performance. This is especially important on the Internet, where a server usually needs to be able to handle many simultaneous user requests, and where users generally are impatient. The performance of a cryptographic scheme is determined by two factors: the underlying algorithm and the implementation. The implementation can be done in either software or hardware, where the former is simpler and more maintainable, but the latter may give better performance as well as higher security.

1.2 Realsec

Realsec is a Spanish company, based in Madrid, that since 2001 has been providing information security solutions to banks, governments, and public organizations. Most of their cryptographic products are based on a Hardware Security Module (HSM), which is a device that provides several security services such as encryption, digital signatures, and public key infrastructure. My thesis work was performed at the Madrid office, and my implementation has been done on their new HSM module, not yet released to customers.

1.3 Purpose of Thesis

The goal of this thesis was to evaluate the security and performance of elliptic curve based algorithms for digital signatures, in particular compared to those based on RSA, which is by far the most common choice today. A goal derived from this was to create an efficient elliptic curve implementation for Realsec, and to compare its performance with their existing, RSA-based, scheme for generating digital signatures. The new elliptic curve implementation runs on the same hardware as the current RSA implementation, hardware that is specialized for the RSA operations. The goal was to show that elliptic curves have significant advantages over the RSA approach, especially at higher security levels, and that increased performance can be achieved without the need for new specialized hardware.
1.4 Outline of Report

The first part of this thesis, chapters 2-4, covers the theoretical foundations of cryptography needed to understand the basics behind digital signatures. The second part, chapters 5-6, covers the performed implementation work and the results achieved. A reader already familiar with the theoretical parts of cryptography can jump straight to the second part; whenever specific details from the first part are used, they are referenced. In order not to require too many prerequisites, there is an appendix covering the required mathematical background, referred to in the text whenever needed.

Abbreviations

ACP: Asymmetric Crypto Processor
AES: Advanced Encryption Standard
ANSI: American National Standards Institute
BAU: Big-integer Arithmetic Unit
CBC: Cipher Block Chaining
CDH: Computational Diffie-Hellman
CTR: Counter Mode
DES: Data Encryption Standard
3DES: Triple DES
DH: Diffie-Hellman Key Exchange
DLP: Discrete Logarithm Problem
DSA: Digital Signature Algorithm
ECB: Electronic Code Book
ECC: Elliptic Curve Cryptography
ECDH: Elliptic Curve Diffie-Hellman Key Exchange
ECDLP: Elliptic Curve Discrete Logarithm Problem
ECDSA: Elliptic Curve Digital Signature Algorithm
ECIES: Elliptic Curve Integrated Encryption Scheme
GF: Galois Field
HMAC: Hash-based Message Authentication Code
JP: Jacobian Projective
KDC: Key Distribution Center
KDF: Key Derivation Function
LD: López-Dahab Projective
MAC: Message Authentication Code
MM: Montgomery Multiplier
NIST: National Institute of Standards and Technology
OAEP: Optimal Asymmetric Encryption Padding
OTP: One Time Pad
PKCS: Public Key Cryptography Standards
PRG: Pseudo Random Generator
PRP: Pseudo Random Permutation
RSA: Rivest Shamir Adleman
SHA: Secure Hash Algorithm

Part I: Theory

2 Cryptography Overview

This chapter gives a brief overview of cryptography. Some historical examples of ciphers are provided, as well as a quick introduction to modern cryptography. In addition, the goals of cryptography and the different attack models are defined, as they will be referred to throughout this thesis.

2.1 Basic Concepts

The basic purpose of cryptography is to allow two parties, often denoted by Alice and Bob, to talk to each other over an insecure channel while preventing an evil adversary, Eve, from understanding and participating. Alice and Bob utilize cryptographic constructions in order to transform the insecure channel into a secure one. The communication between the two parties may take place over space (e.g. over the Internet) or in time (e.g. for disc encryption). The original message is often referred to as the plaintext, and the transformed, unreadable message is often called the ciphertext. The basic idea of cryptography is visualized in figure 2.1.

[Figure 2.1: Basic overview of cryptography: Alice and Bob exchange messages while Eve, listening on the channel, understands nothing.]

2.2 Historical Ciphers

The need for securing information has existed ever since humanity acquired the ability to write. Still, it is only very recently that cryptography started to be treated as a science, with constructions motivated by mathematical proofs. Up until the 20th century, the security of a cipher rested on obscurity and, more importantly, on the secrecy of the algorithm used. Once the encryption method was released or reverse engineered, the cipher was always eventually broken.
In this section, a quick overview of a few historical ciphers is given. More information about these, and the often very exciting stories surrounding them, can be found in Kahn [1].

2.2.1 The Caesar Cipher

The Caesar cipher is the simplest possible encryption scheme. All it does is shift the plaintext letters individually three steps forward in the alphabet, so that A → D, B → E, and so on. When the end of the alphabet is reached, it wraps around, i.e. X → A, Y → B, and Z → C. Described mathematically, the letters of the alphabet are assigned the numbers 0 to 25 in order, and encryption is performed by adding 3 mod 26 (for an introduction to modular arithmetic, see Appendix A.3). Decryption is then of course performed by subtracting 3 mod 26.

This is in fact not a cipher at all, since there is no key (the definition of a symmetric cipher is given in chapter 3). However, any cipher defined by adding a number n mod 26 is usually referred to as a Caesar cipher. Such a cipher is trivially broken by simply trying all 26 possibilities, an attack method referred to as exhaustive search or as a brute force attack.
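As an illustration, the following minimal Python sketch implements the generalized Caesar cipher and its brute force attack (the function names are mine, not from the thesis):

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def caesar_encrypt(plaintext: str, n: int = 3) -> str:
    """Shift each letter n steps forward, wrapping around mod 26."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + n) % 26] for ch in plaintext)

def caesar_decrypt(ciphertext: str, n: int = 3) -> str:
    """Decryption is subtraction mod 26, i.e. a shift by -n."""
    return caesar_encrypt(ciphertext, -n)

assert caesar_encrypt("XYZ") == "ABC"                      # wrap-around
assert caesar_decrypt(caesar_encrypt("HELLO")) == "HELLO"

# Exhaustive search: with only 26 keys, print every candidate
# decryption and pick the one that reads as language.
for n in range(26):
    print(n, caesar_decrypt("KHOOR", n))
```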
2.2.2 Substitution Cipher

In the previous example, the size of the key space (i.e. the number of keys) was small enough to make the exhaustive search approach practical. In a substitution cipher, instead of only allowing rotations of the alphabet, each letter may map to any other letter. An example of such a mapping is given in figure 2.2. Now, the key is given by an ordering of the letters of the alphabet, and since this can be done in 26! different ways, the key space is much larger than before.

[Figure 2.2: An example of a substitution cipher key.
  ABCDEFGHIJKLMNOPQRSTUVWXYZ
  BMCDAEOFGHRKYJZLPNQVUWSXIT]

It turns out that this key space is large enough to prevent any exhaustive search attacks, but the substitution cipher is still not secure. The reason is that not all letters in a message are equally common. The most frequent letter in the ciphertext is very likely to be the most frequent letter in the given language, e.g. 'e' in English. By performing a so-called statistical analysis of letters, letter pairs, etc., a substitution cipher can always be broken. This means that a large key space is a necessary, but not sufficient, requirement for a cipher to be secure.

2.2.3 The Vigenère Cipher

The problem with the substitution cipher is that the same plaintext letter is always encrypted into the same ciphertext letter, thus keeping all letter frequencies from the original message. The Vigenère cipher was designed to fix this. It uses a key of any length, and repeats it until it is as long as the message. The encryption is then done similarly to the Caesar case, by modular addition of each message character to the corresponding character in the expanded key. This means that a repeated plaintext character may be transformed into a different ciphertext character, depending on its position relative to the key. An example of the Vigenère cipher is given in figure 2.3. This cipher was initially thought to be unbreakable, but further insight made it almost as easy to break as a normal substitution cipher.

[Figure 2.3: An example of a Vigenère cipher in use.
  m: HELLO BOB I LOVE YOU
  k: KEYKE YKE Y KEYK EYK   (+ mod 26)
  c: RIJVS ZYF G VSTO CME
Notice that the two "B"s in "BOB" are encrypted to different ciphertext letters.]

The problem with this cipher is that as soon as the attacker knows the length of the key, i, he can pick every i'th character of the ciphertext, and since these have been encrypted by the same key character, he can perform statistical analysis on these letters. By repeating this for every position, the entire message can be retrieved. If the size of the key is unknown, it can be found by looking for repeated patterns in the ciphertext, or by simply trying different lengths.

The above approach to breaking the Vigenère cipher works because the key is repeated. However, if the key is long enough (or the message short enough), it will not work. In that case, the only potential problem is the way the key characters are chosen. If the characters in the key are chosen truly at random (uniformly) over the alphabet, then every plaintext letter is transformed into each other letter with equal probability, and no statistical analysis will be able to break the cipher. We will keep this idea in mind and return to it in the beginning of the chapter on symmetric cryptography.
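Before moving on, here is a short Python sketch of Vigenère encryption that reproduces figure 2.3 (the helper names are mine, not from the thesis):

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Add each key letter to the corresponding message letter mod 26.
    Non-letters (e.g. spaces) pass through and do not consume key."""
    out, i = [], 0
    for ch in plaintext:
        if ch in ALPHABET:
            k = ALPHABET.index(key[i % len(key)])
            out.append(ALPHABET[(ALPHABET.index(ch) + k) % 26])
            i += 1
        else:
            out.append(ch)
    return "".join(out)

print(vigenere_encrypt("HELLO BOB I LOVE YOU", "KEY"))
# -> RIJVS ZYF G VSTO CME  (the two "B"s in BOB encrypt differently)
```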
2.3 Modern Cryptography

The first seed of what is today referred to as modern cryptography was planted by Auguste Kerckhoffs in 1883, when he published two articles proposing a few requirements for all future encryption schemes [2]. The most important one tackles the previous security-by-obscurity approach by stating that a cryptographic construction should be secure even if all details about the algorithms fall into the wrong hands. Only one small parameter, referred to as the key, should need to be kept secret. This idea changed cryptography completely, and transformed it from an art form into a science. Today, ciphers are developed openly and scrutinized by the cryptographic community before being widely accepted. New cryptographic standards are sometimes decided through competitions where different cryptography research teams try to break each other's constructions. The believed security of a cipher increases over time as more and more effort is put into breaking it. In addition, a new kind of cryptography has recently been invented, called public key cryptography. Its constructions are often based on more or less complex mathematics and are usually accompanied by formal proofs of security. The main practical difference is that the encryption and decryption keys are no longer the same and that, besides solving the key distribution problem (discussed in chapter 4), it also permits some new and interesting applications of cryptography.

2.4 Goals of Cryptography

One of the main objectives when using cryptography is of course to prevent sensitive information from falling into the wrong hands. This is, however, not the only objective. Equally important, and even more important in some cases, is to be certain of who actually sent the data, and that it has not been modified during transmission. The four main objectives of cryptography, ordered alphabetically, are the following.

• Authentication: Providing assurance that the expected sender of the message is also the true sender.
• Confidentiality: Preventing unintended recipients from reading the message.
• Integrity: Preventing undetected modification during transmission.
• Non-repudiation: Ensuring that the sender of an authenticated message can never deny having sent the message.

When describing cryptographic constructions throughout the rest of this thesis, it will be specified which of these goals each specific construction is intended to provide.

2.5 Attack Models

An attack model specifies the amount of information that the adversary has access to when trying to break encryptions. In the simplest case, the adversary, Eve, knows nothing about what kind of data is being sent over the channel and can only see the ciphertext. In this case, Eve will only be able to mount a so-called ciphertext only attack. This is the weakest kind of attack, and any system that is vulnerable to it (such as all of the historical ciphers mentioned earlier) should never be used.

There are cases when the adversary may know all or some of the plaintext that was used to generate some specific ciphertext. For example, an email message always starts with "From:", so even if the contents of the message are unknown, some information about the plaintext is known. In this case, Eve can mount a known plaintext attack. Finally, it may be the case that the adversary actually gets to choose the messages being encrypted, and then uses the plaintext and ciphertext pairs to extract information from other ciphertexts. An example could be disc encryption, where the attacker can somehow affect the files being stored in the system, e.g. by sending an email with an attachment. This is called a chosen plaintext attack. Generally, we try to provide security against all these models, since we usually cannot assume anything about the attacker and thus have to assume the worst. The next chapter gives a formal definition of security, and any system fulfilling it will be secure also against chosen plaintext attacks.

2.6 Bits of Security

As previously stated, a minimal requirement for a cryptographic system to be secure is that the key space is large enough to prevent attacks by exhaustive search. If it is not, the attacker can simply try every possible key until the decrypted message makes sense (except for one case, described in the next chapter). For a perfectly built encryption scheme, exhaustive search would be the only possible attack, and a large key space would also guarantee security. Real systems, however, might have weaknesses that enable a faster way to break them. When we talk about the bits of security of a cryptographic scheme, we mean the number of bits corresponding to a key space for which exhaustive search would take the same amount of time as the best known attack against the scheme. For example, if the best known attack against scheme A runs in time Θ(2^(n/2)) (see Appendix A.1 for asymptotic notation), where n is the size of the key in bits, then A has n/2 bits of security.

The National Institute of Standards and Technology (NIST) is a federal agency in the United States that, among other things, gives security recommendations. Up until 2010 they recommended using cryptographic systems with a minimum of 80 bits of security, but since 2011 the recommendation is at least 112 bits. This is supposed to provide security until 2030; after that, 128 bits is the minimum recommendation [3]. When choosing a construction, the required security depends not only on how long the construction will be used, but also on how long the encrypted information needs to be kept secret. A message valid for only a few minutes, e.g. a one-time login code, does not need the same security as a message valid for years.
2.7 Computer Security

A note has to be made about the relation between cryptography and computer security, since the two are sometimes incorrectly considered equivalent. Cryptography is only a subset of computer security; not all computer related security problems can be solved by cryptography. Also, even if a theoretically secure cryptographic construction is used, it is not necessarily secure in practice. One example is so-called side-channel attacks, which attack not the theoretical construction but the implementation: the power usage or the time taken for encryption is measured and used to deduce information about the secret key or the message being encrypted. An example of such an attack is given in Kühn [4]. Another attack that cryptography cannot defend against is a replay attack (or playback attack), where the adversary records a complete encrypted message and replays the transmission at a later time. The attacker may not know what the message says, but any effect it has upon reception may be triggered again, clearly posing a possible security risk. To protect against this, additional measures must be taken, such as sequence numbering or time stamping.

3 Symmetric Cryptography

This chapter gives an overview of symmetric key encryption schemes, along with a few examples of the most well-known constructions. Note, however, that the focus of this thesis is on asymmetric cryptography, so this chapter is fairly brief and only covers enough to understand the possibilities and problems of cryptography. For a much deeper and more thorough treatment of symmetric cryptography, see Menezes, van Oorschot, and Vanstone [5].

3.1 A Symmetric Cipher

A symmetric cipher provides a way of transforming a plaintext message into so-called ciphertext, using a key. Anyone who has the key can use it to get back the original message from the ciphertext. We call this a symmetric scheme because the same key is used for both encryption and decryption. Throughout this chapter, we assume that the two communicating parties have already been able to share a secret key; the problem of obtaining this key is discussed in the next chapter. An overview of a symmetric encryption scheme is given in figure 3.1. A symmetric cipher can be mathematically defined in the following way.

3.1 Definition (Cipher). Let K be the set of all keys, M the set of all plaintext messages, and C the set of all ciphertext messages. A symmetric cipher defined over (K, M, C) is a pair of algorithms (E, D), where E : K × M → C and D : K × C → M, and where ∀k ∈ K, ∀m ∈ M : D(k, E(k, m)) = m.

[Figure 3.1: Overview of symmetric cryptography: Alice computes c = E(k, m), Bob recovers m = D(k, c), and Eve sees only the unreadable ciphertext.]

We have already seen a few examples of symmetric ciphers in the historical overview. Note that there is no notion of security in the definition of a cipher. We will soon give some security definitions, but first we formalize the observation made in the previous chapter when discussing the Vigenère cipher.

3.1.1 The One Time Pad

The One Time Pad (OTP) is a cipher that works similarly to the Vigenère cipher, but instead of repeating a short key, we require that the key is at least as long as the message.
This means that every letter in the message will be encrypted by a different key character, and if those are chosen independently at random, the original contents of the message will be completely hidden. The definition of the one time pad over a binary alphabet is given below, but an equivalent definition can be given for any alphabet.

3.2 Definition (One Time Pad). The One Time Pad cipher is the pair (E, D) defined over (K = {0, 1}^n, M = {0, 1}^n, C = {0, 1}^n), where for k chosen uniformly at random from K, m ∈ M, c ∈ C : E = m ⊕ k and D = c ⊕ k. (⊕ denotes XOR, i.e. addition mod 2.)

Used correctly, this cipher intends to provide confidentiality, and none of the other goals of cryptography discussed in the previous chapter. Note that the cipher is called the one time pad for a very good reason. If the same key is ever used for two different messages, the adversary can simply add the two ciphertexts together (mod 2); the key is then eliminated, leaving the XOR of the two plaintext messages. There is enough redundancy in written language to extract both messages from this. So, a one time pad key must only ever be used for one message.

However, even when used correctly, the OTP is a very impractical cipher, since the key needs to be at least as long as the message. We need a secure way to transfer this key, and if we already had such a channel, it could be used to transmit the message directly instead. Using the OTP as described here only makes sense if the two parties can meet in advance and exchange a large amount of key material for future communication. (This approach was actually used for the Moscow-Washington hotline, where diplomats exchanged a large amount of pad to be used in some specific order [6].)
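A minimal sketch of the binary one time pad in Python, with os.urandom standing in for a true uniform source (the snippet is mine, not from the thesis):

```python
import os

def otp_encrypt(m: bytes, k: bytes) -> bytes:
    """E = m XOR k; requires the key to be as long as the message."""
    assert len(k) == len(m)
    return bytes(mi ^ ki for mi, ki in zip(m, k))

# Decryption is the same operation: (m XOR k) XOR k = m.
otp_decrypt = otp_encrypt

m = b"HI BOB"
k = os.urandom(len(m))             # fresh uniform key, used only once
c = otp_encrypt(m, k)
assert otp_decrypt(c, k) == m

# The "two time pad" failure: reusing k leaks the XOR of two plaintexts.
m2 = b"HI EVE"
c2 = otp_encrypt(m2, k)
leak = bytes(a ^ b for a, b in zip(c, c2))   # equals m XOR m2, key is gone
```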
3.2 Security Definitions

In order to be able to talk about the security of a cryptographic construction, we first need to define what we mean when we say that a cipher is secure. This section gives two such definitions.

3.2.1 Perfect Secrecy

After having invented information theory in the 1940s, Claude Shannon applied these ideas to cryptography and came up with the notion of perfect secrecy [7]. One way of defining it is as follows.

3.3 Definition (Perfect Secrecy). A cipher (E, D) over (K, M, C) has perfect secrecy if ∀m_0, m_1 ∈ M, where m_0 and m_1 have equal length, and ∀c ∈ C:

  Pr[E(k, m_0) = c] = Pr[E(k, m_1) = c]

where k is chosen uniformly at random from K.

The definition says that given a ciphertext, the probability that a specific plaintext generated it is the same for all plaintexts of equal length, i.e. there is no way to determine the original message. Not even exhaustive search can break such a system, regardless of the key space size, since there is no way to tell when the original message has been found. It is very easy to prove that the one time pad in fact has perfect secrecy. However, it is also easy to prove that the inconvenience of the one time pad, the fact that |K| ≥ |M|, i.e. that the key must be as long as the message, is in fact a requirement for perfect secrecy. A cipher whose keys are shorter than its messages can never be perfectly secret. This makes perfect secrecy a very impractical definition.

3.2.2 Semantic Security

Our second definition of security does not require perfect secrecy in the information-theoretic sense; it requires only that the cipher is unbreakable by any "efficient" adversary, i.e. one running in polynomial time (see Appendix A.1). We define semantic security when the same key is used for multiple encryptions, sometimes referred to as indistinguishability under a chosen plaintext attack (IND-CPA), as follows.

3.4 Definition (Semantic Security). Semantic security is defined through a game between a challenger and an adversary, through these steps:

1. The challenger chooses a random key from the key space, and also a random bit b ∈ {0, 1}.
2. The adversary may submit any number of plaintext messages to the challenger, and the challenger sends back their encryptions under the chosen key.
3. The adversary generates two messages m_0 and m_1 of equal length, of his choice, and sends these to the challenger.
4. The challenger returns the encryption of m_b.

The cryptosystem used is said to have semantic security if no "efficient" adversary can determine which of the two submitted messages was returned with probability significantly greater than 1/2 (the probability achieved by just guessing).

It is easy to see that this definition prevents an adversary from learning any information about the plaintext; achieving semantic security is therefore the goal for all cryptographic constructions intended to provide confidentiality. Note that m_0 and m_1 may very well be among the previously submitted messages. This means that a system which always transforms the same plaintext into the same ciphertext can never be semantically secure. Instead, the encryption algorithm needs to be randomized, meaning that in addition to the key and the plaintext, it also takes bits from some random source as input, such that the output differs even when the input is the same. Decryption, however, needs to be deterministic, since it should always return the same plaintext when decrypting a given ciphertext.

3.3 Stream Ciphers

We have seen the good security properties of the one time pad, but also the impracticality of using it, and we know that there is no way to achieve perfect secrecy unless the key is as long as the message. The question is whether it is possible to instead achieve semantic security using the same idea as the one time pad, but with a shorter key. This is what stream ciphers try to do. The idea is to use an expansion function that takes a short, truly random key as input and creates a longer pseudorandom keystream. We call this expansion function a Pseudorandom Generator (PRG), defined as follows.

3.5 Definition (PRG). A Pseudorandom Generator is a function G : {0, 1}^n → {0, 1}^s where s ≫ n.

Now, one definition of a stream cipher could be the following.

3.6 Definition (Stream Cipher). A stream cipher is a pair (E, D) defined over (K = {0, 1}^n, M = {0, 1}^s, C = {0, 1}^s), where G : {0, 1}^n → {0, 1}^s is a PRG and, for k chosen uniformly at random from K, m ∈ M, c ∈ C : E = m ⊕ G(k) and D = c ⊕ G(k).

[Figure 3.2: An example of stream cipher encryption: the short key k is expanded by G(k) into a keystream, which is XORed with the message m to produce the ciphertext c.]
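To make the structure of Definition 3.6 concrete, here is a toy Python sketch where the PRG is improvised from a hash function run in counter fashion; this only illustrates the E = m ⊕ G(k) structure and is my stand-in, not a vetted PRG or a construction from the thesis:

```python
import hashlib

def toy_prg(k: bytes, s: int) -> bytes:
    """Expand a short key into s keystream bytes (illustration only)."""
    stream, counter = b"", 0
    while len(stream) < s:
        stream += hashlib.sha256(k + counter.to_bytes(8, "big")).digest()
        counter += 1
    return stream[:s]

def stream_xor(data: bytes, k: bytes) -> bytes:
    """E = m XOR G(k); the same call decrypts, since XOR is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, toy_prg(k, len(data))))

c = stream_xor(b"a message much longer than the key", b"short key")
assert stream_xor(c, b"short key") == b"a message much longer than the key"
```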
In order for this stream cipher to have any chance of being secure, the used PRG must be secure. Security in this case is equivalent to unpredictability, meaning that with random-looking input, sometimes called the seed, the output must also look random, such that given some part of the output from the PRG, it is impossible to predict any future output. It has in fact been proven that the above construction, using a secure PRG, gives a stream cipher that is semantically secure as long as the key is used only once (that is, we modify the previous definition of semantic security to omit the second step). The problem is that no one knows whether it is possible to construct secure PRGs; however, there are some good candidates, such as Salsa20, which is part of the ECRYPT Stream Cipher Project [8]. (ECRYPT is the European Network of Excellence for Cryptology, a project launched to increase information security research collaboration in Europe.)

Like the one time pad, a stream cipher is trying to achieve confidentiality. It is easy to see that it provides no integrity at all: ciphertext modifications not only go undetected, they also have a known effect on the plaintext, because of the linearity of addition. In order to achieve integrity, a stream cipher must be accompanied or replaced by other constructions. History has shown that it is hard to implement stream ciphers securely, and many systems that have used stream ciphers have eventually been broken. The recommendation is therefore to instead use a standardized block cipher, defined in the next section.

3.4 Block Ciphers

Up until now, we have only seen constructions that encrypt each symbol of the alphabet individually. A construction that instead encrypts a block of plaintext into a block of ciphertext is called a block cipher. It is hard to make block ciphers as fast as stream ciphers. However, they may be more secure and, more importantly, they give us new capabilities, explained later in this chapter. Before giving examples of block ciphers, we first define the abstract idea of a secure pseudorandom permutation.

3.4.1 Pseudorandom Permutations

A Pseudorandom Permutation (PRP) is, simply described, an invertible function that is "efficiently" computable in both the forward and the backward direction.

3.7 Definition (PRP). A Pseudorandom Permutation is a function F defined over (K, X), F : K × X → X, such that ∀k ∈ K and ∀x ∈ X:

1. There exists an "efficient" algorithm to evaluate F(k, x).
2. The function F(k, ·) is one-to-one.
3. There exists an "efficient" algorithm to invert F(k, x).

We say that a PRP is secure if no "efficient" adversary can distinguish between that function and a completely random function defined over the same space. Formally, the security is defined by a game similar to the one used for defining semantic security: the adversary submits input values, the challenger returns the output of either a random function or the PRP, and the PRP is secure if the adversary cannot tell which he received. This means, for example, that flipping just one bit of the input should flip every bit of the output with probability 1/2, since this is what a random function would do. This property is sometimes referred to as diffusion, or as the avalanche criterion. When discussing modes of operation below, we will see that using a secure PRP, we can achieve semantic security as previously defined. Hence, all that block ciphers try to do is behave like a secure PRP. However, as with PRGs, no one knows whether it is possible to construct secure PRPs, but there are constructions believed to be close, such as AES.
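To make the "invertible in both directions" requirement concrete, here is a toy Feistel network in Python. This is a classic textbook construction (used by DES, and shown by Luby and Rackoff to yield a PRP when the round function is pseudorandom), not something this thesis specifies; the round count and hash-based round function are my illustrative choices. The point is that the round function need not be invertible, yet the whole construction is a permutation:

```python
import hashlib

def round_fn(k: bytes, r: int, half: bytes) -> bytes:
    """Round function: need not be invertible; here a truncated hash."""
    return hashlib.sha256(k + bytes([r]) + half).digest()[:len(half)]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def feistel_encrypt(k: bytes, block: bytes, rounds: int = 4) -> bytes:
    left, right = block[:8], block[8:]            # 16-byte blocks
    for r in range(rounds):
        left, right = right, xor(left, round_fn(k, r, right))
    return left + right

def feistel_decrypt(k: bytes, block: bytes, rounds: int = 4) -> bytes:
    left, right = block[:8], block[8:]
    for r in reversed(range(rounds)):             # undo rounds in reverse
        left, right = xor(right, round_fn(k, r, left)), left
    return left + right

b, k = b"exactly 16 bytes", b"secret key"
assert feistel_decrypt(k, feistel_encrypt(k, b)) == b
```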
3.4.2 DES

The Data Encryption Standard (DES) was released in 1977 and was the first block cipher to be standardized by NIST. It has a block size of 64 bits and a key size of 56 bits. The short key size was criticized from the beginning and allows for exhaustive search attacks; the short block size makes even better attacks possible. Today, DES is easily broken and should never be used. The algorithm lives on in the form of 3DES, which is simply DES applied three times with different keys. The usual way to do this is not three consecutive DES encryptions, but rather encryption, decryption, and encryption, using the three keys respectively. The reason for the middle decryption is that setting all three keys equal reduces the steps to ordinary DES, only slower, which preserves compatibility. 3DES solves the problem of the short key and is considered a secure construction, although a very slow one. It should therefore only be used for backward compatibility, and all new systems should preferably use AES. The inner workings of block ciphers are complicated and would not add much value to this thesis, so they are not described here; for more information about how block ciphers are built, see Menezes et al. [5].

3.4.3 AES

The Advanced Encryption Standard was selected as the new standard in 2002, after a competition that was won by the cipher Rijndael. It has a block size of 128 bits, and key sizes of 128, 192, or 256 bits. Some attacks exist that reduce the number of bits of security, the best one against AES-256 giving only 99.5 bits of security in some special cases, as described in Biryukov and Khovratovich [9]. This is, however, still far too much for exhaustive search, and AES is today considered a secure block cipher.

3.4.4 Modes of Operation

Block ciphers only act on messages of exactly one block; if a message is shorter, padding is applied. If a message is longer than one block, we have to specify how to utilize the block cipher to encrypt it. This is called the mode of operation, and there are several, with different advantages and disadvantages. Note that the purpose of these is still only to provide confidentiality. Some combinations of mode and block cipher may achieve other goals as well, e.g. integrity, but this should in general not be relied upon. If more than confidentiality is intended, additional constructions, soon to be explained, should be used to provide it.

ECB

In Electronic Codebook (ECB) mode, each block of plaintext is encrypted individually into ciphertext, as shown in figure 3.3. One way to look at this is that a simple table, a codebook, could be used to look up what ciphertext each plaintext block is transformed into.

[Figure 3.3: ECB mode of operation: each block m_i is encrypted independently as c_i = E(k, m_i).]

When using ECB mode, the same plaintext block will always generate the same ciphertext block, so it is easy to see that ECB is not semantically secure (it is not randomized), and therefore ECB mode should never be used.
CBC

There are two weaknesses of ECB mode that the Cipher Block Chaining (CBC) mode tries to fix. First of all, ECB is deterministic. To fix this, we use something called an Initialization Vector (IV), which is simply a value chosen uniformly at random from some space, in this case one block. (The word "nonce", number used once, is sometimes used interchangeably with IV. However, a nonce in general need not be random, only unique; a counter suffices for a nonce but not for an IV.) Moreover, in ECB a repeated plaintext block leads to a repeated ciphertext block, also eliminating any chance of semantic security. In CBC mode, each ciphertext block depends not just on the key and the plaintext, but also on the previous block of ciphertext; for the first block, the IV is used in place of the previous block. This means that even if the same plaintext block is repeated, the ciphertext blocks will differ. The IV is then sent in the clear along with the ciphertext. Figure 3.4 describes the CBC mode of operation. Given that the IV is chosen truly at random, and that the block cipher used is a secure PRP, CBC mode is semantically secure.

[Figure 3.4: CBC mode of operation: c_0 = E(k, m_0 ⊕ IV) and c_i = E(k, m_i ⊕ c_{i-1}).]

CTR

In Counter (CTR) mode, the block cipher is actually not applied to the plaintext at all. Instead, it is used to create a keystream that is then XORed with the plaintext, much like a stream cipher. We know from before that for a stream cipher to be secure, the same keystream must never be used more than once. Here this is achieved by using the block cipher to encrypt an IV, chosen at random for each message, concatenated with a counter that is increased for each block. Done properly, CTR mode can also be proven semantically secure, again assuming the block cipher used is a secure PRP. The operation of counter mode is shown in figure 3.5. Note that in counter mode the blocks of the keystream can be computed in parallel, and even before the message is known. This makes counter mode much more efficient than CBC mode.

[Figure 3.5: CTR mode of operation: c_i = m_i ⊕ E(k, IV + i).]

Examples of additional modes are cipher feedback mode and output feedback mode. More information about these can be found in Menezes et al. [5].
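A generic Python sketch of CTR mode over an abstract block cipher; the `block_encrypt` stub below is my placeholder for any 16-byte PRP such as AES, not a real cipher:

```python
import os, hashlib

BLOCK = 16

def block_encrypt(k: bytes, block: bytes) -> bytes:
    """Stand-in for a real 16-byte block cipher such as AES (toy only)."""
    return hashlib.sha256(k + block).digest()[:BLOCK]

def ctr_mode(k: bytes, m: bytes, iv: bytes) -> bytes:
    """c_i = m_i XOR E(k, IV + i). Decryption is the same operation."""
    out = bytearray()
    iv_int = int.from_bytes(iv, "big")
    for i in range(0, len(m), BLOCK):
        counter = ((iv_int + i // BLOCK) % (1 << 128)).to_bytes(BLOCK, "big")
        keystream = block_encrypt(k, counter)
        out += bytes(a ^ b for a, b in zip(m[i:i + BLOCK], keystream))
    return bytes(out)

k, iv = b"some secret key.", os.urandom(BLOCK)   # fresh random IV per message
c = ctr_mode(k, b"a message spanning several blocks", iv)
assert ctr_mode(k, c, iv) == b"a message spanning several blocks"
```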
3.5 Hash Functions

A hash function is a deterministic function that takes an arbitrary-length input and produces a fixed-length output. Hash functions are not ciphers at all, since they use no key, but they are covered here because they are very useful in many cryptographic constructions. Note that hash functions are also used outside the area of cryptography, usually with much less strict requirements. In this thesis, only cryptographic hash functions are discussed.

3.8 Definition (Cryptographic hash function). A cryptographic hash function is a function F : {0, 1}* → {0, 1}^n, where n is the size of the output, that has the following properties.

Preimage Resistance: The function must be one-way, i.e. given h, it should be difficult to find any m such that F(m) = h.
Second Preimage Resistance: Given m_0, it should be difficult to find any m_1 ≠ m_0 such that F(m_0) = F(m_1).
Collision Resistance: It should be difficult to find any pair m_0, m_1 (m_0 ≠ m_1) such that F(m_0) = F(m_1).

Just as with secure PRGs and secure PRPs, no one knows whether it is possible to construct cryptographic hash functions. Examples of famous current constructions are MD5, SHA-1, and SHA-2, which all use the Merkle-Damgård construction [10]. Only the last of these is considered secure today. However, there is an ongoing hash function competition, issued by NIST and ending in 2012. The winning hash function will be called SHA-3 and will be the new standard, intended to remain secure for a long time.

The hardest property to fulfill for cryptographic hash functions is collision resistance. When it is possible to violate this property, we say that a collision has been found. Finding one by brute force is, however, not as hard as it might first seem.

3.5.1 The Birthday Problem

Assume that we have a standard school class of 30 students. What is the probability that two of them share the same birthday? Intuition tells us that this probability should be fairly low, since there are a lot more days than students, but the mathematics tells us otherwise. The probability of a collision is the complement of the probability of no collision. For no collision to occur, the first student can be born on any day, the second on any of the remaining days, and so on for all students. So

  Pr[two share a birthday] = 1 − ∏_{k=0}^{29} (365 − k)/365 ≈ 0.706.

That is, in a class of 30 students, the probability that two share the same birthday is over 70%, and the probability is over 50% already for 23 students. These numbers hold if birthdays are uniformly distributed over the year; in reality they are not, which makes the probability of a collision even higher.

In general, it can be proven that the number of calculations that need to be performed in order to find a collision among N elements is approximately √(2 ln 2) · √N. This result is important for hash functions and, as we will soon see, for other cryptographic constructions as well, since it tells us that the maximum number of bits of security that can be achieved when collisions must be avoided is half the size of the output. Using this fact to attack a system is called a birthday attack.
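The numbers are easy to verify in Python (a quick check of the formulas above, nothing thesis-specific):

```python
from math import prod, sqrt, log

def p_collision(students: int, days: int = 365) -> float:
    """1 minus the probability that all birthdays are distinct."""
    return 1 - prod((days - k) / days for k in range(students))

print(round(p_collision(30), 3))    # 0.706
print(round(p_collision(23), 3))    # 0.507, first class size above 50%

# Expected work to find a collision among N elements: sqrt(2 ln 2) * sqrt(N).
N = 2 ** 128                        # e.g. a 128-bit hash output
print(sqrt(2 * log(2)) * sqrt(N))   # about 2^64 evaluations
```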
3.6 Message Authentication Codes

We have so far only discussed how to achieve confidentiality in a system, which is enough to make a system secure when the adversary only has eavesdropping capabilities. In the real world, however, the adversary can be active and modify the contents of a message during transmission. We will now see how to provide the other goals, namely integrity, authentication, and in some sense non-repudiation.

Many network protocols utilize some kind of checksum to detect errors during transmission. These are, however, intended to detect random errors, and a malicious adversary can easily make undetectable modifications. A Message Authentication Code (MAC) is, just like a hash function output, a short digest of a message, but in addition to the message it also takes a key as input. (MACs are indeed sometimes called "keyed hash functions".) Usually, the output is called a tag. The tag is calculated and sent together with the message; upon reception, the MAC is verified, usually by simply calculating the tag again and comparing the two. The goal is that only those with access to the correct key will be able to create tags that verify. We define a MAC as follows.

3.9 Definition (MAC). A Message Authentication Code defined over (K, M, T) is a pair of algorithms (S, V), where S : K × M → T and V : K × M × T → {"yes", "no"}, and where ∀m ∈ M, ∀k ∈ K : V(k, m, S(k, m)) = "yes".

The definition of a secure MAC is similar to the definition of semantic security for a cipher.

3.10 Definition (Secure MAC). The security of a MAC under a chosen plaintext attack is defined through a game between a challenger and an adversary, through these steps:

1. The challenger chooses a random key from the key space.
2. The adversary may submit any number of plaintext messages to the challenger, and for each the challenger sends back a tag for the message under the chosen key.
3. The adversary sends a message-tag pair (m, t) not equal to any of the previously created pairs.
4. The challenger runs the verification algorithm on (m, t).

The MAC used is said to be secure, or existentially unforgeable, if for all "efficient" adversaries the output of the verification is "yes" with only negligible probability.

By using a secure MAC, an attacker will not be able to modify anything in the message without being detected by the verification algorithm, i.e. MACs provide message integrity. This also means that if a message is received along with a MAC that verifies, the receiver can be certain of who sent the message, assuming that only one other person has access to the secret key, so we also achieve authentication. Between the two parties, we can also argue that non-repudiation is achieved, since no one else could have created the tag. However, since the key is not completely personal, a third party can never be convinced of who actually sent the message. Note that any MAC is vulnerable to the birthday attack, since the attacker may try to find any two messages that map to the same tag. This means that the size of the tag always needs to be at least twice the number of intended bits of security.

We have already presented all the tools necessary to construct MACs that are believed to be secure. Two such constructions are CBC-MAC and HMAC.

3.6.1 CBC-MAC

In CBC-MAC, the idea is the same as in the CBC mode of operation for block ciphers: we know that the last block of a message encrypted in CBC mode depends on the contents of all previous blocks. However, some changes need to be made for the MAC scheme to be secure. First of all, the IV should be fixed instead of random; remember that the randomization was necessary for semantic security, which is not what we are trying to achieve here. Moreover, if messages can have different lengths, then measures must be taken to defend against so-called extension attacks. These are attacks where a valid message-tag pair (m, t) is known, and the attacker tries to create a new valid pair (m', t'), where m' is simply m concatenated with some additional data. Exactly how such attacks are carried out, and how to protect against them, can be found in Black and Rogaway [11].

3.6.2 HMAC

Remember that a hash function can be used to reduce a large message to something small, which we now know a MAC normally does as well. A hash function, however, does not depend on a secret key, which a MAC must. HMAC (Hash-based MAC) utilizes a hash function on a combination of the message and the secret key in order to construct a MAC. This combination has to be done with care for the MAC to be secure, and the full definition of HMAC specifies exactly how it should be performed. By following this specification, the security of the MAC depends on the strength of the hash function used; if, for example, SHA-512 is used, HMAC is considered secure. HMAC may actually remain secure even if the hash function used is not fully cryptographic; in particular, the collision resistance property may not be required. A detailed description of HMAC and the necessary security demands on the hash function used can be found in the original paper by Bellare, Canetti, and Krawczyk [12].
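Python's standard library ships an HMAC implementation, which gives a feel for the (S, V) interface of Definition 3.9 (this snippet is mine, not from the thesis):

```python
import hmac, hashlib

key = b"an independent secret key"
msg = b"the message to authenticate"

tag = hmac.new(key, msg, hashlib.sha512).digest()     # S(k, m)

def verify(k: bytes, m: bytes, t: bytes) -> bool:
    """V(k, m, t): recompute the tag and compare in constant time."""
    return hmac.compare_digest(hmac.new(k, m, hashlib.sha512).digest(), t)

assert verify(key, msg, tag)
assert not verify(key, msg + b" tampered", tag)
```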
By following this specification, the security of the MAC depends on the strength of the hash function used, and if, for example, SHA-512 is being used, HMAC is considered to be secure. HMAC may actually be secure even if the hash function used is not fully cryptographic; in particular, the collision resistance property may not be required. A detailed description of HMAC and the necessary security demands on the hash function used can be found in the original paper by Bellare, Canetti, and Krawczyk [12].

3.6.3 Authenticated Encryption

We now know how to achieve confidentiality and integrity individually, but we have yet to discuss how to combine these constructions in order to achieve both simultaneously. A few approaches may work with some combinations of encryption and MAC scheme, but one approach is recommended and guaranteed to be secure for all combinations of secure ciphers and MACs, and that is to first encrypt the message under one key, and then MAC the ciphertext under a different, independent, key. A more detailed analysis of authenticated encryption is given in Bellare and Namprempre [13].

It is quite easy to see why the recommended approach is secure. Since the cipher is secure, the ciphertext reveals nothing about the message, and so the tag cannot either, since the keys are independent. Also, since the MAC is verified first, a message with broken integrity is never decrypted, thus saving time. Whenever both confidentiality and integrity are intended, this is the solution that should be used, as illustrated in figure 3.6.

Figure 3.6: The correct order of operations when both confidentiality and integrity are intended.

4 Asymmetric Cryptography

In this chapter, asymmetric cryptography is defined and compared to symmetric cryptography. Since the purpose of this thesis was to implement and evaluate an asymmetric construction, this chapter is more thorough, and includes more detailed descriptions of the constructions, than previous chapters.

4.1 The Key Distribution Problem

We have up until now seen some very neat ways to achieve confidentiality and integrity by using a shared secret key. In fact, the ideas already presented are the most efficient and secure ones for providing data confidentiality and integrity, and are in wide use today. However, one big problem with these constructions is the assumption that there already exists a shared secret key. In almost all realistic settings, e.g. when communicating over the Internet, the two communicating parties do not have this shared secret to begin with, and they may never even have communicated before. Since the keys need to be completely private, each communicating pair needs a different key, and so for n entities we need Θ(n^2) (see Appendix A.1 on page 81 for notation) keys distributed in advance over some secure channel. In the Internet case, there is no such secure channel, and we need something else in order to solve this problem.

4.2 Public and Private Keys

The main difference between symmetric and asymmetric cryptographic systems is that the latter uses different keys for encryption and decryption. The two keys must clearly have an intimate relation for this to work, but the schemes are constructed such that given the encryption key, it is "hard" (see Appendix A.1 on page 81 for a definition) to figure out what the decryption key is.
This makes it possible to let the encryption key be completely public and known to everyone, while the private key is kept secret by the owner. For this reason, asymmetric cryptography is also known as public key cryptography. We can already see that this solves the key distribution problem mentioned above, since we no longer need a new key for each communicating pair, and since the public key can be distributed over an insecure channel (partly true, see the section about public key infrastructure). Anyone who wants to send messages to Bob uses the same key, namely Bob's public key. Before showing how we can build public key cryptosystems, we will first look at another solution to the key distribution problem, which was also the starting point for public key cryptography.

4.3 Key Exchange

A key exchange system is a way for two parties to generate a shared secret key over an insecure channel. The secret key can then be used with a symmetric cryptosystem in order to secure the communication, as described in the previous chapter. The fact that it is possible to utilize an insecure channel for this purpose is quite remarkable. An analogy: you enter a room full of people that you have never met before, and start shouting to someone on the other side of the room, such that everyone can hear what you are saying. After shouting to each other for a while, the two of you can keep shouting and understand each other completely, while no one else in the room understands a thing of what you are saying. Intuition tells us this is impossible, and yet we can construct such schemes. See Appendix A on page 81 for the required mathematical background before continuing.

4.3.1 Diffie-Hellman-Merkle Key Exchange

In 1976, Whitfield Diffie and Martin Hellman published a paper, New Directions in Cryptography [15], that changed cryptography forever. Among many other things, they improved an idea proposed by Ralph Merkle a few years earlier, and constructed a scheme for key negotiation, as follows. (The scheme is usually referred to as just "Diffie-Hellman Key Exchange (DH)", since those are the authors of the paper; however, in 2002, Hellman proposed that Merkle should also be included if the scheme is given a name [14]. It is usually called "exchange", even though that name implies that the secret is known to one of the parties before the protocol is run, which is not the case. Negotiation is a better name, since the secret key is generated and negotiated while running the protocol.)

4.1 Definition (Diffie-Hellman-Merkle Key Exchange). Assume that Alice and Bob want to generate a shared secret key. First, they need to agree upon a cyclic group (see Appendix A.4 on page 86) (G, · ) of order q and a generator element, g. This data is not secret and can be sent completely in the clear. Then, the following steps are performed.
1. Both sides choose a random number each, xA, xB ∈ Zq.
2. The two sides calculate g^xA ∈ G and g^xB ∈ G respectively, and transmit the result to the other side.
3. Now, Alice can calculate (g^xB)^xA ∈ G and Bob can calculate (g^xA)^xB ∈ G.
4. By the rules of exponentiation, they both end up with the same secret value (g^xB)^xA = g^(xB xA) = g^(xA xB) = (g^xA)^xB, all ∈ G.

See figure 4.1 for a visualization of the scheme. An adversary listening to this communication will learn which cyclic group and generator are being used, and will also see the values g^xA and g^xB.
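In code, the whole protocol is only a few lines. The following Python sketch runs the steps above over the multiplicative group Zp* with toy parameters of our own choosing; real deployments use groups of thousands of bits (or elliptic curves, introduced later).

```python
import secrets

# Toy public parameters: a small prime p and base g (illustrative only;
# real groups are 2048+ bits).
p = 2579
g = 2
q = p - 1  # order of the group Zp*

# Each side picks a secret exponent...
x_a = secrets.randbelow(q - 1) + 1
x_b = secrets.randbelow(q - 1) + 1

# ...and transmits only g^x mod p over the insecure channel.
A = pow(g, x_a, p)   # what Eve sees from Alice
B = pow(g, x_b, p)   # what Eve sees from Bob

# Both sides arrive at the same shared value g^(x_a * x_b) mod p.
shared_alice = pow(B, x_a, p)
shared_bob = pow(A, x_b, p)
assert shared_alice == shared_bob
```

Figure 4.1: Diffie-Hellman-Merkle key exchange.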
The problem of finding g^(xA xB) from these values is called the Diffie-Hellman problem, and the assumption that it is hard is called the Computational Diffie-Hellman (CDH) assumption. It is easy to see that solving the Diffie-Hellman problem is at least as easy as solving the discrete logarithm problem (DLP) in the group (see Appendix A.4 on page 86), because one solution is to find xA from g^xA and then calculate (g^xB)^xA from this, i.e. just like Alice does. However, the DLP is considered "hard", meaning that no "efficient" adversary can solve it. It is not known whether the Diffie-Hellman problem is equivalent to that of finding discrete logarithms, or if there might be another way to find g^(xA xB) from g, g^xA, and g^xB. Nevertheless, no attack better than solving the DLP exists today.

It is important to realize that this scheme, as described here, can only be secure against an eavesdropping adversary. This is because Alice has no way of knowing that she is actually talking to Bob, and vice versa. This enables a man-in-the-middle attack, where an adversary, Eve, can intercept and replace all communication. When talking to Bob, Eve will pretend to be Alice, and when talking to Alice, she will pretend to be Bob. Alice and Bob have no way of detecting this and will continue with their communication as normal, believing it is secure. In order to protect against this, additional measures have to be taken. In particular, Alice and Bob need to know something about each other in advance in order to verify the authenticity of the messages and prevent this attack. The problem of verifying the identity of the other side will be discussed in the section about public key infrastructure.

4.4 Trapdoor Permutations

We have already stated that asymmetric cryptography uses systems where the keys for encryption and decryption are different, but we have yet to describe how this will work. Just like the pseudorandom permutation is the ideal block cipher, we will now look at something called a trapdoor permutation, which will be the ideal basis for an asymmetric cryptosystem.

4.2 Definition (Trapdoor Permutation). A trapdoor permutation is a set of three "efficient" algorithms, (G, F, F^-1), referred to as key-pair generation, encryption, and decryption.
G - Outputs a public-private key pair, called (kpub, kpriv).
F - F(kpub, x) evaluates the trapdoor permutation at point x.
F^-1 - F^-1(kpriv, y) inverts F, such that F^-1(kpriv, F(kpub, x)) = x.
We say that the trapdoor permutation is secure if it is "hard" to invert F without knowledge of kpriv.

In words, a trapdoor permutation is a function that has an "efficient" algorithm for calculating it in the forward direction, but that is "hard" to invert unless you have access to some extra information, the trapdoor. A common metaphor is that of a padlock, which anyone can lock. Opening it again is, however, hard unless you have access to the key or combination. Just like the case with pseudorandom permutations, no one knows if it is possible to construct trapdoor permutations. We do, however, have a few promising suggestions that will soon be discussed.
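Definition 4.2 translates naturally into a programming interface. The following Python sketch only fixes the types of the three algorithms; the names are our own, and any concrete scheme (such as RSA, defined below) would supply the implementations.

```python
from typing import Protocol, Tuple

class TrapdoorPermutation(Protocol):
    """The triple (G, F, F^-1) from Definition 4.2."""

    def G(self) -> Tuple[bytes, bytes]:
        """Generate and return a key pair (k_pub, k_priv)."""
        ...

    def F(self, k_pub: bytes, x: int) -> int:
        """Evaluate the permutation at x; efficient for everyone."""
        ...

    def F_inv(self, k_priv: bytes, y: int) -> int:
        """Invert F; efficient only with the trapdoor k_priv."""
        ...

# For any x and any key pair (k_pub, k_priv) = scheme.G(), a correct
# implementation must satisfy:
#     scheme.F_inv(k_priv, scheme.F(k_pub, x)) == x
```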
4.5 Semantic Security

In order for our definition of semantic security, defined in the previous chapter, to be applicable to the asymmetric world, we need to make some slight adjustments to the first two steps. Instead of having the challenger choose one random key, it is instead going to run the key-pair generation algorithm, G, to acquire a public-private key pair, and then send the public key to the adversary. After this, the adversary can generate any number of ciphertexts himself, instead of sending them to the challenger for encryption. With these adjustments, the remaining steps will be the same, i.e. the adversary will submit two messages and get back the encryption of one of them, and the goal is to determine which one was sent back. We say that a scheme is semantically secure if no "efficient" adversary can be significantly better at this game than the guessing adversary would be.

4.6 ElGamal

In 1984, the Egyptian cryptographer Taher Elgamal described a way to leverage the discrete logarithm problem in cyclic groups (see Appendix A.4 on page 86) for creating a trapdoor and thus providing confidentiality [16]. Before Alice can send any messages to Bob, he needs to generate a public and a private key. This is done by choosing a cyclic group G of order q, a generator element g, and a random integer x ∈ Zq. The public key is then (G, q, g, h = g^x) and the private key is x. The hardness of the discrete logarithm problem makes it impossible for Eve to figure out x, assuming the group is large enough. ElGamal is now defined, using multiplicative group notation.

4.3 Definition (ElGamal Encryption). In order for Alice to send an encrypted message m to Bob, with public key (G, q, g, h = g^x) and private key x, the following steps are taken.

ElGamal Encryption
• Alice chooses a random y in [1, q − 1], and calculates c1 = g^y.
• Alice calculates s = h^y = (g^x)^y.
• Alice calculates c2 = m′ · s, where m′ is the message to send, represented as an element in the chosen group.
• Alice sends (c1, c2) = (g^y, m′ · h^y) = (g^y, m′ · (g^x)^y) to Bob.

ElGamal Decryption
• Bob calculates (c1)^x = (g^y)^x = s.
• Bob calculates c2 · s^-1 = m′ · (g^x)^y · ((g^x)^y)^-1 = m′ ⇒ m.

We can see that the above system is in fact a cipher, since the original message is retrieved upon decryption. The described version of ElGamal has some problems that must be fixed by applying a padding scheme to the message prior to encryption. If done properly, e.g. as described by Cramer and Shoup in [17], the security of the scheme depends on the hardness of the discrete logarithm problem in the chosen group.

Note that the discrete logarithm problem in a cyclic group is not a trapdoor permutation in itself, since no secret information can facilitate the calculation of the exponent. The ElGamal system rather leverages the discrete logarithm problem to create a trapdoor, in the described way.

4.7 RSA

RSA is probably the most well-known cryptographic scheme in existence today. It was first described in 1978 by Ron Rivest, Adi Shamir, and Leonard Adleman [18]. The RSA system uses the integer factorization problem as the trapdoor basis, i.e. it uses the fact that multiplication of numbers can be done "efficiently", whereas factorization of the product is considered "hard". To build a trapdoor out of this, we use some of the mathematics explained in Appendix A on page 81.

4.4 Definition (RSA).
RSA Key Pair Generation
The following steps are taken to generate the RSA private and public keys.
• First, choose two large primes p and q. (They have to be chosen with some care, in order not to enable certain specific attacks. See Boneh [19] for more information.)
• Calculate n = pq and φ = (p − 1)(q − 1). As stated in Appendix A.4 on page 86, we know that φ is the order of the group (Zn*, · ).
• Choose an integer 1 < e < φ such that gcd(e, φ) = 1 (i.e. e ∈ Zφ*).
• Calculate d = e^-1 mod φ. This means that ed = kφ + 1 for some integer k, a fact that is used in decryption.
• The private key is (n, d) and the public key is (n, e).
Notice that since factorization is considered "hard", an adversary cannot find φ and calculate d from e.

RSA Encryption
In order to encrypt a message M using RSA, the following steps are taken.
• The message M is converted to an integer, 0 < m < n.
• The ciphertext is calculated as c = m^e mod n.

RSA Decryption
Upon reception of a ciphertext c, the following calculation is done.
• c^d ≡ (m^e)^d ≡ m^(ed) ≡ m^(kφ+1) ≡ m^(kφ) · m ≡ m (mod n). In the last step, Euler's theorem and the Chinese Remainder Theorem are used. (Note that m need not be a member of Zn*; however, the Chinese Remainder Theorem guarantees correctness also in this case, see Appendix A.3.1 on page 83.)

For decryption, the Chinese Remainder Theorem may be used to speed up the computations, as presented in Appendix A.3.1 on page 83. If this is the case, the private key needs to be extended to keep the information of which primes were used to generate the modulus.

4.7.1 RSA Encryption Standards

The above scheme for encryption is sometimes referred to as "textbook RSA", since it is the way it is usually first described in the literature. However, this pure approach has many problems. One of them, which eliminates all chances of being semantically secure, is that it is deterministic: every time the same message is encrypted, the ciphertext will also be the same. This problem, along with others, can be solved by making a few additions to the RSA operations. The usage of RSA has been standardized by RSA Security (a company founded by the inventors of RSA) in PKCS#1 [20], and the current recommendation is to use something called Optimal Asymmetric Encryption Padding (OAEP). This basically describes how the randomness is introduced and how the messages should be padded in order to make the encryption scheme secure.

Another problem is that only short messages can be encrypted, since they have to be converted to an integer < n. One solution would of course be to utilize RSA as a block cipher and use some mode of operation. However, the simplest and best solution is to use something called a hybrid system.

4.8 Hybrid Systems

We now know that we can use public key cryptographic schemes to encrypt messages and achieve confidentiality. However, we also know that the underlying operations are much more computationally demanding, i.e. slower, than the symmetric systems we have seen before. What is usually done to mitigate this is that a symmetric key is chosen at random and used to encrypt the long message, with for example AES or a secure stream cipher, and then this short key is encrypted using public key cryptography, e.g. RSA, and sent along with the encrypted message. This means that we can use the best of both worlds: efficiency from the symmetric world and key distribution from the asymmetric world. These solutions are in general referred to as hybrid systems.
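As an illustration of the hybrid idea, the following sketch uses the pyca/cryptography Python library (an assumed dependency) with RSA-OAEP for the key transport and AES-GCM for the bulk encryption; any secure cipher of each class would do.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Bob's long-term RSA key pair.
bob_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# --- Sender: encrypt a long message under a fresh symmetric key. ---
message = b"A message far too long to fit in a single RSA block..." * 100
sym_key = AESGCM.generate_key(bit_length=256)
nonce = os.urandom(12)
c_sym = AESGCM(sym_key).encrypt(nonce, message, None)      # bulk data
c_key = bob_key.public_key().encrypt(sym_key, oaep)        # key transport

# --- Receiver: recover the symmetric key, then the message. ---
k = bob_key.decrypt(c_key, oaep)
assert AESGCM(k).decrypt(nonce, c_sym, None) == message
```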
The concept is visualized in figure 4.2 on the following page; however, notice that the example ciphers can be replaced by any secure cipher of the same class. The only missing piece in this description is the distribution and validation of the public key, which will be discussed in the section on public key infrastructure.

Figure 4.2: Hybrid encryption for the special case of RSA as the asymmetric construction and AES as the symmetric.

4.9 Security of Public Key Algorithms

Remember that a large key space is a necessary but not sufficient requirement in order for a cryptographic scheme to be secure. The existence of special attacks, faster than brute force, means that the key sizes must be larger than the corresponding bits of security. For asymmetric systems, many such attacks exist, and depending on how serious they are, different systems require different key lengths in order to achieve the same level of security. This is clearly different from the symmetric world, where brute force is in general the best possible attack, with some exceptions. This section describes the best attacks against the RSA construction, and against constructions based on the discrete logarithm problem.

4.9.1 RSA Security

By looking at the definition of RSA, one may realize that its security is based on the hardness of the following three underlying mathematical problems.

(1) Integer factorization. If it were possible to find the prime factors of integers, the modulus n = pq could be factored and p and q found. The order, φ(n) = (p − 1)(q − 1), could then be computed, and knowing it enables efficient calculation of the secret key d from the public key e by using, for instance, the Extended Euclidean algorithm, as mentioned in Appendix A.3.3 on page 85.

(2) Finding e'th roots modulo composites. We know that a message encrypted by textbook RSA is c = m^e mod n, and that e is publicly known to everyone. This means that if we can find e'th roots mod n, then we would be able to extract the message from any ciphertext.

(3) Solving the discrete logarithm problem modulo n. If the attacker chooses a message to encrypt, then he will know m and the corresponding c. The relation between these is such that m = c^d mod n, so if the attacker could solve the discrete logarithm problem for composite n, then he would instantly get the secret key d.

These three problems are, however, not unrelated. First of all, Boneh [19] gives a proof that solving (3) also solves (1), i.e. there exists a polynomial-time reduction from the composite DLP to factoring the RSA modulus. More specifically, he proves that given the private key, d, the modulus can be efficiently factored. Also, it is easy to realize that given a solution to (1), it is easy to break (2), since we then have access to the trapdoor. This means that the hardness of (2) is the strongest (and therefore also the "only") requirement, and this is usually referred to as the RSA Problem. The assumption that this is hard is called the RSA Assumption. Many textbooks state that RSA is secure as long as integer factorization is hard, but it is still unknown if this statement is really true, i.e. if the RSA problem is equivalent to integer factorization, or if it is possible to attack (2) in some other way.
However, the currently fastest attack on the RSA system is indeed by factoring the modulus, assuming the parameters are chosen wisely and the implementation is done well. In the following sections, the factorization attack, as well as some special attacks due to sub-optimal parameter selection or implementation, are described.

Attack by Factorization

The difficulty of factoring a composite number into prime factors depends on both the size of the number and on its structure. The RSA system uses composite numbers referred to as semiprimes, which are simply the product of two primes. These are the hardest composite numbers to factor. If the composite is large enough, which the RSA modulus is chosen to be, the best known algorithm for factorization is called the General Number Field Sieve (GNFS). The running time of this algorithm is sub-exponential (see Appendix A.1 on page 81), and more specifically grows according to the following expression [21]:

exp( ((64/9)^(1/3) + o(1)) · (log n)^(1/3) · (log log n)^(2/3) )

We will soon see how this attack impacts the key length necessary to make RSA secure, but first we will briefly look at what happens when not enough care is taken in the choice of parameters or in the implementation.

Special Attacks

As already stated, the modulus can be factored given the private key. This means that the primes p and q have to be randomly chosen for each key-pair generation, i.e. the modulus may never be reused. Moreover, p and q have to be chosen such that p − 1 and q − 1 have no small prime factors, because otherwise factoring can be done efficiently [19].

Since one of the two exponents (private and public) can be chosen and the other follows, it is possible to choose a small public exponent, e, in order to speed up encryption. The smallest possible value for e is 3, but choosing this opens up for some special attacks [19]. In order to avoid these, the recommended value for e is 2^16 + 1 = 65537.

In some cases, a small private exponent is more desirable than a small public exponent. However, letting d be small opens up for serious attacks that totally break RSA. It is recommended that d > √n [19], and by following the previous recommendation (and implementation standard) that e = 65537, d is well above this limit for large enough n.

Implementation Attacks

Even if the parameters are chosen such that no special attack can be carried out, the resulting construction is still not necessarily as secure as factorization. The actual implementation may leak information about the ongoing process that can be used to break the encryption. When the device performing the cryptographic operations is in the hands of an observer, measures have to be taken in order to disallow tampering or external memory access, a protection called tamper resistance. However, some attacks, generally referred to as side-channel attacks, do not even require physical access. The most famous example is the timing attack, mentioned in the introduction chapter. By measuring the exact time of the cryptographic operations for different inputs, information about the secret key can be deduced. This attack is not only possible against RSA but concerns all cryptographic implementations, including symmetric ones. However, some systems may be more susceptible than others [22].

Required RSA Key Length

For most symmetric constructions providing confidentiality, such as AES, the bits of security are usually very close to the actual size of the key.
This is not the case for RSA, nor for many other public key algorithms. The exhaustive search attack is an exponential-time algorithm, since each bit added to the key size doubles the number of keys that have to be tried. For RSA, the best attack is, as mentioned, sub-exponential, and because of this the key size needs to be longer than the intended bits of security. Moreover, the difference between the key length and the intended bits of security increases with increased security. Table 4.1 shows the relationship between the bits of security and the required RSA key size.

Bits of security    RSA key size
      80                1024
     112                2048
     128                3072
     192                7680
     256               15360

Table 4.1: Required RSA key size.

The largest RSA key that has been broken is 768 bits [21], but 1024-bit keys may also be broken within a few years and should not be used in any new systems. The NIST recommendation from 2011 until 2030 is to provide at least 112 bits of security, corresponding to 2048-bit RSA [3]. The long keys affect both the required amount of memory and the efficiency. Eventually, RSA will be impractical to use, because the key size necessary to avoid attacks will make the implementation too slow. This will be especially noticeable in environments with limited storage and computational capabilities, like embedded systems such as smart cards. This is the main reason for the replacement of RSA by newer cryptographic constructions, in particular those based on elliptic curves, described later.

4.9.2 Solving the Discrete Logarithm Problem

The public key cryptographic constructions in use that are not based on the RSA assumption are instead based on the discrete logarithm problem in some group. Different systems use different groups with different group operations, and some of these may be susceptible to special attacks. Before discussing these, the existing attacks on the generic discrete logarithm problem are presented. The naive approach is to just perform the group operation over and over again until the value is found. This is clearly linear in the size of the group, and thus exponential in the number of bits. There are better methods, as described here, but note that none of them run in polynomial time, or even in sub-exponential time, in the general case.

The Generic DLP

The naive approach to solving the discrete logarithm problem, as described in Appendix A.4.2 on page 87, would be to simply repeat the group operation over and over again until the right value is found. This would on average require n/2, i.e. Θ(n), group operations, where n is the order of the base element, which equals the size of the group if a generator element is chosen. Since n grows exponentially as the number of bits increases, this is an exponential-time algorithm.

This method may, however, still be useful in combination with other approaches. First of all, if the factorization of the group order is known, the Chinese Remainder Theorem (Appendix A.3.1 on page 83) may be utilized, such that work can be performed on the prime factors separately and then combined. This means that if the order has only small prime factors, the DLP can be solved even if the full group is very large. This idea was first presented by Pohlig and Hellman in [23]. In order to avoid this attack, only groups whose orders have only large prime factors, or even prime order, are used.
(Prime order is impossible for (Zp*, · ) if p is a prime, since the group order is p − 1, which is clearly not prime. However, p − 1 may still have only two prime factors: 2 and some large prime.)

Famous methods, better than the naive approach, for solving the DLP in a generic group are the Baby-Step/Giant-Step method described by Shanks in [24], Pollard's Rho method [25], and the Lambda method [25]. These all have the asymptotic running time Θ(√n) = Θ(2^(|n|/2)), where |n| denotes the number of bits of n. Shanks' method is rarely used in practice, since it requires a large amount of memory; the Pollard Rho method runs as fast with negligible storage. The Lambda method is only useful when the result is known to lie within some small range. A more detailed overview of these methods is presented by Studholme in [26]. All of these methods are clearly faster than the naive approach, but they still require an exponential number of group operations. In [27], Shoup proves that Θ(√n) is in fact the asymptotic lower bound on the running time in the generic case, where no special group structure can be leveraged. That is to say, no polynomial-time, or even sub-exponential-time, algorithm will ever be found for solving the generic discrete logarithm problem.

DLP Over (Zp*, · )

When applying the DLP for cryptographic usage, e.g. in ElGamal and Diffie-Hellman-Merkle, we need to use a specific group rather than a generic one. The most well-used such group is (Zp*, · ), as defined in Appendix A.4 on page 86. Remember that this can simply be represented by the integers smaller than p, and that the operations are normal modular arithmetic. Using this group, however, enables attacks that utilize the specific structure and run faster than the generic lower-bound running time. Examples of such algorithms are the Index Calculus method and the Number Field Sieve. The details of these methods are complex and are not described here, but can be found in [28]. The important result is that these algorithms run in sub-exponential time, and more specifically in the same amount of time as the best integer factorization algorithm, as described in section 4.9.1 on page 35. This also means that the necessary key sizes are similar to those of RSA, presented in table 4.1 on the previous page. Because of the existence of sub-exponential attacks, we would like to replace the use of (Zp*, · ) with some other group, where no special attacks exist and only a generic, exponential-time, attack may be applied.

4.9.3 Shor's Algorithm

Before moving on and describing a group in which no specific structure is known that enables sub-exponential-time attacks, we have to mention that there in fact exists a polynomial-time algorithm, called Shor's algorithm, that both factors integers and solves the generic discrete logarithm problem, albeit on a computational device that does not yet exist at scale, namely a quantum computer. The future existence of such a device, described e.g. by Gregg in [28], would break all cryptographic schemes based on these problems.

4.10 Elliptic Curve Cryptography

We have seen that constructions based on both the RSA problem and the discrete logarithm problem over (Zp*, · ) enable sub-exponential-time attacks, thus requiring long keys, which is a problem both for management and for performance reasons, especially in embedded systems.
We will now see how we can use something called elliptic curves to find other cyclic groups, where the body, i.e. the set of elements, consists of points on a curve, i.e. solutions to an equation, instead of integers. Elliptic curves have been known for a long time, but the idea to use them within cryptography was first stated independently in 1985 by Koblitz in [29] and by Miller in [30]. For these groups, no sub-exponential algorithms are known to date, and the generic methods are the fastest. Before describing how we can construct these cyclic groups, table 4.2 shows how this impacts the necessary key sizes. Remember that the key size for systems based on the group of modular multiplication is basically the same as for RSA.

Bits of security    RSA key size    ECC key size
      80                1024             160
     112                2048             224
     128                3072             256
     192                7680             384
     256               15360             512

Table 4.2: RSA and ECC required key sizes.

Note that the keys are not only much shorter than for RSA, but that they also grow linearly with increased security, as opposed to exponentially for RSA. In particular, the key sizes in the elliptic curve case need only be twice the number of bits of security.

4.10.1 Elliptic Curves

An elliptic curve, defined over the rational numbers, can be described as the set of points (x, y) that satisfy the simplified Weierstrass equation

y^2 = x^3 + ax + b, where a, b ∈ Q,

along with a special point, O, called the point at infinity. (The name is somewhat misleading, since the curves are not at all ellipses. It is derived from the fact that the same equation comes up when calculating the arc length of ellipses.) We only consider curves without cusps, self-intersections, or isolated points, which algebraically means that 4a^3 + 27b^2 ≠ 0. Figure 4.3 shows two examples of such curves. The point at infinity can be found at either "end" of the y-axis.

Figure 4.3: Two examples of points satisfying the above equations over the real numbers, for the curves E: y^2 = x^3 − x and E: y^2 = x^3 − x + 1.

We will now define a binary operation for points on the curve that will, together with the set of points, form a cyclic group. Both the algebraic definition of this operation and a graphical interpretation will be given. In the special case when a point is added to itself, we call the operation point doubling, and both the geometrical interpretation and the algebraic formula differ slightly. The point at infinity plays the role of the identity element. This means that the following equations hold for a point P on the curve:

P + O = O + P = P     (4.1)
P + (−P) = O          (4.2)

where negation is defined as −P = −(x, y) = (x, −y).

Point Addition

In order to add two points, P and Q, on a curve, a straight line between them is formed. This line will always either intersect the curve at a third point, or at the point at infinity if P and Q have the same x-coordinate. In the first case, the intersection point is the negative of the result, −R, and all we need to do to get the result, R, is to negate the y-coordinate. In the second case, the result is simply the point at infinity. The addition of two distinct points is illustrated in figure 4.4.

Algebraically, to add two points P = (x1, y1) and Q = (x2, y2) (P ≠ Q), the resulting point (x3, y3) is calculated by the following operations:

x3 = ((y2 − y1)/(x2 − x1))^2 − x1 − x2   and   y3 = ((y2 − y1)/(x2 − x1))(x1 − x3) − y1

Figure 4.4: Examples of point addition, R = P + Q.
Notice that the result in the right figure is the point at infinity.

Point Doubling

In the case where the two points to add are actually the same point, P, we instead use the tangent line to find the negative of the result point, −R. If the tangent line is vertical, the result is the point at infinity. Figure 4.5 illustrates this graphical interpretation.

Algebraically, to double a point P = (x1, y1), i.e. add it to itself, the resulting point (x3, y3) is calculated by the following operations:

x3 = ((3x1^2 + a)/(2y1))^2 − 2x1   and   y3 = ((3x1^2 + a)/(2y1))(x1 − x3) − y1

Multiplication by Integer

By repeating the above operations, it is possible to define multiplication between a point and an integer: nP is simply P added to itself n times. Note that this corresponds to exponentiation in the group we have previously been working with, where the body is all positive integers smaller than some prime and the group operation is modular multiplication. The only difference is that we now instead call the group operation "addition" and use additive notation instead of multiplicative. See the note in Appendix A.4.2 on page 87 about why the multiplication (there called exponentiation) can be carried out much more efficiently than by naive enumeration of all points. In the section on implementation, different algorithms for performing this efficiently and securely are discussed.

Figure 4.5: Examples of point doubling, R = 2P. Notice that the result in the right figure is the point at infinity.

4.10.2 Elliptic Curves Over Finite Fields

For cryptographic usage, we are more interested in curves that are defined only over the integers, since computers are inexact when working with non-integers. In particular, we will look at the case where x and y are elements in a finite field. There are two types of finite fields that have been standardized as the basis for elliptic curve groups, namely fields of prime order, GF(p), and fields of power-of-two order, GF(2^m), as described in Appendix A.4.3 on page 87. Both of these cases will generate cyclic groups that can be used for cryptographic purposes, but the underlying calculations will be very different.

Curves Over Prime Order Finite Fields

The simplified version of the Weierstrass equation in this case is the same as in the rational case, except that a and b are integers, and that all operations are performed in GF(p), i.e. modulo some prime number:

y^2 = x^3 + ax + b mod p, where a, b ∈ Z and p is a prime.

The group operation also has the exact same formulas as already described, except that all calculations are done in the field, i.e. all additions, multiplications, and divisions are modular. The section on implementation discusses how these operations can be carried out efficiently.

Curves Over Power-of-Two Order Finite Fields

For curves defined over a finite field of order 2^m, m ∈ Z+, the simplified version of the Weierstrass equation looks slightly different:

y^2 + xy = x^3 + ax^2 + b, where a, b ∈ Z.

The algebraic formulas are also different. In order to add two distinct points, P = (x1, y1) and Q = (x2, y2), the resulting point (x3, y3) is calculated in the following way:

x3 = λ^2 + λ + x1 + x2 + a   and   y3 = λ(x1 + x3) + x3 + y1,   where λ = (y1 + y2)/(x1 + x2).

In order to double a point P = (x1, y1), i.e. add it to itself, the resulting point (x3, y3) is calculated in the following way:
x3 = λ^2 + λ + a = x1^2 + b/x1^2   and   y3 = x1^2 + (λ + 1)x3,   where λ = x1 + y1/x1.

All arithmetic operations are performed in the finite field. Exactly how these are carried out depends on the choice of element representation. The section about implementation covers some algorithms for performing these operations efficiently.

4.10.3 Projective Coordinate Representations

In many implementations of the group operation, the divisions, i.e. inversions, are much more computationally demanding than all other operations. By using an alternative coordinate representation, we can transform the formulas and reduce the number of inversions that are needed. Inversions will, however, still be necessary when transforming a point to and from the new representation. This means that for a single group operation, no improvement is achieved. However, if many group operations are to be performed, we can stay in the alternative representation for all of them, and only perform one transformation. We will soon see, when describing cryptographic protocols based on elliptic curves, that this is often the case, and that huge performance improvements can be achieved by using an alternative representation. We will now describe two projective coordinate representations, one for curves based on prime order finite fields, and one for curves based on power-of-two order finite fields. We will refer to the normal representation as affine coordinates.

Jacobian Projective (JP) Coordinates

The Jacobian Projective coordinates are suitable for curves based on prime order finite fields. Here, the projective point (X, Y, Z) with Z ≠ 0 represents the affine point (X/Z^2, Y/Z^3). The elliptic curve equation now instead becomes

Y^2 = X^3 + aXZ^4 + bZ^6

and the point at infinity, O, is represented by the point (1, 1, 0). The formula for addition is actually most efficient when one of the two points is in Jacobian projective coordinates and the other in normal affine coordinates. The affine point is then converted to projective coordinates on the fly, by simply choosing its Z-coordinate to be 1. (Z can be chosen arbitrarily (Z ≠ 0), and the values of X and Y are then determined. To get the benefits of mixed coordinates, however, Z is chosen to be 1.) The new coordinates are then, for all projective representations, simply (x, y, 1), and the expressions in the generic formula involving Z2 can be simplified. We will also see later that this approach is suitable for cryptographic operations.

In order to add two distinct points, P = (X1, Y1, Z1)JP and Q = (x, y)affine = (x, y, 1)JP = (X2, Y2, Z2)JP, the resulting point (X3, Y3, Z3)JP is calculated in the following way:

X3 = (Y2·Z1^3 − Y1)^2 − (X2·Z1^2 − X1)^2·(X1 + X2·Z1^2)
Y3 = (Y2·Z1^3 − Y1)·(X1·(X2·Z1^2 − X1)^2 − X3) − Y1·(X2·Z1^2 − X1)^3
Z3 = (X2·Z1^2 − X1)·Z1

In order to double a point, i.e. add it to itself, P = (X1, Y1, Z1), the resulting point (X3, Y3, Z3) is calculated in the following way:

X3 = (3X1^2 + aZ1^4)^2 − 8X1·Y1^2
Y3 = (3X1^2 + aZ1^4)·(4X1·Y1^2 − X3) − 8Y1^4
Z3 = 2Y1·Z1

Notice that no inversions are needed in either of the two formulas. Only when moving from Jacobian Projective coordinates back to affine will inversions be needed.

López-Dahab (LD) Projective Coordinates

The LD Projective coordinates are suitable for curves based on power-of-two order finite fields. Here, the projective point (X, Y, Z) with Z ≠ 0 represents the affine point (X/Z, Y/Z^2). The elliptic curve equation now instead becomes

Y^2 + XYZ = X^3·Z + aX^2·Z^2 + bZ^4

and the point at infinity, O, is represented by the point (1, 0, 0).
As in the case of Jacobian Projective coordinates, the addition formula is most efficient when one of the points comes in projective coordinates and the other in affine coordinates.

In order to add two distinct points, P = (X1, Y1, Z1)LD and Q = (x, y)affine = (x, y, 1)LD = (X2, Y2, Z2)LD, the resulting point (X3, Y3, Z3)LD is calculated in the following way:

X3 = (Y2·Z1^3 − Y1)^2 − (X2·Z1^2 − X1)^2·(X1 + X2·Z1^2)
Y3 = (Y2·Z1^3 − Y1)·(X1·(X2·Z1^2 − X1)^2 − X3) − Y1·(X2·Z1^2 − X1)^3
Z3 = (X2·Z1^2 − X1)·Z1

In order to double a point, i.e. add it to itself, P = (X1, Y1, Z1), the resulting point (X3, Y3, Z3) is calculated in the following way:

X3 = X1^4 + bZ1^4
Y3 = bZ1^4·Z3 + X3·(aZ3 + Y1^2 + bZ1^4)
Z3 = X1^2·Z1^2

Notice that in this case, too, no inversions are needed. Only when moving from LD projective coordinates back to affine will inversions be needed.

4.10.4 The Elliptic Curve Discrete Logarithm Problem (ECDLP)

In Appendix A.4.2 on page 87, we defined the discrete logarithm problem for a generic group, with multiplicative notation. The corresponding problem for additive notation, and more specifically for elliptic curve groups, is the following.

4.5 Definition (ECDLP). Let E(Fq) denote the set of points on a curve over the finite field Fq. The discrete logarithm problem over elliptic curve groups is to, given a point P ∈ E(Fq) of order n and a point Q ∈ ⟨P⟩, find the integer l ∈ Zn such that Q = lP.

The naive approach to solving this is to simply enumerate P, 2P, 3P, ... until Q is found. This will on average require n/2 additions, and so it takes time exponential in the number of bits of n. Just increasing the size of n by one bit doubles the necessary work. We know that for the groups we have previously worked with, attacks exist that run in sub-exponential time. This was the case because some structure in the groups could be leveraged. For elliptic curve groups, no such structure is known, and hence only the generic algorithms, described in section 4.9.2 on page 37, with running time Θ(√n), apply. This means that the key needs only be twice the intended bits of security, growing linearly with increased security.

4.10.5 Group Order

Let E(Fq) denote the set of points on a curve, and #E(Fq) the number of elements in this set, i.e. the order of the group. We know that this number must be large enough for the group to be suitable for cryptographic schemes, since otherwise simple brute force can solve the discrete logarithm problem. We can be assured of this by applying Hasse's theorem [31], which gives a tight bound on the order.

4.6 Definition (Hasse's theorem). Let E be an elliptic curve defined over a finite field of order q. Then

q + 1 − 2√q ≤ #E(Fq) ≤ q + 1 + 2√q.

For large q, √q will be much smaller than q, and so the order of the curve group is almost the same as the order of the field. Choosing a large enough field therefore yields a large enough elliptic curve group. For cryptographic purposes, however, we would like to know the exact order of a curve, in order to avoid some specific attacks. In particular, we want the curve order to be prime or almost prime, that is, a prime multiplied by some small number.
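To make the affine group law and Hasse's bound tangible, here is a small Python sketch with toy parameters of our own choosing (far too small for any real use). It implements the affine formulas for a curve over GF(p) and counts the points by brute force.

```python
# Toy curve E: y^2 = x^3 + 2x + 3 over GF(97) -- illustrative only.
p, a, b = 97, 2, 3
O = None  # the point at infinity acts as the identity element

def add(P, Q):
    """Affine point addition/doubling (section 4.10.1), computed mod p."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O                                     # P + (-P) = O
    if P == Q:                                       # point doubling
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:                                            # point addition
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

# Brute-force point count: all affine (x, y) on the curve, plus O.
points = [(x, y) for x in range(p) for y in range(p)
          if (y * y - (x ** 3 + a * x + b)) % p == 0]
order = len(points) + 1

# Hasse: q + 1 - 2*sqrt(q) <= #E(Fq) <= q + 1 + 2*sqrt(q)
assert p + 1 - 2 * p**0.5 <= order <= p + 1 + 2 * p**0.5
print(order)
```

For a real curve, p has hundreds of bits, so enumeration like this is hopeless; dedicated point-counting algorithms are needed instead, as described next.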
There exist a few algorithms for counting the points on a given curve, and the currently most efficient one is called the Schoof-Elkies-Atkin (SEA) algorithm, as described by Schoof in [32]. In some cases, however, it may be faster to first pick a suitable order, and then generate a curve of that order. This is how the Complex Multiplication (CM) method works, as mentioned in Hankerson et al. [33].

4.10.6 Domain Parameters

The domain parameters are the data that the two sides need to have in order to communicate using elliptic curve cryptographic schemes. Note that the parameters are public and may be transmitted unencrypted, though they must still be authenticated.

4.7 Definition (EC Domain Parameters). The domain parameters for an elliptic curve scheme are D = (q, FR, S, a, b, P, n, h), where
1. q is the order of the used field Fq.
2. FR indicates which representation has been chosen for the elements in the field.
3. S is the seed in the case where the curve was generated randomly. (The process of generating random, and secure, curves is not given here, but can be found in Hankerson et al. [33].)
4. a and b are the curve equation coefficients.
5. P is the base point.
6. n is the order of P.
7. h is the cofactor, i.e. h = #E(Fq)/n. If h = 1, then P is a generator point.

Some care has to be taken when choosing these, in order to avoid some special attacks, just as is the case with RSA. However, unlike RSA, the security does not depend on each entity choosing individual, secret, parameters. Only the elliptic curve key pair, to be defined soon, needs to be kept secret. For this reason, NIST has selected a number of recommended curves that avoid all known special attacks. For instance, they all have a base point with prime order, and the cofactor is 1. These curves are also chosen to facilitate efficient implementations.

NIST has selected 15 curves suitable for cryptographic operations [34]. Five of these are defined over a prime order finite field and have been generated randomly. These have sizes of 192, 224, 256, 384, and 521 (not a misprint) bits. The rest are defined over binary fields. Five of these are also generated at random, while the other five are so-called Koblitz curves, which enable faster implementations. Both families have lengths 163, 233, 283, 409, and 571 bits.

4.10.7 Elliptic Curve Key Pair

Once we have the somewhat complicated underlying mathematics settled, the actual operations that utilize it for cryptographic purposes are quite simple. Since the basis is the discrete logarithm problem, where given the result of a multiplication it is hard to find the multiplicand, the public key will simply be this result, and the private key will be the integer that was used to create it. Note that a public-private key pair is valid only for one specific set of domain parameters.

4.8 Definition (EC Key Pair Generation). Given domain parameters D = (q, FR, S, a, b, P, n, h), a key pair K = (kpub, kpriv) is generated by the following steps.
1. Generate a random number d ∈ [1, n − 1].
2. Compute Q = dP.
3. Return K = (Q, d).

4.10.8 Encryption Using Elliptic Curves (ECIES)

There are numerous ways to provide confidentiality of data by using elliptic curves as the basis. One of them would be to utilize the ElGamal scheme, described in section 4.6 on page 31. Since it is defined over a generic cyclic group, we can, for instance, use an elliptic curve group for it.
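Regardless of which encryption scheme is layered on top, the key pair itself is generated as in Definition 4.8. As a concrete instance, the following sketch generates a key pair on the NIST P-256 curve using the pyca/cryptography Python library (an assumed dependency); d is the private integer and Q = dP the public point.

```python
from cryptography.hazmat.primitives.asymmetric import ec

# Key pair on NIST P-256: d is random in [1, n - 1], Q = dP.
private_key = ec.generate_private_key(ec.SECP256R1())
public_key = private_key.public_key()

d = private_key.private_numbers().private_value   # the secret integer d
Q = public_key.public_numbers()                   # the public point Q
print(f"d  = {d:x}")
print(f"Qx = {Q.x:x}")
print(f"Qy = {Q.y:x}")
```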
However, the standard for performing encryption using an elliptic curve group is instead a different scheme, which uses the convenient hybrid approach. This is called the Elliptic Curve Integrated Encryption Scheme (ECIES), and is defined by the American National Standards Institute (ANSI) in ANSI X9.63 [35]. The scheme uses something called a Key Derivation Function (KDF), which is a function that generates multiple keys from one master key. In order for it to be considered secure, the resulting keys must be independent as far as an "efficient" adversary is concerned.

4.9 Definition (ECIES). We define the steps of the Elliptic Curve Integrated Encryption Scheme hybrid system as follows.

ECIES Setup
• Select a symmetric cipher, (Esym, Dsym), a message authentication code, (MACS, MACV), and a key derivation function, KDF.
• Choose the elliptic curve domain parameters, D = (q, FR, S, a, b, P, n, h); below, the base point P is denoted G.
• Acquire the public key of the recipient Bob, KB = dB·G, where dB is Bob's private key.

ECIES Encryption
In order for Alice to send a message, m, to Bob, the following steps are performed.
• Alice generates a random number r in [1, n − 1] and calculates R = rG.
• Alice calculates rKB = (Px, Py), and lets S = Px. We call S the shared secret.
• Alice uses the KDF to derive one symmetric encryption key and one authentication key, (kenc, kmac) = KDF(S).
• Alice encrypts the message, c = Esym(kenc, m).
• Alice computes the tag t = MACS(kmac, c).
• Alice outputs R||c||t.

ECIES Decryption
In order for Bob to decrypt the received ciphertext, R||c||t, the following is performed.
• Bob calculates dB·R = (Px, Py), and lets S = Px, i.e. S is the same shared secret as Alice used.
• Bob uses the KDF to derive one symmetric encryption key and one authentication key, (kenc, kmac) = KDF(S).
• Bob verifies the message integrity by performing MACV(kmac, c, t), and outputs "failed" if the verification function returns "no".
• Bob decrypts the message, m = Dsym(kenc, c).

The standard specifies which symmetric constructions may be used and how the KDF can be implemented. In the implementation section, one example selection for these is given. Notice that the method for generating the shared key is very similar to the Diffie-Hellman-Merkle key exchange, as described in section 4.3.1 on page 28. Indeed, the method described here is sometimes referred to as Elliptic Curve Diffie-Hellman (ECDH). The scheme is secure since finding dB·rG from dB·G and rG is considered hard (the Computational Diffie-Hellman assumption), of course assuming that the symmetric schemes used are secure.

4.11 Digital Signatures

We have now discussed how to utilize public key cryptosystems, which solve the key distribution problem, in order to achieve confidentiality. We also realized that we can use this in a hybrid system, and thereby utilize everything we have learned about symmetric cryptography. This means that we can also achieve integrity and authenticity in the same way as before. We will now see how we can instead achieve these goals, and more, by using the asymmetric constructions directly.

The asymmetric counterpart of the symmetric MAC is the digital signature. The name has been chosen because it provides functions similar to those of a physical signature, with pen and paper. Remember that the physical signature is simply a specific pattern of symbols that the signer applies to a document being signed, and that the same pattern is used for all documents.
Such signatures are considered secure because it is generally hard to forge them, and because attempting to do so is a serious crime. In the digital world, copying exact patterns is extremely easy, and being anonymous while doing so is almost equally easy. For these reasons, a digital signature cannot depend only on the identity of the signer, but must also depend on the content of what is being signed, such that a valid signature for one document is invalid for another, just as is the case with a secure MAC.

4.10 Definition (Digital Signatures). A Digital Signature, defined over (K, M, T), is a triple of algorithms (G, S, V) where G : ∅ → K, S : K × M → T, and V : K × M × T → {"yes", "no"}, and where ∀m ∈ M, ∀k ∈ K : V(k, m, S(k, m)) = "yes".

In other words, a digital signature for a message is a short tag that is intended to prove who sent the message and that it has not been altered during transmission, i.e. it provides authenticity and integrity. The definition of a secure digital signature is equivalent to that of a secure MAC, i.e. a digital signature needs to be existentially unforgeable. Since digital signatures are the main focus of this thesis, this important definition is restated here.

4.11 Definition (Existentially Unforgeable Digital Signatures). The security of a digital signature scheme is defined by a game between a challenger and an adversary, through these steps.
1. The challenger generates a private-public key pair.
2. The adversary gets to submit any number of messages to the challenger, and for each the challenger sends back a valid signature for that message.
3. The adversary sends a message-signature pair, (m, t), not equal to any of the previously created pairs.
4. The challenger runs the verification algorithm on (m, t).
The digital signature scheme used is said to be secure, or existentially unforgeable, if for all "efficient" adversaries the output of the verification is "yes" with negligible probability.

We can see that digital signatures have much in common with MACs, but there are a few important differences that make digital signatures more suitable as the digital counterpart of physical signatures. Remember that MACs only provide authenticity and non-repudiation down to the pair of users that has access to the secret key, but not to a single entity. Digital signatures use a private-public key pair that is unique for each signer, and as long as these are handled correctly, true authenticity and non-repudiation can be achieved, which can even be used in court.

Note that unlike encryption constructions, signature generation need not be randomized in order to be secure. This is also true for MACs; indeed, both of the examples of secure MACs provided in section 3.6 on page 23 are completely deterministic. However, some newer constructions for digital signatures are randomized, and the general belief is that this may provide more security against future attacks. The drawback of using a randomized method is that the verification algorithm is likely to be more complicated, since it cannot simply recalculate the signature and compare the result with the provided one. Also, a randomized signature may need to be longer than a deterministic one, since the randomness must be included somehow.

4.11.1 RSA Signatures

Our first example of a digital signature scheme is based on the RSA algorithm, i.e. on the integer factorization problem.
The idea is very similar to that of RSA encryption, namely to use two elements, e and d, in the group (Zn*, · ) such that ed = 1 ∈ Zn*. In the encryption case, the public key was used to hide the contents of a message, which only the private key could reveal. If we instead do the complete opposite, i.e. use the private key to "encrypt" a message, then the public key can be used to "decrypt" it again. (Note that this private-public key pair can never be the same one that is used for encryption.) By sending both this "encrypted", i.e. signed, version of a message, as well as the message in the clear, anyone with the public key can "decrypt", i.e. verify, and compare the two. Since it is "hard" to find the private key from the public one, only the true owner could have sent the message if decryption indeed gives back the plaintext message.

Just as with textbook RSA encryption, this simple approach has many problems and does not give a secure digital signature scheme. In particular, we need to apply some padding scheme in order to defend against extension attacks, and right now only small messages, that can be represented by an integer < n, can be signed. This latter problem is usually handled in a generic way, shared between all digital signature schemes, namely by applying a hash function. Instead of signing the original message, the hash of the message (which now can be arbitrarily large) is signed. This approach is sometimes referred to as the hash-and-sign paradigm. Of course, the hash function used needs to be cryptographically secure in order for the resulting signature scheme to have any chance of being secure. Moreover, the security of the hash function has to match or exceed the intended signature security; a cryptographic construction can never be more secure than its least secure part.

RSASSA-PSS

A secure way of using RSA for signatures has been standardized in PKCS#1 [20]. Two versions of RSA signatures are given: the older, deterministic RSASSA-PKCS1-v1_5, and the newer, randomized RSASSA-PSS construction. Although no direct weaknesses are known in the first one, the second one is recommended for all new systems.

4.12 Definition (RSASSA-PSS Overview). The full details of RSASSA-PSS are not given here, but can be found in PKCS#1 [20]. In particular, the steps of the message encoding operation, called EMSA-PSS-ENCODE, and the verification operation, EMSA-PSS-VERIFY, are not provided; the general idea is to hash the message and introduce randomness. Using the mentioned operations as primitives, the following is performed in order to generate a signature and verify the result.

Signature Generation
Given the signer's private key, (n, d), and the message to be signed, M, the following steps are performed in order to generate an RSA signature.
• Perform EMSA-PSS-ENCODE on the message, to get the encoded result EM.
• Convert EM to an integer m.
• Calculate s = m^d mod n.
• Convert the integer s to an octet string to get the resulting signature S.

Signature Verification
Given the signer's public key, (n, e), the message that was signed, M, and a signature on the message, S, the following steps are performed in order to verify the provided RSA signature.
• Verify that the size of S is valid.
• Convert S to an integer, s.
• Calculate m = s^e mod n.
• Convert m to an encoded message EM.
• Perform the verification operation EMSA-PSS-VERIFY and output "yes" if the result is "consistent", otherwise "no".
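The following sketch signs and verifies a message with RSASSA-PSS via the pyca/cryptography Python library (an assumed dependency); the library performs the EMSA-PSS encoding and the modular exponentiations internally.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.exceptions import InvalidSignature

pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)

signer_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
message = b"important contract"

# Sign: EMSA-PSS encode, then s = m^d mod n (done inside the library).
signature = signer_key.sign(message, pss, hashes.SHA256())

# Verify: m = s^e mod n, then EMSA-PSS-VERIFY; raises on failure.
try:
    signer_key.public_key().verify(signature, message, pss, hashes.SHA256())
    print("yes")
except InvalidSignature:
    print("no")
```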
Notice that almost all of the computation time is spent on the modular exponentiations. Just as in the case of RSA encryption, the private exponentiation may be sped up by using the Chinese Remainder Theorem (see Appendix A.3.1 on page 83), as long as the used primes are stored and provided along with the private key. Remember, however, that knowledge of the primes completely breaks the scheme, hence this information needs to be handled with as much care as the private key. Also notice that since the private exponent is usually much larger than the public one, the time for verifying a signature is much smaller than the time for generating it. Finally, the possible attacks against the algorithm are the same as for RSA encryption, i.e. as long as the parameters are chosen wisely, factoring the modulus is the best attack, which can be performed in sub-exponential time, as described in section 4.9.1 on page 34.

4.11.2 Digital Signature Algorithm (DSA)

ElGamal encryption utilizes the discrete logarithm problem to provide confidentiality by using an asymmetric key pair. It is also possible to use a similar idea to create a digital signature, called the ElGamal Signature Scheme. A more well-known modification of this was proposed by NIST in 1991, and is called the Digital Signature Algorithm [34]. DSA is a digital signature scheme based on the discrete logarithm problem over prime order cyclic groups, i.e. groups where the arithmetic operations are normal modular calculations. We already know that there is a sub-exponential attack on this problem, and so this scheme does not perform much differently from the RSA one. However, the idea behind the signature scheme, soon to be defined, can be generalized and used over any cyclic group where the discrete logarithm problem is hard. In particular, we can use an elliptic curve group, where we know that only exponential-time algorithms exist.

4.13 Definition (DSA - multiplicative notation). The following steps define digital signature generation and verification using the Digital Signature Algorithm.

DSA Setup

In order to prepare for the usage of DSA signatures, the following steps are performed.

• Choose a secure hash function, H.
• Choose a prime, q, whose length does not exceed the length of the hash output.
• Choose another prime, p, such that p − 1 is a multiple of q.
• Choose an element, g, with order q in the group (Zp*, ·).
• (p, q, g) are called the algorithm parameters and may be shared publicly among users.

DSA Signature Generation

In order for Alice to sign a message, using her private key x, the following is performed.

• Alice chooses a random k in [1, q − 1].
• Alice calculates r = (g^k mod p) mod q.
• Alice computes s = (k^-1 (H(m) + xr)) mod q.
• Alice outputs the signature (r, s). If r or s happens to be 0 in generation, the process is repeated.

DSA Signature Verification

In order for Bob to verify a signature, (r, s), given by Alice, with her public key y = g^x mod p, the following is performed.

• Bob verifies that the signature values r and s are within the correct range.
• Bob calculates w = s^-1 mod q.
• Bob calculates u1 = H(m)w mod q, and u2 = rw mod q.
• Bob computes v = ((g^u1 · y^u2) mod p) mod q.
• The verification algorithm outputs "yes" if v = r, "no" otherwise.

4.11.3 Elliptic Curve DSA (ECDSA)

A very interesting fact about DSA, as just described, is that nothing in the algorithm depends on the kind of group that is being used, i.e. it can be applied over any group where the discrete logarithm problem is hard enough.
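Before moving to elliptic curve groups, here is a toy C sketch of Definition 4.13. The parameters p = 23, q = 11, g = 2 (which has order 11 mod 23), the fixed "hash" value, and the fixed nonce are all assumptions chosen for a reproducible pen-and-paper-sized example; in any real use, k must be freshly random and secret for every signature.

    #include <stdio.h>
    #include <stdint.h>

    static uint64_t modpow(uint64_t b, uint64_t e, uint64_t n) {
        uint64_t r = 1;
        b %= n;
        for (; e > 0; e >>= 1) {
            if (e & 1) r = (r * b) % n;
            b = (b * b) % n;
        }
        return r;
    }

    /* q is prime, so a^-1 = a^(q-2) mod q by Euler's theorem. */
    static uint64_t modinv(uint64_t a, uint64_t q) { return modpow(a, q - 2, q); }

    int main(void) {
        /* Toy parameters: q = 11 divides p - 1 = 22; g = 2 has order 11 mod 23. */
        const uint64_t p = 23, q = 11, g = 2;
        const uint64_t x = 7;                /* private key, in [1, q-1]            */
        const uint64_t y = modpow(g, x, p);  /* public key y = g^x mod p            */
        const uint64_t h = 5;                /* stand-in for H(m) mod q             */
        const uint64_t k = 9;                /* nonce; MUST be random and secret    */

        /* Sign: r = (g^k mod p) mod q, s = k^-1 (H(m) + x r) mod q */
        uint64_t r = modpow(g, k, p) % q;
        uint64_t s = (modinv(k, q) * ((h + x * r) % q)) % q;

        /* Verify: v = ((g^u1 * y^u2) mod p) mod q, accept if v = r */
        uint64_t w  = modinv(s, q);
        uint64_t u1 = (h * w) % q;
        uint64_t u2 = (r * w) % q;
        uint64_t v  = ((modpow(g, u1, p) * modpow(y, u2, p)) % p) % q;

        printf("r=%llu s=%llu, valid: %s\n", (unsigned long long)r,
               (unsigned long long)s, v == r ? "yes" : "no");
        return 0;
    }

Running it yields the pair (r, s) = (6, 4), which verifies; changing h after signing makes the verification fail, illustrating the dependence on the message content.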
We already know that the DLP over elliptic curve groups is considered the hardest, and it therefore gives a very good basis for using the DSA algorithm. We refer to this scheme as the Elliptic Curve Digital Signature Algorithm (ECDSA), and define it in the following way. Note that additive notation is used instead of multiplicative.

4.14 Definition (ECDSA). The following steps define digital signature generation and verification using the Elliptic Curve Digital Signature Algorithm.

ECDSA Setup

In order to prepare for the usage of ECDSA signatures, the following steps are performed.

• A secure hash function, H, is chosen.
• Domain parameters, D = (q, FR, S, a, b, G, n, h), are chosen and shared between the users.

ECDSA Signature Generation

In order for Alice to sign a message, using her private key d_A, the following is performed.

• Alice chooses a random k in [1, n − 1].
• Alice calculates (R_x, R_y) = kG and lets r = R_x mod n.
• Alice computes s = (k^-1 (H(m) + d_A r)) mod n.
• Alice outputs the signature (r, s). If r or s happens to be 0 in generation, the process is repeated.

ECDSA Signature Verification

In order for Bob to verify a signature, (r, s), given by Alice, with her public key Q_A = d_A G, the following is performed.

• Bob verifies that the signature values r and s are within the correct range.
• Bob calculates w = s^-1 mod n.
• Bob calculates u1 = H(m)w mod n, and u2 = rw mod n.
• Bob computes (x1, y1) = u1 G + u2 Q_A.
• The verification algorithm outputs "yes" if x1 = r, "no" otherwise.

This definition is slightly simplified; for instance, we have not specified how to convert between different representations such as integers, points, and binary data. The full standard, with all these steps defined, can be found in the document by ANSI, X9.62 [36]. Following the standard, the scheme is considered to be as secure as solving the discrete logarithm problem over elliptic curve groups, which no "efficient" adversary can do, as long as the group is large enough and the domain parameters are selected correctly.

Notice the striking similarity with the DSA algorithm. The only difference is that we use a different underlying group, and that additive notation is used instead of multiplicative. Notice also that although a point is represented by two coordinates, only the x-coordinate is used in the signature. The result of this is that the same short key sizes can be used for elliptic curve digital signatures as for encryption, whereas both the RSA approach and the DSA scheme require longer keys. This is presented in table 4.3 on the next page.

Bits of security   RSA key size   DSA key size (q / p)   ECC key size
80                 1024           160 / 1024             160
112                2048           224 / 2048             224
128                3072           256 / 3072             256
192                7680           384 / 7680             384
256                15360          512 / 15360            512

Table 4.3: RSA, ECC, and DSA key size comparison.

4.12 Public Key Infrastructure

We now know how to use asymmetric cryptosystems, and also that they solve the key distribution problem, since we no longer need a new key for each communicating pair. However, one big issue remains, that of distributing the public keys. We stated that this can be done over an insecure channel, since the key is public. However, if the channel is insecure, how do we know Eve is not manipulating the channel we are using, and replacing our public key with her own? We need some way to detect this, and verify the correctness of the public key. We need a way to securely connect public keys to the names of their holders.
One naive approach to solve this problem is to have a central authority that can be queried in order to receive someone's public key. The communication with the central authority could be secured by letting its public key be hard-coded in every device that needs to perform this communication. The main problem with this approach, and the reason that it was initially turned down, is that the central authority needs to be constantly online and available. As soon as it is unreachable, no other secure communication can take place. We refer to this solution as an online Key Distribution Center (KDC).

Another approach is instead based on trees of trust. The main idea is that anyone who is trusted by someone we trust is also trusted by us. This works fine as long as all trust is well deserved, but once we have a rogue node in the chain, it can maliciously allow other rogue nodes to be part of the chain. With this approach, we no longer need online access to a trusted party. An offline document, called a certificate, that connects a public key to its owner and that is verifiably written by someone we trust suffices.

4.12.1 Certificates

A certificate is simply a digital document that states which public key belongs to a specific entity. The certificate is digitally signed by another entity referred to as a Certificate Authority (CA). By verifying the certificate, and matching the name in the document with the intended receiver, we guarantee that the public key is legitimate.

The obvious problem with this approach is of course that in order to verify a certificate, we first need to validate the public key of the CA, which means that we need another certificate. The scheme works because this chain of certificates eventually ends at a Root Certificate Authority, which has its public key hard-coded in the communicating device, e.g. in the web browser. Whenever one of the certificates in the chain cannot be verified, no security whatsoever can be guaranteed, and in the browser case, the user is given a warning and asked if he wants to proceed anyway. Hopefully, the user knows that responding positively basically allows anyone to see and alter all sent information, even though encryption is being used.

The big advantages of this solution, compared with the online KDC approach, are that the certificate authorities need not be constantly accessible, since the digital signature in a certificate can be verified offline, and also that we need only a limited set of hard-coded public keys, since we use the tree-of-trust model, i.e. it scales well. A conceptual model of the certificate approach is shown in figure 4.6.

Figure 4.6: Public Key Infrastructure - Bob verifies that he really has Alice's public key.

Part II

Implementation and Performance Evaluation

5 Implementation

This chapter describes the implementation of the elliptic curve cryptographic algorithms that was created during the work on this thesis. Both the used hardware and the choice of software algorithms are discussed. Remember that the purpose of the thesis was to compare the elliptic curve approach with RSA. For this reason, a fair amount of time has been put into optimizing the implementation, without going too far just to earn a few more cycles.
5.1 Hardware Security Module

A Hardware Security Module (HSM) is a device that is intended to securely carry out cryptographic operations. An HSM can come in different forms, e.g. as a card that is plugged into a normal computer, or as a network device. HSMs are used primarily for security reasons, but also in order to abstract and offload the cryptographic operations from the host computer. An important part of HSM security is something called tamper protection, a set of measures to ensure that an attacker cannot leverage the ability to physically access or modify the module. Whenever a tamper-protected HSM detects intrusion, all sensitive data is wiped from memory.

Realsec constructs and sells hardware security modules with many different capabilities, such as symmetric encryption, message authentication codes, asymmetric encryption, and digital signatures, as described in previous chapters. When starting my thesis work, Realsec had an HSM implementation for asymmetric operations based on RSA, but none for those based on elliptic curves.

5.2 Hardware

The HSM on which the implementation has been performed has built-in hardware support for many symmetric cryptographic constructions, such as AES, 3DES, hash function calculations, and random number generation. It also has a specific Asymmetric Crypto Processor (ACP) unit, which can be used to speed up many operations associated with public key cryptographic schemes, especially those associated with RSA. The actual implementation of asymmetric constructions, including the usage of the ACP, is performed in software on an ARM processor. The processor operates with a word size of 32 bits, runs at a clock frequency of 100 MHz, and contains only one core.

The ACP consists of one Bigint Arithmetic Unit (BAU) and two Montgomery Multipliers (MMs). The BAU can perform arithmetic operations, including modular reductions, on arrays of data. Using this is faster than performing the operations in software, especially when the data is larger than the word size, which is the case for all cryptographic operations. The MMs can only perform modular multiplication and exponentiation (which is repeated multiplication), but do so much more efficiently for large data. This is mainly for two reasons: the MMs have more dedicated hardware than the BAU, and they perform so-called Montgomery multiplications, described shortly. In addition, the MMs, the BAU, and the software code can be executed in parallel in order to maximize performance. The scheduling of this parallelization is done in software on the ARM processor. The MMs can only operate on special memory that is individual to each multiplier, meaning that data may need to be moved between the two, an operation that can be performed by the BAU, which can access both memories.

5.2.1 Montgomery Multiplications

When performing a modular multiplication or exponentiation, the reductions are normally a significant part of the work. However, this fraction can be reduced by using a trick referred to as Montgomery multiplication, as first described in [37]. The idea is to first transform the numbers to the Montgomery domain, where, as we will soon see, reductions are much cheaper than in the normal domain. However, when moving to the Montgomery domain, normal reductions need to be performed. Hence, performance improvements are achieved only when many reductions will be done without the need to move the data back to the normal domain in between.
The idea and its benefits are therefore somewhat similar to those of projective coordinates for curve point representation, as described in 4.10.3 on page 43. Some sources refer to the usage of both of these simultaneously as Montgomery projective coordinates, but remember that the two tricks apply to different domains: modular arithmetic and elliptic curves.

The key to Montgomery multiplication is to first choose a number R, which is easy to divide by, and then make sure that the number we divide is a multiple of R by simply adding the modulus some number of times. Which values are easy to divide by depends on the system where the operations are being performed, and on the size of the numbers to reduce. In general, powers of the base are easy to divide by, since such divisions reduce to shift operations.

The way we transform a number a to the Montgomery domain is by computing A = a · R (mod N). Notice that this does require a normal reduction, as previously mentioned. We do this transformation because normal multiplication of two already transformed numbers, A and B, results in A · B (mod N) = aR · bR (mod N) = abR^2 (mod N). In order for the result to also be in the Montgomery domain, i.e. multiplied by only one R, we need to divide by R, which we know is easy. The interesting part is that if R is chosen sufficiently large (relative to N), we can guarantee that the result after division is already reduced, and we have replaced the expensive reduction by cheap additions and shifts. In order to transform a number from the Montgomery domain back to the normal domain, we multiply by R^-1 (mod N), and perform a normal reduction.

5.1 Definition (Montgomery Multiplication). We define Montgomery multiplication between two numbers in the Montgomery domain as

mont(A, B) = A · B · R^-1 (mod N) = (AB + mN) / R (mod N)

where m is chosen such that (AB + mN) is a multiple of R.

Notice that m is guaranteed to be < R. If we choose R > 4N, we can allow the input values to be < 2N and still not need any additional reduction, since the result will also be < 2N:

(AB + mN)R^-1 = (AB + mN) / R = AB/R + (m/R)N < (2N · 2N)/(4N) + N = 2N

We can actually utilize the definition for the transformations as well. In order to transform a number a to the Montgomery domain, and back again, we can perform the following.

A = mont(a, R^2 (mod N)) = a · R^2 · R^-1 (mod N) = aR (mod N)
a = mont(A, 1) = A · 1 · R^-1 (mod N) = a · R · R^-1 (mod N) = a (mod N)

The only remaining expensive operation is the calculation of R^2 (mod N), which requires a normal reduction. However, since this is independent of the value we are transforming, it can be precomputed as soon as the modulus is known. Also note that the resulting a in the second formula is guaranteed to be fully reduced, i.e. a < N. This can be seen in the previous formula, noting that m = 0 in this case, since aR is already divisible by R.

In order to help the reader understand the concept, we will now look at a simple example using very small numbers.

5.2 Example: Montgomery Multiplication

Let x = 24, y = 52, and the modulus N = 63. We want to find xy (mod N) using Montgomery's method. We choose R to be the smallest number that is larger than 4N and a power of the base (10 in this pen-and-paper case), i.e. R = 10^3 = 1000. First, use normal reductions¹ to find

X = x · R (mod N) = 24 · 1000 (mod 63) = 60 (mod 63)
Y = y · R (mod N) = 52 · 1000 (mod 63) = 25 (mod 63)

Next, calculate Z = X · Y = 60 · 25 = 1500.
Now, in order to make this divisible by R, we add m · N for some m. In this case m = 500 is a suitable number² and the result is XY + mN = 1500 + 500 · 63 = 33000.

mont(X, Y) = XY · R^-1 (mod N) = (XY + mN) / R (mod N) = 33000 / 1000 (mod 63) = 33 (mod 63) = xyR (mod 63)

So, all we need to do to find xy (mod 63) is to divide by R, which we can do by performing a Montgomery multiplication with 1.

mont(xyR, 1) = (xyR + mN) · R^-1 (mod N) = (33 + 809 · 63) / 1000 (mod 63) = 51000 / 1000 (mod 63) = 51 (mod 63)

Indeed, 24 · 52 (mod 63) = 51 (mod 63).

As already mentioned, nothing is gained in this example, where only one modular multiplication is performed and R^2 (mod N) is not known a priori. However, in a real application, many modular multiplications may need to be performed, and all these calculations can be performed within the Montgomery domain, with the transformations to and from it performed only once. By precomputing the value of R^2 (mod N), the transformations are also cheap. This fact has been leveraged in the implementation, and the Montgomery method is used for all modular multiplications except in the calculation of R^2 (mod N), which cannot be found using Montgomery's method.

¹ We could also compute R^2 (mod N) and then calculate X = mont(x, R^2 (mod N)) and Y = mont(y, R^2 (mod N)).
² How to find this suitable number is not covered here, see [37].

5.3 Software

This section describes the software implementation in the HSM firmware performed in the thesis work. The choice of algorithms, with motivations, along with some implementation details are given. The full source code is internal to Realsec and is not given here.

5.3.1 Overall Code Structure

All code has been written in C and compiled for the ARM processor, with the exception of some test-generation code that was written in Java. The written code implements digital signatures according to the ECDSA algorithm specified in ANSI X9.62 [36], and hybrid encryption according to the ECIES algorithm specified in ANSI X9.63 [35]. The purpose of the thesis was to compare ECDSA and RSA signatures, but since support for ECIES was straightforward to add once all the underlying operations were available, this was implemented as well.

Curves defined over prime order finite fields as well as those defined over power-of-two order finite fields are supported. The ECDSA and ECIES protocol code is the same for these two types, but the underlying operations are completely different. For this reason, the code has been developed in four layers, with specified interfaces between the different layers. Figure 5.1 shows an overview of this layering of the code. The named boxes refer to the provided interface.

Figure 5.1: The layers of the software implementation.

5.3.2 Layer 1 - Finite Field Arithmetics

The lowest level handles the arithmetic operations in finite fields. These operations are completely different for fields of prime order and fields of power-of-two order. See Appendix A.4.3 on page 87 for definitions of the used finite fields.

Prime Fields

As mentioned in the appendix, the arithmetic operations in these fields are just modular operations. This means that normal addition and multiplication can be performed, followed by modular reductions. All these operations are performed in hardware on the ACP.
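As a software illustration of Definition 5.1 and Example 5.2, here is a minimal C sketch of Montgomery multiplication. It mirrors the example above but chooses R = 2^10 = 1024 (> 4 · 63), so that division by R is a plain shift; the function names and the Hensel-style inversion used to find -N^-1 mod R are our own choices for the sketch, not the ACP interface.

    #include <stdio.h>
    #include <stdint.h>

    #define K 10                      /* R = 2^K = 1024, and R > 4N for N = 63 */
    static const uint64_t N = 63, R = 1u << K;

    /* N^-1 mod 2^64 for odd N via Newton/Hensel iteration, masked down to mod R. */
    static uint64_t inv_mod_R(void) {
        uint64_t inv = 1;
        for (int i = 0; i < 6; i++)
            inv *= 2 - N * inv;       /* each step doubles the correct low bits */
        return inv & (R - 1);
    }

    /* mont(A, B) = A * B * R^-1 mod N, using only masks, shifts, and adds. */
    static uint64_t mont(uint64_t A, uint64_t B, uint64_t Nprime) {
        uint64_t T = A * B;
        uint64_t m = ((T & (R - 1)) * Nprime) & (R - 1); /* T + mN divisible by R */
        uint64_t t = (T + m * N) >> K;                   /* exact division by R   */
        return t >= N ? t - N : t;                       /* conditional subtract  */
    }

    int main(void) {
        uint64_t Nprime = R - inv_mod_R();  /* -N^-1 mod R */
        uint64_t x = 24, y = 52;
        uint64_t X = (x * R) % N;           /* to the Montgomery domain: one normal reduction */
        uint64_t Y = (y * R) % N;
        uint64_t Z = mont(X, Y, Nprime);    /* = xyR mod N, still in the domain  */
        uint64_t z = mont(Z, 1, Nprime);    /* back to the normal domain         */
        printf("%llu (expected %llu)\n", (unsigned long long)z,
               (unsigned long long)((x * y) % N));
        return 0;
    }

The single conditional subtraction at the end of mont is all the reduction that is ever needed, which is the whole point: the expensive trial division of a normal reduction has disappeared from the inner loop.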
Montgomery's method, as previously defined, is used to optimize the modular multiplications and exponentiations. Inversion can be performed either by using the Extended Euclidean algorithm, or by Euler's theorem using exponentiation, as noted in Appendix A.3.3 on page 85. Since the MMs use Montgomery multiplications and therefore can perform exponentiation efficiently, the latter approach was chosen. However, even with this approach, the inversions are much more costly than additions and multiplications. In order to mitigate this, projective coordinates were used as described in the previous chapter, i.e. most inversions have been replaced by additions and multiplications.

Binary Fields

The main functionality of the ACP is to efficiently perform modular arithmetic. Since this is not the basis for power-of-two finite field arithmetic, these operations had to be done almost completely in software. The BAU was utilized only for performing shift operations on large data.

Polynomial basis, as defined in Appendix A.4.3 on page 87, was chosen as the field element representation, since this was expected to give good performance for software implementations [38]. A polynomial is stored in memory as a bitstream where the individual bits represent the polynomial coefficients. A polynomial of degree m − 1 requires m bits of storage. The least significant bit corresponds to a_0 and the most significant bit to a_{m-1}. By using this representation, addition of polynomials is just bitwise XOR, which can be implemented very efficiently one word at a time. Since the reduction polynomial always has higher degree than the elements, no reductions are needed after additions.

When performing multiplication or inversion, however, reductions will be needed. It is not hard to implement a general reduction algorithm, but since only supporting NIST curves for the binary case was sufficient for my thesis, special algorithms for performing reductions modulo their reduction polynomials, faster than the generic algorithm, have been implemented instead. Both the generic algorithm and the special reductions are defined in Hankerson et al. [33].

Squaring, which is a special case of multiplication, can be performed very efficiently in polynomial basis. All that needs to be done is to insert a 0-bit between every pair of adjacent coefficient bits. This is done by using a precomputed table with 256 entries, which maps every input byte to two output bytes. This of course means that the result will be much larger than the reduction polynomial, and a final reduction needs to be performed.

For all other cases of multiplication, the so-called left-to-right comb method with precomputations has been chosen, as defined in Hankerson et al. [33]. This is among the fastest known methods for software implementations. It is defined as follows, where W is the word size, i.e. 32 on the target processor; a simplified software sketch of binary field arithmetic is given after the definition.

5.3 Definition (Left-to-right comb with windows of width w). Let a(z) and b(z) be two polynomials of degree at most m − 1, with a(z) stored in the words A[0], ..., A[t − 1]. Compute c(z) = a(z) · b(z) by the following steps.

1. Compute B_u = u(z) · b(z) for all polynomials u(z) of degree at most w − 1.
2. Set C = 0.
3. For k from W/w − 1 down to 0 do
   (a) For j from 0 to t − 1 do
       i. Let u = (u_{w-1}, ..., u_1, u_0), where u_i is bit (wk + i) of A[j].
       ii. Add B_u to C{j} (i.e. to C, offset by j words).
   (b) If k ≠ 0 then C = C · z^w (i.e. left shift C by w).
4. Return C.

In the implementation, w = 4 was chosen as a good balance between storage overhead and precomputation speed-up.
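The comb method above is designed for multi-word NIST-field operands; to keep a sketch self-contained in one machine word, the following C toy instead works in GF(2^8) with the well-known irreducible polynomial z^8 + z^4 + z^3 + z + 1 (a choice of convenience, not a thesis field) and uses plain shift-and-XOR multiplication rather than the comb. It also shows the bit-spreading step that underlies the squaring table.

    #include <stdio.h>
    #include <stdint.h>

    /* Addition in a binary field is plain XOR; no reduction is needed,
     * since the degree cannot grow. */
    static uint8_t gf_add(uint8_t a, uint8_t b) { return a ^ b; }

    /* Schoolbook right-to-left shift-and-XOR multiplication with interleaved
     * reduction by f(z) = z^8 + z^4 + z^3 + z + 1 (so z^8 = z^4 + z^3 + z + 1,
     * i.e. XOR with 0x1B on overflow). */
    static uint8_t gf_mul(uint8_t a, uint8_t b) {
        uint8_t c = 0;
        while (b) {
            if (b & 1) c ^= a;        /* add a * z^i if bit i of b is set */
            b >>= 1;
            uint8_t carry = a & 0x80; /* does a * z leave degree 7?       */
            a <<= 1;
            if (carry) a ^= 0x1B;     /* reduce the overflowing z^8 term  */
        }
        return c;
    }

    /* Squaring spreads the bits of a, inserting a 0 between each pair of
     * adjacent coefficient bits; the result still needs reduction mod f(z).
     * The implementation does this per byte via a 256-entry table. */
    static uint16_t spread(uint8_t a) {
        uint16_t r = 0;
        for (int i = 0; i < 8; i++)
            if (a & (1u << i)) r |= 1u << (2 * i);
        return r;
    }

    int main(void) {
        printf("add: %02x, mul: %02x, spread of 0x0f: %04x\n",
               gf_add(0x57, 0x83), gf_mul(0x57, 0x83), spread(0x0f));
        return 0;
    }

The comb method computes the same products but processes w = 4 bits of a(z) per table lookup across all words at once, which is what makes it fast on 32-bit words.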
If more memory is available, w can be increased to improve the performance further. After the multiplication, a reduction needs to be performed in order for the resulting polynomial to be of degree at most m − 1.

Finally, inversion is performed using the polynomial version of the Extended Euclidean algorithm, as mentioned in Appendix A.3.3 on page 85.

5.4 Definition (Inversion in F_2^m using the extended Euclidean algorithm). Let a(z) be a polynomial of degree at most m − 1 and f(z) the reduction polynomial of degree m. Compute a(z)^-1 (mod f(z)) by the following steps.

1. Let u = a and v = f.
2. Let g1 = 1 and g2 = 0.
3. While u ≠ 1 do
   (a) j = deg(u) − deg(v).
   (b) If j < 0 then perform swap(u, v), swap(g1, g2), and j = −j.
   (c) u = u + z^j v.
   (d) g1 = g1 + z^j g2.
4. Return g1 = a(z)^-1.

There exist more complex algorithms that perform this faster, and it is also possible to implement a shortcut for division, instead of inversion followed by multiplication. However, since the performance of the hardware-supported prime case is much higher than that of the pure software binary case, not much time was put into investigating this.

5.3.3 Layer 2 - Point Addition and Doubling

This section describes how the field arithmetic operations were utilized in order to implement the group operation, as described in section 4.10.1 on page 39. Since all cryptographic operations utilize point multiplication, which needs many repeated point operations, projective coordinates are utilized, and when adding two distinct points, the mixed coordinate algorithm is used, i.e. one of the two points is given in affine coordinates, and the corresponding Z-coordinate is replaced by 1 in the formulas. Also, the operations to transform points from affine to projective coordinates and back are implemented; however, the first of these is not needed in cryptographic protocols when mixed addition is used.

Curves over Prime Fields

The group operation on curves defined over prime order finite fields has been implemented using Jacobian Projective (JP) coordinates, as described in section 4.10.3 on page 44. As previously mentioned, the underlying arithmetic operations are performed completely in hardware, and the difficulty lies in scheduling the hardware operations in an optimal way. The hardware manual gave suggestions for this, which were used as the basis. This has been modified in order to optimize further and to support mixed addition. Remember that the hardware consists of two MMs, which perform modular multiplications efficiently, and one BAU, which can be used for the additions and subtractions. The time to perform one MM multiplication is almost exactly the same as the time to perform two BAU additions or subtractions. This fact was utilized when determining a suitable scheduling, as presented shortly.

In order to move an affine point to Jacobian projective coordinates, Z is chosen to be 1. This means that no calculations need to be performed. However, the given point coordinates need to be converted to the Montgomery domain, by using the MMs and multiplying with R^2 (mod N) (see the section about Montgomery multiplication), which is precomputed as soon as the curve is chosen. The conversion from JP back to affine coordinates requires one inversion and several multiplications, but still utilizes only one MM and is performed completely sequentially. This has only a negligible impact on the total performance, since this transformation is used so rarely compared to the other operations.
In order to add two distinct points, one given in Montgomery JP coordinates and the other in Montgomery affine coordinates, the scheduling presented in figure 5.2 on the next page has been chosen, implementing the formula given in 4.10.3 on page 44. The blue cells, along with the red arrows, present a path of immediate dependence, meaning that none of these operations can be performed earlier than they are, since they depend on the data of the immediately preceding blue cell. Using the presented scheduling, we see that the full time is actually no longer than the time of this specific path. This means that all operations in gray cells are "free", and more importantly that the given scheduling is in fact proven to be optimal.

Figure 5.2: The scheduling for point addition.

In order to double a point given in Montgomery JP coordinates, the scheduling shown in figure 5.3 on page 68 has been chosen. A path of immediate dependence is presented in this case as well, and since the total time is no longer than the time of this path, this scheduling is optimal too.

Figure 5.3: The optimal scheduling for point doubling. The blue cells along with the red arrows show a path of direct dependence. Since this path is no faster than the total time, all gray operations are "free" and the given scheduling is in fact optimal for the given formulas.

Curves over Binary Fields

The group operation on curves defined over power-of-two order finite fields has been implemented using López-Dahab (LD) projective coordinates, as described in 4.10.3 on page 44. As previously mentioned, the underlying arithmetic operations are performed completely sequentially in software. In order to move an affine point to LD coordinates, Z is chosen to be 1. This means that no calculations need to be performed. During the conversion back to affine coordinates, Z is inverted using the Extended Euclidean algorithm and is then used to calculate the corresponding coordinates through multiplications.

In order to add two distinct points, one given in LD coordinates and the other in affine coordinates, and in order to double a point given in LD coordinates, the formulas in 4.10.3 on page 44 have been implemented by calling the functions for performing the binary field arithmetic in the corresponding order. The case where one of the input points is the point at infinity is handled as a special case, and the result is then simply the other input point (possibly also the point at infinity).

5.3.4 Layer 3 - Point Multiplication

The point multiplication utilizes the group operations in layer 2 in order to implement multiplication by an integer (exponentiation, if multiplicative notation were used). We already know that the naive approach of simply performing repeated addition is an exponential-time algorithm, but in Appendix A.3.2 on page 84 we mentioned exponentiation by squaring for multiplicative notation,
which may be called multiplication by doubling, or double-and-add, in our additive case. Using this method, we only need log2(n) doublings and on average log2(n)/2 additions, where n is the number we multiply by, since n on average has half its bits set. The bits of the integer may be processed either from left to right, or in the opposite order.

5.5 Definition (Left-to-right point multiplication by doubling). Given a number k, with binary representation k = (k_{m-1}, ..., k_1, k_0)_2, and a point P on an elliptic curve, the result of the multiplication kP is calculated by the following steps.

1. Let Q = O (the point at infinity).
2. For i from m − 1 to 0 do
   (a) Q = 2Q.
   (b) If k_i = 1 then Q = Q + P.
3. Return Q = kP.

By using this algorithm, one of the values in the addition is constant, P. This means that we can keep this value in affine coordinates and use the algorithm for mixed addition, which we know is more efficient than the generic algorithm. By using the right-to-left approach, this benefit disappears; however, another opportunity presents itself.

5.6 Definition (Right-to-left point multiplication by doubling). Given a number k, with binary representation k = (k_{m-1}, ..., k_1, k_0)_2, and a point P on an elliptic curve, the result of the multiplication kP is calculated by the following steps.

1. Let Q = O (the point at infinity).
2. For i from 0 to m − 1 do
   (a) If k_i = 1 then Q = Q + P.
   (b) P = 2P.
3. Return Q = kP.

Notice that the value of P during the calculation is no longer constant, meaning that mixed addition cannot be used. However, the different values of P are all completely independent of the number being multiplied with. Since, for cryptographic operations, P is often the curve generator point and known prior to running the algorithm, these log2(n) doubling values may be precomputed. This would speed up the multiplication by up to 2/3, at the cost of additional storage overhead. Unfortunately, in my implementation, the amount of available memory is very limited and no such precomputations are performed. Instead, the previous approach, which enables mixed addition, is used.

There also exist other algorithms for performing point multiplication that are faster than double-and-add and still don't require any additional storage. However, the maximum gain is expected to be around 10%, and this has not been implemented due to time restrictions. Moreover, the interesting numbers to look at when comparing this with the RSA approach are not the exact running times, but rather an approximate relation and, more importantly, the development of time and storage requirements as the number of bits of security changes. This comparison can be made regardless of whether the implementation is fully optimized or not.

There is, however, one problem with all the above mentioned algorithms. The running time depends on the number of bits that are set in the integer we multiply with, and different operations are performed in the algorithm depending on these bits. This may enable a side-channel attack, where the time or power is measured in order to deduce information about the integer value. One way to defend against this would be to replace the standard double-and-add algorithm with the following.

5.7 Definition (Point multiplication by using a Montgomery ladder). Given a number k, with binary representation k = (k_{m-1}, ..., k_1, k_0)_2, and a point P on an elliptic curve, the result of the multiplication kP is calculated by the following steps.

1. Let R0 = O (the point at infinity).
2. Let R1 = P.
3. For i from m − 1 to 0 do
   (a) If k_i = 1 then
       i. R0 = R0 + R1
       ii. R1 = 2R1
   (b) else
       i. R1 = R0 + R1
       ii. R0 = 2R0
4. Return R0 = kP.

This will always take the same amount of time, regardless of the value of k; however, the running time is the same as the worst case for the double-and-add algorithm, since one addition and one doubling are performed for every bit. Also in this case, we have the problem that mixed addition cannot be used. For this reason, this method has not been implemented in the code. If side-channel attacks are deemed a big concern, however, using this is the best choice.

In some cryptographic operations, such as signature verification in ECDSA, the sum of two point multiplications, kP + lQ, needs to be calculated, sometimes referred to as multiple point multiplication. In my implementation, this is simply performed as two separate multiplications followed by one final addition. However, by using something called Shamir's trick [16], this can be done more efficiently. For the same reason as in the previous paragraph, this was chosen not to be implemented. Also, since RSA normally uses a small public exponent, the ECDSA signature verification will be completely outperformed by RSA in any case.

5.3.5 Layer 4 - Cryptographic Protocols

Having the first three layers settled, we have formed the basis that is needed in order to build the cryptographic constructions, i.e. we have created, and can perform operations in, a cyclic group with a hard discrete logarithm problem. Key pair generation, ECDSA, and ECIES have been implemented according to ANSI X9.62 [36] and X9.63 [35].

Key Pair Generation

The key pair generation process is simply choosing a random number and then performing a point multiplication between this number and the base point of the selected curve, as specified in section 4.10.7 on page 47. The random number generation is performed using the Random Number Generator (RNG) provided on the hardware chip. The RNG generates a number with the same number of bits as the modulus, but not necessarily smaller than it. In order not to create a bias, a reduction cannot be used to ensure that the result lies in the correct range. Instead, a probabilistic approach is taken, where the RNG is used repeatedly until the result is valid. This may in theory never terminate, but will for all practical purposes work fine.

ECDSA

The ECDSA algorithm has been implemented as specified in section 4.11.3 on page 53. Most of the implementation is very straightforward, but some steps are commented on here. The random number generation has been performed as for key pair generation. The hashing is performed by utilizing the special Hash Engine on the chip. This performs all operations of the hashing, and only the set-up and feeding of the data need to be handled in software. Since the curve order is prime for all NIST curves, the inversion of k can be performed by using Euler's theorem and an MM. For this to work, the number k is first transformed to the Montgomery domain, and the inverted result is then transformed back. The other arithmetic operations are performed using the BAU, including a final reduction. This may possibly be optimized by leveraging the MMs, but since the time for these calculations is very short compared to the point multiplication, the simpler approach was chosen. In fact, the performance of the full signature generation is completely determined by the performance of point multiplication.
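To make the layer-3 algorithm concrete, here is a toy C sketch of Definition 5.5 over the small textbook curve y^2 = x^3 + 2x + 2 over F_17 with base point (5, 1) of order 19 (illustration parameters, not a NIST curve). It uses affine coordinates and Euler-theorem inversion, whereas the real implementation uses projective coordinates precisely to avoid an inversion in every group operation.

    #include <stdio.h>
    #include <stdint.h>

    /* Toy curve y^2 = x^3 + 2x + 2 over F_17; P = (5,1) generates a group of order 19. */
    static const int64_t p = 17, a = 2;

    typedef struct { int64_t x, y; int inf; } Point;   /* inf = 1 marks O */

    static int64_t mod(int64_t v) { return ((v % p) + p) % p; }

    static int64_t modpow(int64_t b, int64_t e) {
        int64_t r = 1; b = mod(b);
        for (; e > 0; e >>= 1) { if (e & 1) r = (r * b) % p; b = (b * b) % p; }
        return r;
    }

    /* v^-1 mod p via Euler's theorem (p is prime). */
    static int64_t inv(int64_t v) { return modpow(v, p - 2); }

    /* Affine group operation, covering addition, doubling, and O. */
    static Point add(Point P1, Point P2) {
        Point R = { 0, 0, 1 };
        if (P1.inf) return P2;
        if (P2.inf) return P1;
        if (P1.x == P2.x && mod(P1.y + P2.y) == 0) return R; /* P + (-P) = O */
        int64_t s;
        if (P1.x == P2.x)  /* doubling: s = (3x^2 + a) / (2y)     */
            s = mod((3 * P1.x * P1.x + a) * inv(2 * P1.y));
        else               /* addition: s = (y2 - y1) / (x2 - x1) */
            s = mod((P2.y - P1.y) * inv(P2.x - P1.x));
        R.x = mod(s * s - P1.x - P2.x);
        R.y = mod(s * (P1.x - R.x) - P1.y);
        R.inf = 0;
        return R;
    }

    /* Left-to-right double-and-add (Definition 5.5). */
    static Point mul(int64_t k, Point P) {
        Point Q = { 0, 0, 1 };
        for (int i = 62; i >= 0; i--) {
            Q = add(Q, Q);                      /* Q = 2Q         */
            if ((k >> i) & 1) Q = add(Q, P);    /* if k_i: Q += P */
        }
        return Q;
    }

    int main(void) {
        Point P = { 5, 1, 0 };
        for (int64_t k = 1; k <= 19; k++) {
            Point Q = mul(k, P);
            if (Q.inf) printf("%2lldP = O\n", (long long)k);
            else       printf("%2lldP = (%lld, %lld)\n", (long long)k,
                              (long long)Q.x, (long long)Q.y);
        }
        return 0;
    }

Running it lists the whole cyclic group, ending with 19P = O; note how the branch on each key bit is exactly the data-dependent behavior that the Montgomery ladder of Definition 5.7 removes.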
Since point multiplication dominates, any optimizations should be performed at layer 3 or lower, leaving layer 4 as readable and simple as possible. Finally, note that only one point multiplication is used when generating a signature, whereas two are needed for the verification, meaning that generation will be faster than verification. The difference in my implementation is a factor of two, but it can be made smaller by implementing Shamir's trick, as previously mentioned.

ECIES

The ECIES algorithm has been implemented as specified in section 4.10.8 on page 47. Most of the implementation is very straightforward, but some steps are commented on here. The generation of random numbers and the hashing are performed in the same way as for ECDSA. The key derivation function is implemented in a very simple but, in this case, still secure manner, since the master key is unique and random for all instances.

5.8 Definition (Key Derivation Function for ECIES). Given a master key S of length l and a hash function H of output size at least l, the encryption key k_enc and the authentication key k_mac, both of length l, are derived as

k_enc = H(S||1)
k_mac = H(S||2)

where || denotes concatenation and the numbers are padded to word size. If |H(·)| > l, the output is truncated to match l.

Any secure symmetric encryption algorithm may be chosen, but for simplicity during implementation, only XOR-encryption was tested. Also, the MAC algorithm may be changed, but the implemented HMAC algorithm is the recommended one. This was straightforward to implement, since hashing was already available. Also in this case, the performance of the full encryption is largely dependent on the performance of point multiplication. The size of the data and the choice of symmetric algorithm determine exactly how much of the total computation the point operations correspond to. Note that encryption requires two point multiplications, whereas decryption only requires one.

5.3.6 Testing

In order to ensure the correctness of the implementation, unit tests were written for each separate functionality, all the way from finite field arithmetic operations to full signature generations and verifications. Every time a change in the code was made, all tests were rerun in order to ensure that the update did not break any existing functionality. In order to test the end system, and in particular the ECDSA signature generation and verification, test data from NIST has been used. For each recommended curve and each hash function (SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512), they give 15 example inputs and expected results, both for signature generation and for verification. This data came in the form of a text file, and a Java program was written to process this text file and generate unit tests in the form of C code. Performance measurements, described in the next section, were only performed after all correctness tests passed.

6 Performance Evaluation of Implementation

This chapter covers the results of the implementation performed on the HSM. Signature generation, verification, and key pair generation for the RSA algorithm and for ECDSA are measured and compared. The performance of RSA is measured as the code was given, and no effort has been put into searching for optimization opportunities in this code. For this reason, and in order to highlight the differences between software and hardware performance, other sources with performance measurements were also consulted.
The performance of ECIES has not been measured and compared to an RSA equivalent, since the execution time largely depends on the choice of symmetric algorithm, which may be the same in both cases.

6.1 Performance of HSM implementation

Performance measurements have been run for RSA and for ECC based on prime and binary finite fields. Data was generated and is presented here for key pair generation, and for digital signature generation and verification. The performance of the cryptographic protocols depends almost solely on the performance of the underlying mathematical operations, i.e. modular exponentiation for RSA and curve point multiplication for ECC, which means that the results would basically hold for any choice of protocol.

For all measurements, the RSA public exponent is chosen to be e = 65537, as is recommended. This means that the private exponent will be large, and hence that private operations (decryption and signature generation) will be much slower than the public operations (encryption and signature verification). For ECC, any asymmetry between the times for the private and public operations depends completely on the used protocol. For ECDSA, verification is slower than generation, since it needs two point multiplications. As discussed in the previous chapter, the verification is expected to be half as fast as generation for my implementation.

Whenever random numbers are expected in the algorithms, the hardware randomizer is used. The algorithms are then executed sufficiently many times that the measured average running time is almost constant with insignificant variations. The time is measured using a real time clock included on the hardware. The absolute time is provided, even though the most interesting numbers are the relations between the RSA and ECC implementations. The curves used for the elliptic curve case are the NIST ones, for both the prime case (prefixed by P) and the binary case (prefixed by B). The exception is P160, which is not a NIST standard curve, but instead defined by Certicom Research [39]. This is included to be able to compare with the very common key size of 1024 bits for RSA, even though this is considered too small to be secure by NIST since 2010 [3].

6.1.1 Key Pair Generation

The definition of RSA key pair generation is given in section 4.7 on page 32, and for ECC in section 4.10.7 on page 47. Notice that for RSA, two large prime numbers need to be generated, and this is a big part of the algorithm running time. An improved algorithm for finding large prime numbers would speed up this operation. For ECC, the generation process is nothing more than a random integer generation followed by one point multiplication, and it is therefore completely dependent on the performance of the latter. Table 6.1 presents the data collected for the key generation operations. The same data is visualized in figure 6.1 on the next page.

Table 6.1: RSA and ECC key pair generation HSM performance.

Bits of    RSA                  ECC Prime           ECC Binary
security   Size     Time        Name    Time        Name    Time
80         1024     57 ms       P160    4.2 ms      B163    95 ms
112        2048     322 ms      P224    5.4 ms      B233    190 ms
128        3072     915 ms      P256    6.6 ms      B283    265 ms
192        7680     2 min*      P384    10.8 ms     B409    640 ms
256        15360    4 days*     P521    27.2 ms     B571    1.45 s

* Generated by extrapolation.

Remember that the binary curves are implemented completely in software on a 100 MHz ARM processor, whereas RSA and the prime curves are implemented mainly in hardware.
The results for the binary case are therefore not comparable with the others and are included only for completeness.

Figure 6.1: Key pair generation, notice the logarithmic scale on the y-axis.

By using a faster processor, the binary curve would be vertically shifted in the figure, but with its shape intact. In particular, the exact same algorithms running on a 1 GHz processor would move it down by one unit (since a logarithmic scale with base 10 is used), landing only slightly above the prime case curve.

Notice the exponential decrease of performance for RSA as the key size grows, which comes from the structure of the RSA operations and the fact that the key size needs to grow exponentially when security is increased linearly. This fact will eventually make RSA completely impractical to use. Even though using 4096 or more bits for RSA is considered overkill today, future increases in computational capabilities and also improved attacks on the RSA cryptosystem may eventually mean that this will be required. Waiting minutes, or even hours, for a key pair to be generated is clearly not practical. Before discussing the result further, we will present the data for the signature generation and verification cases.

6.1.2 Signature Generation

The definition of RSASSA-PSS signature generation is given in section 4.11.1 on page 51, and for ECDSA in section 4.11.3 on page 53. When performing the measurements, the message "abc" has been signed, and the chosen hash function is the smallest possible providing enough security (e.g. SHA-256 is used when 128 bits of security are wanted). The result of signature generation is presented in table 6.2. For RSA, the performance both with and without the speed-up using the Chinese remainder theorem is provided.

Table 6.2: RSA (RSASSA-PSS) and ECC (ECDSA) signature generation HSM performance.

Bits of    RSA                               ECC Prime           ECC Binary
security   Size     w CRT       w/o CRT      Name    Time        Name    Time
80         1024     0.75 ms     2.3 ms       P160    4.4 ms      B163    95 ms
112        2048     2.6 ms      15 ms        P224    5.7 ms      B233    190 ms
128        3072     7 ms        48 ms        P256    6.9 ms      B283    265 ms
192        7680     425 ms*     3.9 s*       P384    11.4 ms     B409    640 ms
256        15360    7.5 min*    2 hrs*       P521    18.5 ms     B571    1.45 s

* Generated by extrapolation.

The same data is visualized in figure 6.2, using a logarithmic scale on the y-axis. The data is also presented with a linear scale on the y-axis in figure 6.3, in order to clearly see the devastating RSA time development compared to the ECC case.

Figure 6.2: Digital signature generation with logarithmic scale on the y-axis.

Figure 6.3: Digital signature generation with linear scale on the y-axis.

First of all, notice that the ECDSA times are only slightly longer than the times for key pair generation, confirming the theory that almost all time is spent on the point multiplication. Also, the ECDSA signature generation code is not fully optimized, e.g. it uses the BAU instead of an MM for modular multiplication, as described in the previous chapter. This choice was made in order to make the code simpler (for the BAU, the value need not be in the Montgomery domain) and easier to maintain.
Still, the overhead for the prime case is only about 5%, and negligible for the binary case, showing that optimizing at this layer would not give much speed-up, and that spending the time and effort on the lower layers is a better idea. For RSA, the overhead of signature generation is even smaller, and the times are almost identical to those of the plain modular exponentiation. Using the Chinese Remainder Theorem gives improved performance for the RSA case, at the cost of some precomputations and an increased size of the private key, since the prime factors of the modulus need to be stored.

6.1.3 Signature Verification

The definition of RSASSA-PSS signature verification is given in section 4.11.1 on page 51, and for ECDSA in section 4.11.3 on page 53. To measure verification, the message "abc" has been signed once, and is then verified many times and the average time calculated. As for generation, the chosen hash function is the smallest providing sufficient security. The results for signature verification are presented in table 6.3 and visualized in figure 6.4.

Table 6.3: RSA (RSASSA-PSS) and ECC (ECDSA) signature verification HSM performance.

Bits of    RSA                  ECC Prime           ECC Binary
security   Size     Time        Name    Time        Name    Time
80         1024     130 µs      P160    9.3 ms      B163    192 ms
112        2048     270 µs      P224    11.2 ms     B233    390 ms
128        3072     470 µs      P256    13.6 ms     B283    550 ms
192        7680     4.7 ms*     P384    22.9 ms     B409    1.3 s
256        15360    243 ms*     P521    37.1 ms     B571    2.9 s

* Generated by extrapolation.

As already mentioned, the RSA algorithm completely outperforms the ECDSA algorithm here, due to the small public exponent. Also notice that the ECDSA times are a factor of two larger than the generation times, since the multiple point multiplication is performed as two separate multiplications.

Figure 6.4: Digital signature verification with logarithmic scale on the y-axis.

6.2 Performance of Other ECDSA and RSA Implementations

In order to verify the correctness of the achieved results, and also in order to highlight the potential differences in the relation between the RSA and ECC approaches when performing the implementation in software or hardware, additional sources with performance measurements have been examined.

First of all, the achieved results seem reasonable when compared with other similar comparisons. The facts that ECC dominates for key pair generation, that ECC wins over RSA for signature generation in the higher security cases, and that RSA by far outperforms ECC for signature verification are stated in other sources as well, e.g. in Jansma and Arrendondo [40]. In [41], Brown, Hankerson, López, and Menezes show that in a pure software implementation, the prime case ECC performs similarly to the binary Koblitz curve case, and better than the random binary curve case. This would give the same relation between RSA and ECC as found in this thesis. Finally, the interesting fact that a hardware implementation of binary curves by far outperforms the prime case is presented in Wenger and Hutter [42]. This means that using binary field hardware, the ECC constructions may perform better than RSA even for the lowest security cases, still with an exponential relative improvement for the ECC case as the security grows.
6.3 Conclusion

Using the elliptic curve approach for digital signatures, and for public key constructions in general, clearly offers some great advantages over the RSA approach, both regarding computation time and storage needs. The results presented above show that for all systems where key pair generation is a frequent operation, the elliptic curve approach is preferred. For the higher security cases, ECC outperforms RSA also for signature generation, and since NIST recommends that in the very near future at least 112 bits of security be used, where the performance of ECC is similar to that of RSA, all new systems should clearly support ECC. For signature verification, the elliptic curve approach only compares with RSA for security levels that are currently extreme, although future developments in computational power and integer factorization algorithms may change this. However, if a system is very verification intense, and does not require long forward security, then RSA may still be the better choice.

The smaller key sizes for elliptic curves also give advantages other than computation time. The storage need is much lower, which may facilitate implementation on limited embedded systems, such as the smart cards commonly used in cryptography. Also, the smaller the keys and certificates are, the faster any network transfers of them will be.

An interesting question to ask is why, despite the statements above, which clearly speak in favor of elliptic curves, RSA is still more widely used than ECC. My belief is that there are three main reasons for this: legacy, patents, and simplicity.

First of all, RSA is older and therefore more well established. It was first released in 1978, whereas the first proposal to use elliptic curves as the basis for cryptography was made in 1985. RSA simply got a head start that ECC still has some trouble catching up to.

Moreover, parts of the ECC algorithms are covered by patents. While it is still possible to implement a system using elliptic curves without infringing on these patents, the extra work of dodging them might scare off some implementors, in favor of an RSA implementation. These patents and their implications are discussed further by NIST in [43].

Finally, many people think that the idea behind RSA is both easier to understand and to describe than that of elliptic curves. The problem of integer factorization, which is often falsely described as the one reason that RSA works, is something that people can relate to, since everyone knows how to multiply numbers. In fact, RSA is even mentioned in some sub-university math classes, since all you need to know in order to understand the basics is modular arithmetic. Trying to understand elliptic curves without having some understanding of groups and finite fields is probably a bad idea.

It is clear that RSA is not the ultimate solution to cryptography, because of its sub-exponential time attacks. As more and more people see the benefits of the elliptic curve approach, as computational power increases, which makes the gap to RSA even bigger, and as more cryptographic libraries implement and promote elliptic curves, such that the user does not really have to know how they work in order to use them, RSA will eventually lose the battle.
It is of course possible that better attacks against elliptic curve groups may be discovered one day, similar to those for other groups, but the mathematical and cryptographic research community is today far from that day. We can only hope that this is true also for closed research groups, such as intelligence agencies.

Interesting future work could be to perform a hardware implementation of curves over binary fields, and to compare the two choices of basis. Also, if more HSM storage is available, algorithms that use a higher degree of precomputation could be considered and compared.

A Mathematical Prerequisites

This appendix defines and explains the mathematical prerequisites for the text in this report. Whenever a concept is first used, this appendix is referenced. The purpose of this appendix is not to be complete in any way, but rather to cover just enough for the reader to understand the material in the report. The reader is assumed to be familiar with basic concepts such as set theory and some mathematical notation.

A.1 Complexity Theory

Complexity theory is a way to classify the required running time, i.e. the complexity, of algorithms depending on the size of the input, n. Often, the most interesting case to look at is what happens when n grows large. In that case, we talk about the asymptotic running time of the algorithm. There are several notations for this, but the most useful one for us is the following.

A.1 Definition. We define the asymptotic tight bound notation, Θ, as f(n) = Θ(g(n)) if ∃c1, c2 ∈ R+ and n0 ∈ Z+ such that c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0.

We say that an algorithm is "efficient" if it finishes in polynomial time, i.e. if its running time is Θ(n^k) for some k. An algorithm whose running time cannot be bounded by a polynomial is called an exponential-time algorithm. Sub-exponential time algorithms are non-polynomial time algorithms that are faster than pure exponential time algorithms.

Sometimes we talk about the complexity of a type of problem instead of a specific algorithm. We refer to the hardness of a problem as the running time of the fastest known algorithm that solves it. We say that a problem is "hard" if there is no currently known polynomial-time algorithm that can solve it, and that it is "easy" otherwise.

The complexity class P contains all decision problems that can be solved in polynomial time, and the class NP contains all decision problems where a given solution can be verified in polynomial time. This means that P ⊆ NP. An illustrative example is integer factorization, which is considered hard to perform (in the sense that no polynomial time algorithm exists to date), but is very easy to verify, since multiplication is fast. Most mathematicians believe that P ≠ NP, but no one has been able to prove it. Many cryptographic constructions are considered safe only if P ≠ NP, as we will see in chapter 4.

Some problems in NP are considered to be the hardest, called NP-complete, in the sense that finding a solution to just one of these will solve all problems in NP. More formally, a problem is NP-complete if any other NP problem can be reduced to it by some polynomial time reduction algorithm. NP-completeness sounds like a good property for problems to base cryptography on, but unfortunately no secure constructions have been found from these. The commonly used integer factorization problem, and the discrete logarithm problem, described below, are both considered not to be NP-complete.
A.2 Number Theory

We let Z denote the set of all integers. We also let Zn denote the set of all non-negative integers smaller than n, i.e. Zn = {0, 1, 2, ..., n − 1}.

A.2 Definition (Divisibility). We say that an integer a divides another integer b if b = ac where c is also an integer. We then also say that a is a factor of b.

A.3 Definition (Common Divisor). If an integer c divides both a and b, then we say that c is a common divisor of a and b. The greatest such integer is called the greatest common divisor of a and b and is denoted by gcd(a, b).

A.4 Definition (Prime Numbers). If gcd(a, b) = 1 holds for a and b, we say that the two integers are relatively prime. If a > 1 and gcd(a, b) = 1 for all b ∈ Za\{0}, we say that a is prime. Another way to say this is that a has no divisors besides 1 and itself.

A.5 Definition (Prime Factorization). Every integer greater than 1 has a unique set of prime factors. Finding this factorization is called prime factorization, or integer factorization.

A.3 Modular Arithmetic

In modular arithmetic we always want the result of an arithmetic operation between two integers to be in the range Zn = {0, 1, ..., n − 1}, where n is called the modulus. Every time the result of an operation is outside this range, we add tn, where t ∈ Z is chosen such that the result is within the range. We call this operation a reduction. This means that the result will be the remainder after division by n.

Another way to define this is through n so-called congruence classes, where the set {0, n, 2n, 3n, ...} forms one such class, {1, n + 1, 2n + 1, 3n + 1, ...} another, etc. To denote that two numbers a and b are in the same congruence class, or congruent modulo n, we write a ≡ b (mod n) or just a = b (mod n).

A.6 Example
25 ≡ 15 ≡ 5 (mod 10), since when dividing 25, 15, or 5 by 10, the remainder is 5.

When performing modular calculations, the following rules are very useful.

(a + b) (mod n) = (a mod n) + (b mod n)
(a − b) (mod n) = (a mod n) − (b mod n)
(a · b) (mod n) = (a mod n) · (b mod n)

This means that we can do reductions in the middle of calculations, instead of having to wait until the end.

A.7 Example
(12 · 18 + 27) (mod 8)
= (12 · 18) (mod 8) + (27 mod 8)
= (12 mod 8) · (18 mod 8) + (3 mod 8)
= (4 mod 8) · (2 mod 8) + (3 mod 8)
= (8 mod 8) + (3 mod 8)
= 3 mod 8

A.3.1 The Chinese Remainder Theorem

The Chinese Remainder Theorem (CRT) states the following.

A.8 Definition (Chinese Remainder Theorem). Given k pairwise coprime positive integers n1, n2, ..., nk, and k integers a1, a2, ..., ak, there exists an integer x solving the following system.

x ≡ a1 (mod n1)
x ≡ a2 (mod n2)
...
x ≡ ak (mod nk)

Also, all solutions x are then congruent mod N = n1 · n2 · · · nk.

This will be useful when performing modular exponentiation of composites, since we can then perform the operations modulo the individual prime factors of the modulus, and then solve the system to find the solution with the composite as modulus.

A.9 Example
We want to calculate 4^12 (mod 33). Since 33 = 3 · 11 is a known prime factorization, we can apply the Chinese Remainder Theorem and write this as the system

4^12 ≡ 4^(2·6) ≡ (4^2)^6 ≡ 1^6 ≡ 1 (mod 3)
4^12 ≡ 4^(10+2) ≡ 4^10 · 4^2 ≡ 1 · 16 ≡ 5 (mod 11)

Now, the value 49 is in both congruence classes, 1 (mod 3) and 5 (mod 11), so the solution is 49 ≡ 16 (mod 33). Performing the raw exponentiation, we would of course also get the same result: 4^12 = 16777216 ≡ 16 (mod 33).
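The recombination step of example A.9 can be written down in a few lines. The following Python sketch is my illustration, not the thesis implementation; it relies on Python 3.8+ for the built-in modular inverse pow(x, -1, n).

def crt(residues, moduli):
    # Solve x ≡ a_i (mod n_i) for pairwise coprime moduli;
    # the answer is unique modulo N = n_1 * n_2 * ... * n_k.
    N = 1
    for n in moduli:
        N *= n
    x = 0
    for a, n in zip(residues, moduli):
        Ni = N // n
        x += a * Ni * pow(Ni, -1, n)  # pow(Ni, -1, n): inverse of Ni mod n
    return x % N

# Example A.9: compute 4^12 mod 33 via the prime factors 3 and 11.
print(crt([pow(4, 12, 3), pow(4, 12, 11)], [3, 11]))  # 16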
A.3.2 Modular Exponentiation

By definition, a^k = a · a · · · a (k times). By adding one bit to k, we double its size, which means that the time to perform exponentiation this way grows exponentially with the size of the power. More specifically, the required number of multiplications is Θ(2^(log2 k)) = Θ(k). We would like to perform the exponentiation in another way, such that it can be carried out "efficiently", i.e. in polynomial time.

What we do to exponentiate with an n-bit number is to look at the binary representation, and simply multiply all the powers of two that correspond to the bits with value 1. This will require no more than 2n multiplications in total, instead of up to 2^n multiplications when using the previous, naive method. On average, the naive method requires 2^n/2 multiplications, whereas the new one requires only 3n/2 multiplications (n for the powers of two, and then one for each bit set, which on average is n/2). We refer to the new method as exponentiation by squaring, and see that it requires Θ(log2 k) multiplications, i.e. it runs in polynomial time.

A.10 Example
Consider the exponentiation a^629. Using the naive method, a would be multiplied by itself 628 times. However, by noticing that the binary representation of 629 is 1001110101, we can rewrite the exponentiation as

a^629 = a^512 · a^64 · a^32 · a^16 · a^4 · a.

We now see that we need to square a 9 times, and then perform 5 additional multiplications in order to get the result; thus we have reduced the roughly 629 multiplications to 14. Also note that no additional storage is required, since the result can be computed on the fly while performing the squarings.

A.3.3 Multiplicative Inverses

The multiplicative inverse of a mod n, denoted a^(−1), is the number such that a · a^(−1) ≡ 1 (mod n). A multiplicative inverse to a mod n exists only if gcd(a, n) = 1.

A.11 Definition (Euler Totient Function). The Euler totient function, denoted by φ(n), is the number of integers that have a multiplicative inverse mod n, i.e. the number of elements in Zn that are relatively prime to n.

We let Z*n denote the set of all integers < n that have an inverse mod n (i.e. are relatively prime to n). If n has only two prime factors, p and q, then the number of elements in Z*n is (p − 1)(q − 1), i.e. φ(n) = (p − 1)(q − 1). Finding multiplicative inverses mod n can be done by using the extended Euclidean algorithm.

The Extended Euclidean Algorithm

The extended Euclidean algorithm is an extension of the Euclidean algorithm, which finds the greatest common divisor of two numbers. Given two numbers, a and n, that are relatively prime, the algorithm "efficiently" finds a^(−1) mod n. The algorithm is described in Hankerson et al. [33].

A.12 Theorem (Euler's Theorem). If a and n are relatively prime, then a^φ(n) ≡ 1 (mod n).

This also means that we have another way to find inverses. Since a^φ(n) ≡ a^(φ(n)−1) · a ≡ 1 (mod n), we see that a^(φ(n)−1) is the multiplicative inverse of a mod n. We will soon see that Z*n and modular multiplication in fact form something called a group of order φ(n), and Euler's theorem holds for groups in general, not only those defined by modular arithmetic.

Observe that in the special case where n = p is a prime number, all elements in Zp except 0 will be relatively prime to p, and so Z*p = Zp\{0} and φ(p) = p − 1.
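Both algorithms of this section fit in a few lines of Python. The sketch below is for illustration only (Python's built-in pow(a, k, n) and pow(a, -1, n) already provide both operations); it is not the hardware implementation discussed in chapter 5.

def modexp(a, k, n):
    # Exponentiation by squaring: Θ(log k) multiplications.
    result = 1
    a %= n
    while k > 0:
        if k & 1:                  # this bit is set: multiply the power in
            result = (result * a) % n
        a = (a * a) % n            # square to get the next power of two
        k >>= 1
    return result

def modinv(a, n):
    # Extended Euclidean algorithm: find a^(-1) mod n, given gcd(a, n) = 1.
    r0, r1 = n, a % n
    t0, t1 = 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    if r0 != 1:
        raise ValueError("a and n are not relatively prime")
    return t0 % n

print(modexp(4, 629, 33))  # matches pow(4, 629, 33)
print(modinv(4, 33))       # 25, since 4 * 25 = 100 ≡ 1 (mod 33)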
A.4 Groups and Finite Fields

A.13 Definition. A group is the pair (G, ◦), where G is a set, ◦ : G × G → G, and where the following rules apply.

Associativity: ∀a, b, c ∈ G, the equation (a ◦ b) ◦ c = a ◦ (b ◦ c) holds.
Identity: There is an element e ∈ G such that ∀a ∈ G, e ◦ a = a ◦ e = a.
Inverse: ∀a ∈ G, ∃ an element b ∈ G such that a ◦ b = b ◦ a = e.

A group where a ◦ b = b ◦ a always holds is called a commutative or abelian group. One example of a well-known abelian group is the set of integers together with normal addition, (Z, +), where 0 is the identity element and the inverse of a is denoted by −a. Another example is the set of real numbers except for 0, under normal multiplication, (R\{0}, ·), where 1 is the identity element and the inverse of a is denoted by a^(−1) = 1/a.

The group operation, whatever it may be, is usually called "addition" or "multiplication", and then uses the corresponding symbol we are familiar with. If "addition" is chosen, performing the group operation n times on an element a is called "multiplication by integer" and simply denoted by na. If "multiplication" is chosen, repeating the group operation is called "exponentiation" and is written as a^n. In the rest of this appendix, multiplicative notation will be used.

A.4.1 Generators and Subgroups

The order of a group (G, ·) is the number of elements in G. Assume that we have a finite group of order n and an element a ∈ G; then there always exists at least one integer t ≤ n such that a^t = 1. In fact, the elements ⟨a⟩ = {1, a, a^2, a^3, ..., a^(s−1)}, where s is the smallest such t for the chosen element, also form a group that is closed under the same operation, called a subgroup of (G, ·). We say that the order of a is the order of the subgroup it generates. If the order of a is n, that is ⟨a⟩ = G, then we say that a is a generator, or primitive element, in (G, ·), since the element a can be used to generate the whole group. Generator elements are often denoted by g. We say that a group is cyclic if it has at least one generator element.

A.14 Example
Assume we have a group (G, ·) where G = {1, 2, 3, 4} = Z*5 and · is normal modular multiplication (modulus 5). Then, since 4^2 ≡ 16 ≡ 1 (mod 5),

⟨4⟩ = {1, 4^1} = {1, 4} ⊂ G

However,

⟨2⟩ = {1, 2^1, 2^2, 2^3} = {1, 2, 4, 3} = G
⟨3⟩ = {1, 3^1, 3^2, 3^3} = {1, 3, 4, 2} = G

So the elements 2 and 3 are generators, but 4 is not. Since there is at least one generator in (G, ·), it is a cyclic group.

A.4.2 The Discrete Logarithm Problem

The discrete logarithm problem is what makes cyclic groups interesting for cryptographic uses.

A.15 Definition (DLP). Assume we have a cyclic group (G, ·) of order n with a generator element g. Given this and the element a = g^k, finding k ∈ Zn is called the discrete logarithm problem.

We know that finding ordinary logarithms is easy. However, we believe that solving the discrete logarithm problem is "hard", i.e. no known algorithm for computing it is in P. Remember that by definition, g^k = g · g · · · g (k times). The naive approach for an attacker, who knows g^k and g and tries to find k, is to perform the group operation again and again until the expected value is found. For each bit added to the size of k, the work for the attacker is doubled, and so this is an exponential time algorithm, running in Θ(2^(log2 k)) = Θ(k). Better attacks exist, and their running time depends on what group is being used. This is discussed further in section 4.9.2 on page 37. Notice, however, that the legitimate user, who has knowledge of k, uses the same idea as for modular exponentiation, i.e. exponentiation by squaring, in order to perform the group exponentiation "efficiently".
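A small Python sketch (illustrative only) that reproduces example A.14 and shows the naive exhaustive attack on the DLP. The attack performs the group operation once per candidate k, so its work doubles with every bit added to k.

def subgroup(a, p):
    # Enumerate <a> = {1, a, a^2, ...} in Z*_p under multiplication mod p.
    elements, x = [1], a % p
    while x != 1:
        elements.append(x)
        x = (x * a) % p
    return elements

def naive_dlog(g, h, p):
    # Find k such that g^k ≡ h (mod p) by exhaustive search: Θ(k) steps.
    x = 1
    for k in range(p):
        if x == h:
            return k
        x = (x * g) % p
    return None

print(subgroup(4, 5))       # [1, 4]: order 2, so 4 is not a generator
print(subgroup(2, 5))       # [1, 2, 4, 3]: 2 generates all of Z*_5
print(naive_dlog(2, 3, 5))  # 3, since 2^3 = 8 ≡ 3 (mod 5)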
A.4.3 Finite Fields

A.16 Definition. A field is the triple (F, +, ·), where F is a set, + : F × F → F, · : F × F → F, and where the following rules apply.

• (F, +) is an abelian group with identity element 0.
• (F\{0}, ·) is an abelian group.
• (a + b) · c = a · c + b · c for all a, b, c ∈ F. (Distributivity)

A field with a finite number of elements is called a finite field or a Galois field (GF). The number of elements in the set is referred to as the order of the field. All fields of order q are in fact the same field, but their representations may be different. For this reason, we usually denote the field with q elements by Fq or GF(q). In general, any prime power is the order of some finite field, and these are the only possible orders. For elliptic curve based cryptography, only finite fields with prime order, GF(p), and fields with power-of-two order, GF(2^m), are considered.

Prime Order Finite Fields

The set of elements in GF(p) can simply be represented by Zp. This is easy to realize since we know that (Zp, +) forms an abelian group with identity element 0 and that (Zp\{0} = Z*p, ·) forms another abelian group. This means that all arithmetic operations in this field can be carried out as modular operations, i.e. by just performing normal addition and multiplication, and performing reductions mod p whenever needed. These operations are easy to describe and understand, but not necessarily the most efficient when implemented. Prime order finite fields are generally referred to as simply prime fields.

Power of Two Order Finite Fields

The elements in finite fields of order 2^m, m ∈ Z+, are not as easily represented as in the prime case. Note that we cannot represent the set by Z(2^m), since there are elements in this set that are not relatively prime to 2^m and therefore do not have a multiplicative inverse, i.e. (Z(2^m)\{0}, ·) does not form a group. Instead, different types of representations are available. The choice of representation affects which operations to perform, and depending on the purpose and implementation strategy, different representations are suitable in different cases. In cryptography, two types of representations are generally used: polynomial basis and normal basis. Only polynomial representation was used in my implementation, so only this is described here.

In polynomial basis representation, the elements are represented as polynomials of degree at most m − 1, with binary coefficients, i.e.

F(2^m) = {a(m−1) z^(m−1) + a(m−2) z^(m−2) + · · · + a2 z^2 + a1 z + a0 : ai ∈ {0, 1}}

Instead of working modulo an integer, a prime (irreducible) polynomial f(z), one that cannot be factored, is chosen, and reductions are performed modulo this polynomial. Addition is then performed by simply adding the corresponding coefficients in F2; multiplication by normal polynomial multiplication followed by reduction with f(z); and inversion by finding the polynomial such that multiplication with it gives the identity polynomial (i.e. the constant 1) as result. Efficient algorithms for performing these operations are discussed in the chapter on implementation, chapter 5 on page 59. Power of two order finite fields are generally referred to as simply binary fields.
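The operations just described can be illustrated in Python by packing the binary coefficients of a field element into the bits of an integer. The sketch below is my illustration, not the chapter 5 implementation; it uses the small field GF(2^3) with the irreducible polynomial f(z) = z^3 + z + 1 purely as an example, and assumes its inputs are valid field elements of degree less than m.

M = 3          # example field GF(2^3)
F = 0b1011     # f(z) = z^3 + z + 1, irreducible over F_2

def gf_add(a, b):
    # Coefficient-wise addition in F_2 is bitwise XOR (subtraction is identical).
    return a ^ b

def gf_mul(a, b):
    # Shift-and-add polynomial multiplication with reduction modulo f(z).
    result = 0
    while b:
        if b & 1:
            result ^= a       # add the current multiple of a
        b >>= 1
        a <<= 1               # multiply a by z
        if (a >> M) & 1:      # degree reached m: subtract (XOR) f(z)
            a ^= F
    return result

# (z + 1)(z^2 + 1) = z^3 + z^2 + z + 1 ≡ z^2 (mod f(z))
print(bin(gf_mul(0b011, 0b101)))  # 0b100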
Bibliography

[1] David Kahn. The Codebreakers: The Comprehensive History of Secret Communication from Ancient Times to the Internet. Scribner, New York, 1996. Cited on page 8.

[2] Auguste Kerckhoffs. La cryptographie militaire. Journal des Sciences Militaires, 1883. Cited on page 10.

[3] BlueCrypt. Cryptographic key length recommendation. http://www.keylength.com/en/4/ (2012-03-20). Cited on pages 12, 36, and 74.

[4] Ulrich Kühn. Side-channel attacks on textbook RSA and ElGamal encryption. Technical report, Dresdner Bank, IS-STA 5, Information Security, 2003. Cited on page 12.

[5] Alfred Menezes, Paul van Oorschot, and Scott Vanstone. Handbook of Applied Cryptography. CRC Press, 1996. Cited on pages 13, 19, and 20.

[6] Jonathan Katz and Yehuda Lindell. Introduction to Modern Cryptography. Chapman and Hall/CRC, 2007. Cited on page 15.

[7] Claude Shannon. Communication theory of secrecy systems. Bell System Technical Journal, 1949. Cited on page 15.

[8] ECRYPT. http://www.ecrypt.eu.org/ (2012-05-20). Cited on page 17.

[9] Alex Biryukov and Dmitry Khovratovich. Related-key cryptanalysis of the full AES-192 and AES-256. Technical report, University of Luxembourg, 2009. Cited on page 19.

[10] Ivan Damgård. A design principle for hash functions. Technical report, Advances in Cryptology – CRYPTO '89, 1989. Cited on page 21.

[11] John Black and Phillip Rogaway. CBC MACs for arbitrary-length messages: The three-key constructions. 2003. Cited on page 24.

[12] Mihir Bellare, Ran Canetti, and Hugo Krawczyk. Keying hash functions for message authentication. Advances in Cryptology – CRYPTO '96, 1996. Cited on page 24.

[13] Mihir Bellare and Chanathip Namprempre. Authenticated encryption: Relations among notions and analysis of the generic composition paradigm. Advances in Cryptology – ASIACRYPT 2000, 2000. Cited on page 25.

[14] Martin Hellman. An overview of public key cryptography. IEEE Communications Magazine, 2002. Cited on page 28.

[15] Whitfield Diffie and Martin Hellman. New directions in cryptography. IEEE Transactions on Information Theory, 1976. Cited on page 28.

[16] Taher ElGamal. A public-key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory, 1985. Cited on pages 31 and 70.

[17] Ronald Cramer and Victor Shoup. A practical public key cryptosystem provably secure against adaptive chosen ciphertext attack. Proceedings of CRYPTO, 1998. Cited on page 31.

[18] R. L. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public-key cryptosystems. MIT Memo, 1977. Cited on page 32.

[19] Dan Boneh. Twenty years of attacks on the RSA cryptosystem. Notices of the American Mathematical Society (AMS), 1999. Cited on pages 32, 35, and 36.

[20] PKCS #1 v2.1: RSA Cryptography Standard. RSA Laboratories, 2002. Cited on pages 33 and 51.

[21] Thorsten Kleinjung et al. Factorization of a 768-bit RSA modulus (version 1.4). Technical report, EPFL IC LACAL, Lausanne, Switzerland, 2010. Cited on pages 35 and 36.

[22] Paul C. Kocher. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. Technical report, Cryptography Research, Inc., 1996. Cited on page 36.

[23] Stephen C. Pohlig and Martin E. Hellman. An improved algorithm for computing logarithms over GF(p) and its cryptographic significance. IEEE Transactions on Information Theory, 1978. Cited on page 37.

[24] Daniel Shanks. Class number, a theory of factorization, and genera. Proceedings of Symposia in Pure Mathematics, 1972. Cited on page 38.

[25] J. M. Pollard. Monte Carlo methods for index computation (mod p). Mathematics of Computation, Vol. 32, 1978. Cited on page 38.

[26] Chris Studholme. The discrete logarithm problem. 2002. Cited on page 38.

[27] Victor Shoup. Lower bounds for discrete logarithms and related problems. Theory and Application of Cryptographic Techniques, 1997. Cited on page 38.
[28] John Aaron Gregg. On Factoring Integers and Evaluating Discrete Logarithms. PhD thesis, Harvard College, Cambridge, Massachusetts, 2003. Cited on page 38.

[29] Neal Koblitz. Elliptic curve cryptosystems. Mathematics of Computation, Vol. 48, 1987. Cited on page 39.

[30] Victor S. Miller. Use of elliptic curves in cryptography. Exploratory Computer Science, IBM Research, 1987. Cited on page 39.

[31] Helmut Hasse. Zur Theorie der abstrakten elliptischen Funktionenkörper. Crelle's Journal, 1936. Cited on page 46.

[32] René Schoof. Counting points on elliptic curves over finite fields. Journal de Théorie des Nombres de Bordeaux, 7, 1995. Cited on page 46.

[33] Darrel Hankerson, Alfred Menezes, and Scott Vanstone. Guide to Elliptic Curve Cryptography. Springer, 2003. Cited on pages 46, 64, and 85.

[34] Federal Information Processing Standards Publication – Digital Signature Standard (DSS). National Institute of Standards and Technology (NIST), 2009. Cited on pages 47 and 52.

[35] Public Key Cryptography for the Financial Services Industry – Key Agreement and Key Transport Using Elliptic Curve Cryptography. American National Standards Institute, 2011. Cited on pages 47, 63, and 70.

[36] Public Key Cryptography for the Financial Services Industry – The Elliptic Curve Digital Signature Algorithm (ECDSA). American National Standards Institute, 2005. Cited on pages 54, 63, and 70.

[37] Peter L. Montgomery. Modular multiplication without trial division. Mathematics of Computation, Vol. 44, 1985. Cited on pages 60 and 62.

[38] Darrel Hankerson, Julio López Hernandez, and Alfred Menezes. Software implementation of elliptic curve cryptography over binary fields. Technical report, Dept. of Discrete and Statistical Sciences, Auburn University, USA, 2000. Cited on page 64.

[39] SEC 1: Elliptic Curve Cryptography. Certicom Research, 2000. Cited on page 74.

[40] Nicholas Jansma and Brandon Arrendondo. Performance comparison of elliptic curve and RSA digital signatures. Cited on page 79.

[41] M. Brown, D. Hankerson, J. López, and A. Menezes. Software implementation of the NIST elliptic curves over prime fields. Cited on page 79.

[42] Erich Wenger and Michael Hutter. Exploring the design space of prime field vs. binary field ECC-hardware implementations. Technical report, Institute for Applied Information Processing and Communications (IAIK), 2011. Cited on page 79.

[43] NSA on ECC. http://www.nsa.gov/business/programs/elliptic_curve.shtml (2012-06-04). Cited on page 80.
Copyright

The publishers will keep this document online on the Internet — or its possible replacement — for a period of 25 years from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for his/her own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/

© Martin Krisell