PUBLIC KEY CRYPTOGRAPHY: SECURITY BASED ANALYSIS OF THE UNDERLYING INTRACTABLE PROBLEMS

by

HUSAIN H. AL KHULAIF
B.S., Colorado State University, 2011

A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Master of Science, Computer Science, 2013
This thesis for the Master of Computer Science degree by Husain H. Al khulaif has been approved for the Computer Science Program by

Ellen Gethner, Chair
Bogdan Chlebus
Michael Ferrara

October 30, 2013
Al khulaif, Husain H. (MS, Computer Science)
Public Key Cryptography: Security Based Analysis of the Underlying Intractable Problems
Thesis directed by Associate Professor Ellen Gethner.

ABSTRACT

The design principles of cryptographic primitives have changed enormously over time to serve the needs of evolving technologies. I survey these changes and focus on certain decisions taken in the design of cryptographic primitives through the various ages of cryptography: the classical, modern, and quantum ages. I focus extensively on modern cryptographic primitives whose security is based on the hardness of solving the underlying intractable problems, mainly Integer Factorization. We show that the most common cryptographic protocols are theoretically related, so that a threat to one protocol poses security threats to the other protocols. Moreover, the theoretical stabilization of quantum computing and its increasingly practical advances, along with Shor's algorithm, threaten the current National Security Agency (NSA) standardized protocol, RSA. I look closely into the limitations of quantum computing by discussing two well-known intractable problems from graph theory: the Graph Isomorphism problem and the Hamiltonian Path problem. The latter problems are studied in terms of their suitability for cryptographic design, and I provide a preliminary prototype for hiding secrets based on them, which is believed to confound quantum computing power.

The form and content of this abstract are approved. I recommend its publication.

Approved: Ellen Gethner
ACKNOWLEDGEMENTS

Foremost, I would like to express my immense gratitude to my thesis advisor, Dr. Ellen Gethner, for her continuous support, motivation, enthusiasm, and guidance throughout the writing of this thesis. Dr. Gethner's support was not limited to research assistantship; she supported me academically and emotionally through the rough times I had while finishing this thesis. I could not have imagined a more inspiring mentor and advisor for my master's study.

My sincere thanks also go to my thesis committee, Dr. Michael Ferrara and Dr. Bogdan Chlebus, for their comments and discussions, which have polished this thesis. Dr. Ferrara's discussions and questions served me well in rethinking many parts of the fourth chapter. I owe my thesis committee my heartfelt appreciation for all the time and effort dedicated to reading and reviewing this thesis.

Last but not least, I would like to thank my wife, Sakinah Alsayedkadhem, and my little daughter, Fatimah, who always stood by me and dealt with my absence over the past year. Their appreciated sacrifices are the reason this master's journey turned from a dream into a reality.
TABLE OF CONTENTS

Chapter
1. Introduction ... 1
1.1 The Foundation of Cryptography ... 1
1.2 Communication Channels and the Eavesdropper ... 6
1.3 Computational Complexity ... 7
1.3.1 Classification of Computational Problems ... 8
1.3.2 Model of Complexity ... 10
1.3.3 Theoretical Reductions of Complexity ... 13
1.4 Public vs. Private Key Cryptography ... 14
1.5 Mathematical Model of Public Key Cryptography ... 15
1.5.1 One-way Functions ... 16
1.5.2 Digital Signature ... 20
1.6 Focus ... 22
2. Cryptographic Primitives ... 25
2.1 An Overview of Hash Functions ... 25
2.1.1 Merkle-Damgard Construction ... 27
2.1.2 Generic Attacks on Hash Functions ... 30
2.1.2.1 The Birthday Attack ... 30
2.1.2.2 The Length Extension Attack ... 32
2.1.3 Hash Functions Based Authentication ... 34
2.2 Block Ciphers ... 35
2.2.1 Modes of Operation ... 36
2.2.2 Block Ciphers Cryptanalysis ... 38
2.2.2.1 Attack Scenarios ... 39
2.2.2.2 Differential Cryptanalysis ... 40
2.2.2.3 Linear Cryptanalysis ... 42
2.3 Random Bit Generators ... 43
3. The Underlying Intractabilities of Modern Cryptosystems ... 45
3.1 The Integer Factorization Problem (IFP) ... 47
3.1.1 Complexity Class of the IFP ... 47
3.1.2 Primality Certificates ... 47
3.1.2.1 IFP vs. Primality Testing ... 48
3.1.2.2 Fermat's Primality Test ... 49
3.1.2.3 Miller-Rabin Primality Test ... 51
3.1.2.4 AKS Primality Test ... 53
3.1.3 IFP-based Public Key Cryptosystems ... 54
3.1.3.1 RSA ... 55
3.1.3.2 Rabin ... 57
3.1.4 IFP-based Cryptosystems Attacks (Weak Primes) ... 58
3.1.4.1 Twin Primes Forming the Modulus ... 59
3.1.4.2 Fermat's Factoring of Small Difference of Factors ... 59
3.1.4.3 Pollard's Factoring if Greatest Factor is Small ... 60
3.1.5 IFP-based Cryptosystems Attacks (NFS Factorization) ... 60
3.1.5.1 The Number Field Sieve ... 60
3.1.5.2 Step 1: Polynomial Selection ... 62
3.1.5.3 Step 2: Sieving ... 62
3.1.5.4 Step 3: The Matrix ... 63
3.1.5.5 Step 4: The Square Roots ... 63
3.1.6 Statistics of Useless Public Keys ... 64
3.1.7 Threats Resistance by Incrementing the Key Size ... 65
3.1.8 Current Milestone of the IFP ... 66
3.2 The Quadratic Residue Problem (QRP) ... 67
3.2.1 Solving the QRP for a Prime Modulus ... 69
3.2.2 The QRP Reduction to the IFP ... 71
3.2.3 Complexity Class of the QRP ... 73
3.3 The Discrete Logarithm Problem (DLP) ... 73
3.3.1 DLP-based Cryptosystems Attacks (Algorithmically) ... 74
3.3.1.1 The Index Calculus ... 74
3.3.2 DLP-based Public Key Cryptosystems ... 75
3.3.2.1 ElGamal ... 76
3.3.2.2 Diffie-Hellman-Merkle Key Exchange (DHM) ... 77
3.3.3 DLP Reduction to IFP and Vice Versa ... 78
3.4 Elliptic Curve Discrete Logarithm Problem (ECDLP) ... 78
3.4.1 An Overview of Elliptic Curves ... 79
3.4.1.1 Points Addition Operation ... 79
3.4.1.2 Defining the Elliptic Curve Group ... 81
3.4.2 Elliptic Curve Discrete Logarithm Problem ... 83
3.4.2.1 Converting Characters into Elliptic Curve Points ... 83
3.4.2.2 Example: Elliptic Curves on DHM Key Exchange ... 85
3.4.3 ECDLP Attack ... 86
3.4.3.1 The Elliptic Curve Lifting Problem (ECLP) ... 87
3.5 Other Problems and Trivial Reductions ... 88
3.5.1 Diffie-Hellman Problem (DHP) ... 88
3.5.1.1 Computational DHP Reduction to the DLP ... 88
3.5.1.2 Computational DHP Reduction to the Decisional DHP ... 89
3.5.2 Quadratic Residue Certificate Problem and the IFP ... 89
3.6 The Implication of Quantum Computing ... 89
3.6.1 The Basics of Quantum Computing ... 90
3.6.2 Quantum Fourier Transform ... 92
3.6.3 Quantum Attack: IFP ... 92
3.6.3.1 IFP Reduction to a Hidden Subgroup Problem (HSP) ... 93
3.6.3.2 Finding a Group's Order and Factorization ... 93
3.6.4 Other Quantum Attacks: QRP, DLP and ECDLP ... 94
3.6.5 Current State of Practical Quantum Attack on the IFP ... 95
3.6.6 On the Limitations of Quantum Computing ... 96
4. Intractable Graph Problems ... 98
4.1 Preliminaries ... 98
4.2 Graph Isomorphism Problem (GIP) ... 99
4.2.1 Complexity of the GIP, sGIP ... 101
4.3 Hamiltonian Path Problem (HPP) ... 102
4.3.1 Complexity of the HPP ... 102
4.4 One-wayness ... 102
4.4.1 The GIP into a One-way Function ... 103
4.4.2 The HPP into a One-way Function ... 104
4.5 Hiding Secrets in Graphs ... 105
4.5.1 Step 1: Constructing a Hamiltonian Path ... 105
4.5.2 Step 2: Diffusion with Random Edges ... 106
4.5.3 Step 3: Confusion with the GIP ... 107
4.6 Evaluation and Security Analysis ... 109
4.6.1 Computational Analysis and One-wayness Property ... 110
4.6.2 Frequency Analysis ... 111
4.6.3 Linear and Differential Analysis ... 111
4.6.4 Collision Resistance ... 112
4.6.5 Avalanche Property ... 114
4.6.6 Parameters Choice and Specifications ... 116
4.6.6.1 The Graph's Order ... 116
4.6.6.2 The Graph's Size ... 117
4.6.7 Discussion and Further Work ... 119
References ... 120
Appendix ... 129
A. Frequency Analysis ... 129
B. Attack Graph ... 130
LIST OF TABLES

Table
1 Illustration of public and private key cryptography ... 13
2 Common factorization algorithms with their corresponding running times ... 47
3 Pseudocode of Fermat's based primality test ... 50
4 Pseudocode of AKS primality test ... 52
6 Miller-Rabin vs. AKS practicality experiment ... 53
7 NFS algorithms latest records ... 63
8 Quadratic residue example ... 67
LIST OF FIGURES

Figure
1 The flow of encryption/decryption and signing/verifying a digital signature ... 21
2 Merkle-Damgard construction of a collision-free hash function ... 27
3 The progress of integer factorization algorithms over time ... 67
4 Point additions on elliptic curves ... 84
LIST OF ABBREVIATIONS

Eve         Eavesdropper, an illegitimate user who desires to breach the communication privacy.
Alice, Bob  Legitimate users who desire to communicate securely over an unsecured channel.
CRHF        Collision-resistant hash function
IFP         Integer factorization problem
DLP         Discrete logarithm problem
QRP         Quadratic residue problem
ECDLP       Elliptic curve discrete logarithm problem
ECLP        Elliptic curve lifting problem
DHP         Diffie-Hellman problem
CDHP        Computational Diffie-Hellman problem
DDHP        Decisional Diffie-Hellman problem
1. Introduction

This chapter presents cryptography from a historical point of view, its mathematical representation, and the thought processes behind the development of communication security from a cryptographic perspective. It focuses on communication channels, the entities involved in communication, and the characterization of secured communication channels, which have changed enormously over the past decades. These changes have stimulated the field of cryptography to introduce new principles of security to serve the needs of evolving technologies (e.g., electronic commerce, banking, and network communications).

1.1 The Foundation of Cryptography

Cryptography is defined as the way of making a message obscure and unintelligible to everyone except a subgroup of the communicating parties known as the legitimate users. Cryptography has long been utilized by humans, beginning in its earliest form as a tool for writing messages without the need for obscuration, since most early humans could not read. The art of cryptography then progressed to exchanging messages in the form of drawn symbols, where only certain people knew their interpretation. The latter form of cryptography is demonstrated by the Egyptian hieroglyphic canons [Sin00]. As cryptography continued to evolve, it held high interest for militaries, as it could be used to securely exchange messages between commanders on the battlefield. The early forms of military cryptography were the Spartan and Caesar ciphers [CMF12]. These cryptographic protocols were constructed such that each communicating party was required to hold a rotating device to encrypt and decrypt messages and to agree on a letter
substitution as a joint secret quantity. In addition, the Enigma rotor machine was a more sophisticated and efficient means of cryptography that attracted militaries before and during World War II [USAI]. These protocols were esoteric to the people who invented them and sufficed to serve only a limited number of users, owing to the exorbitant cost of setup. Furthermore, these protocols fared poorly once cryptanalysis emerged and mathematicians' understanding of code breaking progressed. For instance, the mathematical technique of frequency analysis was introduced to recover the substitution operation (e.g., shifting letters by two positions) upon which most classical ciphers relied in one way or another.

Definition 1.1 (Frequency analysis): the study of the varying frequencies of single letters or combinations of letters within a particular language (e.g., English, where the letter 'e' occurs 12% of the time vs. the letter 'i', which occurs 4% of the time in modern day literature (see Appendix A)).

Frequency analysis relies on measuring the frequency of each letter, or combination of letters, in random plain messages and comparing those frequencies with the ones found in the encrypted message. In a mathematical representation, consider the list of alphabet letters A = {a_1, ..., a_26} and their corresponding frequencies F = {f_1, ..., f_26}, obtained from a large plaintext file, such that

    f_i = (c_i / n) * 100, for 1 <= i <= 26,    (1.1)

where c_i is the count of occurrences of the letter a_i and n is the total number of letters. Let P denote the resulting set of pairs of the form (a_i, f_i). Conducting the same frequency test (1.1) on a ciphertext file (encrypted file) yields a list of pairs of the form (b_i, g_i), i.e., the set C. If both lists P and C are sorted by their frequency values, the index-based mapping (which pairs the entries of C and P occupying the same position in their respective sorted lists) should at least reconstruct parts of the plaintext and give Eve, an adversarial identity, sufficient capability to guess the complete contents of the message.

Proof sketch (Correctness of frequency analysis): Denote the plaintext letters m_1, ..., m_k, which are fed into a black box model of some ciphering algorithm E that consists of a set of operations such that E(m_i) = c_i. In other words, each occurrence of a plaintext letter is mapped to the same ciphertext letter under the operations of E. Hence, the equality E(m_i) = E(m_j) holds true whenever m_i = m_j. Moreover, the frequencies of the plaintext and ciphertext letters are taken from subsets of text of some language L (e.g., English). In terms of quantitative studies, the frequency of some element in a sufficiently large subset is asymptotic to the frequency of the same element in the corresponding superset; similarly, the frequency of some element in the superset generalizes to all subsets of the superset. This explains the correlation between the frequencies computed from C and those computed from subsets of L.

As a consequence of frequency analysis, militaries had to change their protocols for exchanging messages and introduce improved ones, so that it became arduous to conduct frequency analysis by handwork alone (e.g., Caesar-style ciphers were extended to substitute multiple letters at a time instead of one). Nevertheless, cryptographers, with the support of governments, have positioned enormous resources to break these protocols. Moreover, new technologies (e.g., computers) have emerged to ease the process of analysis and have placed more challenges on the development of new secured cryptographic protocols.
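The frequency test (1.1) and the index-based mapping can be sketched in a few lines of Python. This is my own illustrative sketch of the attack described above, under the simplifying assumption of a monoalphabetic substitution cipher; the function names are illustrative and not from the thesis.

```python
from collections import Counter

def letter_frequencies(text):
    """Compute (1.1): the percentage frequency f_i of each letter a_i in text."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    return {ch: 100.0 * c / n for ch, c in counts.items()}

def index_based_mapping(plain_freq, cipher_freq):
    """Pair ciphertext letters with plaintext letters of the same frequency rank."""
    plain_rank = sorted(plain_freq, key=plain_freq.get, reverse=True)
    cipher_rank = sorted(cipher_freq, key=cipher_freq.get, reverse=True)
    return dict(zip(cipher_rank, plain_rank))
```

Eve would compute `letter_frequencies` once over a large corpus of the suspected language, again over the intercepted ciphertext, and use the resulting mapping to guess plaintext letters; even partial matches narrow the key space considerably.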
The mindset of cryptographers was long focused on developing ways to merely hide messages while in transit. The development of technology increased the range of cryptographic applications (e.g., electronic commerce, banking, and communication networks). This led cryptographers to expand the requirements of cryptographic protocols: not only to hide messages, but also to authenticate the source of messages and to ensure that a message has not been counterfeited while in transit. Furthermore, advances in cryptographic analysis broadened over time, so that applications could be utilized by a large number of people exchanging messages. This stimulated cryptographers to find a key exchange methodology to work in conjunction with the cryptographic algorithm. The role of keys is to determine the functional output of the cryptographic algorithm; a key specifies the transformation of an original message into an encrypted message and vice versa. Consequently, the security of cryptographic algorithms should rely not on the secrecy of the algorithm's description but on the secrecy of the key. The latter, in the cryptographic literature, is known as Kerckhoffs' principle [Ker78].

Definition 1.2 (Kerckhoffs' principle): the description of a cryptographic protocol should be public knowledge, and its security must rely solely on a secret key and randomization procedures.

Kerckhoffs' principle is an imperative rule in modern day cryptographic algorithms. The importance of this principle arises from the way modern cryptography is used. In terms of effectiveness, secured cryptographic algorithms are hard to design and are implemented either in hardware or software. This makes changing the algorithm much harder than was the case in the early stages of cryptography. Moreover, present day communication involves dynamic sets of communicating parties. If one
communication party holds the specification of the protocol, then he or she can reveal any message exchanged within the group, even if this party is not involved in future rounds of the communication. Since building cryptographic algorithms is expensive, it is infeasible to build one for every session of a communication. In terms of security, published algorithms get examined with professional paranoia [FSK12, p8]; cryptographers act as adversaries (Eves) to discover weaknesses of the algorithms before an actual Eve finds them. This alerts the reader to the fact that the more a particular cryptographic algorithm is exposed to the public, the more it can be trusted. The latter statement is demonstrated by the choice of RSA as a standard public key cryptosystem in the majority of cryptographic applications over other competitive cryptosystems such as ElGamal.

The use of cryptography has also expanded due to the increase of technological applications. Unlike before, when the technology was limited to military uses, it is currently being used in a broad range of new applications including, but not limited to, e-commerce, finance, networking security, and password protection. Hence, the breakage of cryptographic protocols has deteriorative consequences that encourage cryptographers to always consider worst case scenarios in designing cryptographic protocols, as well as to accelerate the science of this field beyond the current state of technologies. The latter scenario is illustrated by the introduction of quantum cryptography as a consequence of the implications of quantum mechanics in complexity theoretic cryptography, and the breakage of the current cryptographic protocol standard RSA [Sho97]. This has motivated researchers to design protocols that can be implemented on classical computers and still confound quantum based adversaries (e.g., quantum resistant cryptography).
1.2 Communication Channels and the Eavesdropper

A communication channel is defined as a transmission medium between two or more entities. The medium can be classified as a wired or a wireless transmitter. In all cases, it is a practical assumption that these channels are unsecured, and this is where cryptography becomes important. Secured communication channels are not assumed to provide only confidentiality over the transmitted data; they must also provide data integrity and authentication. The eavesdropper, Eve for short, defined as an adversarial identity, is anticipated to have access to these channels and sufficient computational power to monitor every message exchanged back and forth between or among communicators. There are no particular guarantees or general assumptions related to Eve's behavior and resources. In view of this, worst case scenarios are always considered during the design and testing phase of every protocol. For instance, considerations include, but are not limited to, that Eve is capable of reading and counterfeiting messages and has an algorithmic and computing advantage over what is publicly known.

Generally speaking, there are no access restrictions on communication channels, and the only way to secure the transmitted data is through two processes at the end points of the communication channel, encryption and decryption; i.e., data obtained from the communication channel are obscured and useless to Eve. Moreover, the behavior and resources of Eve are unpredictable, which makes the field of computer security in general, and cryptography in particular, fast changing, and poses compulsive concerns about assumptions regarding Eve's behavior. Furthermore, Eve is unlikely to gain full information in one event and is likely to work to find small segments of information in order to work toward the targeted keyed information that leads to breaching the security of a communication channel. Such small and probably meaningless
information includes the ciphertexts exchanged, the size of these ciphertexts, and the time consumed by legitimate users to create these ciphertexts. Accordingly, cryptographic protocols must be designed following the "All or Nothing" principle: Eve must gain complete information before being able to breach the security [Riv97]; otherwise, no information about the secret is revealed to Eve through knowledge of partial information (e.g., the length of the secret). Assumptions can also be made that Eve has the advantage of several years of research and has already developed advanced technologies that are unavailable to us. Thus, more challenges are posed to the system's defenders by this imbalance with the system's attackers [FSK12, p8]. Consequently, activity and research in the field of cryptography has accelerated toward the invention of new cryptographic protocols, under the assumption that new protocols must resist the expected advancement of technology over the next tens to hundreds of years. In view of this, this thesis aims to study modern cryptographic designs and security assumptions, besides the possible consequences of breaching the security of one protocol on other protocols, through the theoretical correlation between their designs on both classical and quantum computers.

1.3 Computational Complexity

The foremost measures of modern cryptographic algorithms are based on the intractability of the computational problem that forms the basis of the algorithm's design (e.g., the Integer Factorization Problem (IFP) forms the security basis of the RSA cryptosystem). Problems in computational complexity theory are classified into classes based on their difficulty. Calling a problem difficult is ambiguous; thus, to understand the meaning of difficulty from the perspective of computing machines,
consider a problem that is solved by feeding it into a computing machine that applies algorithmic steps to reach a solution. There are two attributes, time and storage, which define whether a particular machine can efficiently give a solution to a computational problem. The fact that the machine is restricted to some bounds of time and storage is the intuition behind modern cryptographic protocols: Eve must solve a problem that requires an enormous amount of time, storage, or both to reveal the transmitted data. In contrast, legitimate users hold a quantity, or trapdoor element of the problem, which allows them to solve the problem efficiently in a reasonable amount of time and storage.

1.3.1 Classification of Computational Problems

Computational problems can be viewed as a set of questions regarding an infinite set of instances I, in which a computer with an algorithmic implementation can map these instances to an answer set O. The mapping I -> O depends on the set of questions a computational problem may impose, and this mapping is where the computational boundaries (time and storage) discussed in Section 1.3 play a part. Let us briefly define the major types of computational problems and subsequently define the way their complexity class is measured.

Definition 1.3 (Decision problem): a problem which requires the computing machine to map an infinite set of instances I to {0, 1}. That is, the solution to the problem is either yes or no, denoted by 1 or 0 respectively.

Example 1.1: Primality testing: given an integer n, decide whether it is divisible by any positive integer greater than 1 and less than or equal to the square root of n.
Definition 1.4 (Search problem): a problem which requires the computing machine to search, for a given instance, for a set S whose elements have certain properties defined by the problem.

Example 1.2: Integer factorization: given an integer n, search for a set S = {p_1, ..., p_k} such that each p_i is prime and p_1 * p_2 * ... * p_k = n.

Definition 1.5 (Counting problem): a problem with questions about the size of the solution set for a given search problem.

Example 1.3: Number of integer factors: given an integer n, return |S| for the set S of Example 1.2.

In regard to search and decision problems, the following trivial theorem holds true in all cases.

Corollary 3.1: for any hard decision problem, the corresponding translation into a search problem is at least as hard as its decision instance, while the converse is not always true.

This is true because searching for a solution to a given hard problem yields a witness certificate of the answer to the decisional problem. The reverse is not always true, as demonstrated by the primality testing of an integer: if certain properties hold true, a decisional answer may be obtained in polynomial time, while finding a divisor or factor remains a hard search problem.

There is also an interesting problem class in theoretical computer science that captures two major problems: the Integer Factorization and Graph Isomorphism problems. This class is the basis of most quantum based algorithms and is defined as follows:
Definition 1.6 (Hidden subgroup problem): Let G be a group, H a subgroup of G, and X a finite set. A function f : G → X hides H if f is constant on the cosets of H and distinct on distinct cosets, i.e., f(g₁) = f(g₂) if and only if g₁H = g₂H. The problem is to find a set of generators of the subgroup H.

It should be noted that in later chapters of this thesis, several problems can be translated from one class to another. This translation may reduce the complexity of the problem, as demonstrated by Shor's quantum polynomial algorithm for the Integer Factorization problem [Sho97]. The Integer Factorization problem, naturally a search problem, is reduced to a hidden subgroup problem, which is the basis of Shor's algorithm to find non-trivial factors of a given large composite integer. Shor's work has inspired the research community to exploit the principle of such reductions and apply it toward several other problems, including the Graph Isomorphism problem. Yet, no efficient algorithm has been found for the Graph Isomorphism problem over the symmetric group [EH99], a non-abelian (non-commutative) group. In short, variants of the hidden subgroup problem form the ingredients of many efficient quantum algorithms.

1.3.2 Model of Complexity

The common model used in computational complexity theory to classify the difficulty of problems is the deterministic Turing machine (TM). It is a concrete model, as the TM is a mathematical model of general computing machines. The machine takes an input x and sequentially applies several operations to move from one state to another in order to decide if x ∈ L for some language L; that is, it outputs 1 if x ∈ L and 0 otherwise. The language L is the set of all strings that are accepted by a given TM. For an initial stage of defining the classes of problems, we will consider the following fundamental definitions [Lewis Book]:
Definition 1.8 (Decidable problem): Let T be a TM that accepts the language L; then a problem is decidable, or solvable, by T if for every input x, T outputs 1 if x ∈ L and 0 otherwise.

In view of decidable problems, a machine performs several operations, moving from one state to another, before it reaches a final stable state that defines the answer to a given problem. The number of states, or the distance from the initial to the final state, defines the number of steps it takes to solve a problem. The amount of computing resources needed to solve a problem increases with the number of steps. Hence, the number of steps is the basis of the following classifications of computational complexity.

Definition 1.9 (Class P problems): The class P of polynomially solvable problems is the set of problems in which, for every given input, there exists an algorithm to decide its membership in time bounded by a polynomial in the input size.

Definition 1.10 (Class NP problems): The class NP of nondeterministic polynomially acceptable problems is the set of problems whose membership can be verified in polynomial time. In contrast to a deterministic TM, a nondeterministic Turing machine (NTM) may prescribe multiple actions in a given state.

Let us say we have a computing machine to solve a particular problem given an integer n as input. If the problem is classified as P, then the machine can solve the problem deterministically and sequentially, taking O(n^c) steps where c is a constant independent of the input n. In case the constant is dependent on n, the number of steps required to solve the problem grows super-polynomially as the size of the input increases (e.g., O(n^n)), which is the property of NP-class problems [Car02].
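To illustrate the asymmetry behind Definition 1.10 with a small sketch (the function names and bounds here are illustrative, not from the thesis): verifying a proposed factorization certificate takes polynomial time, while finding a factor by exhaustive search does not:

```python
from math import prod

def verify_factorization(n: int, factors: list[int]) -> bool:
    """NP-style verification: check a claimed certificate with one multiplication."""
    return all(f > 1 for f in factors) and prod(factors) == n

def search_factor(n: int) -> int:
    """Exhaustive search for a non-trivial factor: exponential in the bit length of n."""
    for d in range(2, n):
        if n % d == 0:
            return d
    return n  # no non-trivial factor: n is prime
```

The certificate (a list of claimed factors) is checked in time polynomial in the size of n, which is exactly the sense in which factoring sits in NP.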
Throughout this thesis, an efficient algorithm refers to an algorithm whose running time is bounded by a polynomial and that can reach a solution in reasonable time and storage. Similarly, a problem is feasibly solvable if there exists an efficient algorithm for it. In many cases, the term "practically feasible" refers to either a problem that has an efficient algorithm to solve it, or a nondeterministic polynomial problem run on uncomplicated inputs (e.g., small inputs) such that our computing device does not exceed its time and storage bounds. Other variants of complexity classes include NP-complete and NP-hard.

Definition 1.11 (Class NP-hard problems): NP-hard is the set of decision problems that are believed to be at least as hard as the hardest problems in NP. That is, an algorithm for solving an NP-hard problem can be translated to solve any problem in the class NP.

Definition 1.12 (Class NP-complete problems): NP-complete is the intersection of NP and NP-hard and contains decision problems. The verification of a solution to an NP-complete problem is in P. In addition, any problem in NP can be reduced to an instance of an NP-complete problem such that an answer to the NP problem is correct only if it matches the answer of the problem in NP-complete.

Discarding the time and storage constraints, all problems in the P and NP classes are solvable. Yet, not every solvable problem is efficiently solvable in a reasonable time frame. Instances of problems such as factoring large composite integers are solvable, but in time expected to be on the order of thousands to millions of years with the current state of technology. Readers should refer to [Gol08] for in-depth treatments of computational complexity, in addition to storage complexity classes.

1.3.3 Theoretical Reductions of Complexity
In computational complexity, reduction refers to the transformation of one problem into another as a way to show that two problems are no harder than one another. Intuitively, given two computational problems A and B, we can state that A is reducible to B (i.e., A ≤ B) if an algorithm given to solve the problem B can be used to solve the problem A efficiently. The algorithm for solving B, thus, can transform the complexity of A to the same complexity level or a lower level. This means that, even after reduction, extra steps might be involved to completely solve the problem, but these steps are assumed not to change the complexity of the problem. That is, if the problem B can be efficiently solved, then using the same algorithm for B we can efficiently solve the problem A. In some cases, a reduction between the two problems is noted even though it does not suffice to efficiently solve the other problem. That is, A and B are theoretically related, and one problem is reduced to a case of the other problem that happens to be as difficult.

Table 1: (A) Public and (B) private key cryptography.

(A) Alice: C = E(M, Bob's public key) — sends C — Bob: M = D(C, Bob's private key)
(B) Alice: C = E(M, shared secret key R) — sends C — Bob: M = D(C, R)

1.4 Public vs. Private Cryptography

Current cryptographic protocols are classified into two categories: private key (symmetric) and public key (asymmetric) cryptography. The classification is mainly based on the form of the key used in the protocol. In private key cryptography, only one key is used to convert the message into an unintelligible form, and all parties involved in the
communication keep this key secret. The private key needs to be exchanged via a secure communication channel. Thus, a communication channel involving n participants consists of two-party communication sessions that require each communicator to maintain n − 1 keys, for a total of n(n − 1)/2 keys among all communicators. Since both parties use the same key, there is no way to guarantee the authenticity and origin of exchanged messages. Private key cryptography serves a variety of applications such as file and password protection. However, a key exchange problem arises when many parties intend to engage in a communication such that none of them has met beforehand or no secure communication channel exists. RSA was the first practical protocol standard to solve this problem. The RSA algorithm is a form of public key cryptography and differs from private key cryptography by using two keys; one is made public and the other is kept private, to encrypt and decrypt respectively. Hence, a total of 2n keys are generated such that each party generates and publishes a public key, which is used by other parties to encrypt messages destined for the key owner. It is also important to note that the public key scheme guarantees authenticity and origin, since a key can belong to only one party involved in the communication. Table 1 illustrates the difference between the two types of cryptography. In the given scenario, Alice sends a message M to Bob, and they encrypt and decrypt the message via the functions E and D respectively. The remainder of this paper considers public key cryptography due to its importance, the popularity of its applications, and its stimulating structure.

1.5 Mathematical Model of Public key Cryptography

Consider the following realistic scenario, which applies to the majority of cryptographic applications. Communicating parties Alice and Bob desire to exchange
messages back and forth via an untrusted public communication channel, with the primary goal that Eve cannot read or alter the content of the messages. Alice will scramble the message into an unreadable form such that her friend Bob is the only one who can convert it back to its original form. This ensures that the message has been transmitted securely and that the copy obtained by Eve through the communication channel is obscure. Of course, both Alice and Bob need some mechanism (e.g., a digital signature) to guarantee that the message has indeed come from Alice. Finally, Alice and Bob want to be able to communicate at a later time with other parties without the need to regenerate keys and go through the communication setup again. That is, Bob and Alice, regardless of being legitimate users at one point in time, may not trust each other at a later time.

The previous scenario provides the reader with the general framework for cryptographic communication in general and public key cryptography in particular. The setup of the communication and key exchange between Alice and Bob is through an unsecured channel. Current cryptographic protocols realize the following functions through a mathematical model such that:

1. Encryption and decryption are constructed from one way trapdoor functions;
2. Digital signatures are used to demonstrate the authenticity of the message.

1.5.1 One way Functions

A one way function is a function that is easy to compute for every given input but computationally infeasible to reverse. Thus, it is hard to reconstruct the input x of the function given the output y = f(x) and the function f. In other words, the problem of computing y = f(x) is classified as a P problem, while reversing it is either NP, NP-complete, or NP-hard. From a cryptographic perspective,
one way functions simulate the process of signing a message for the purpose of message authentication. For example, given a function f with input x, computing y = f(x) takes polynomial time; conversely, it is practically infeasible to find x given y and f, i.e., to compute x = f⁻¹(y). A one way function promotes the concept of signatures under real life scenarios. A legitimate user who intends to sign a message should easily generate a signature, but no one should be able to find the way it was signed; thus, no one can reverse the operation used to sign the message. The seed, or input, to the signature function is the actual message to be signed in addition to a private key.

Because a one way function is not invertible, while the encryption and decryption processes are inversions of one another, a new type of one way function must be introduced.

Definition 1.13 (One way trapdoor function): A one way trapdoor function is constructed in such a way that it is easy to compute for any given input, but practically infeasible to invert, i.e., to construct the input from the output, with zero knowledge of the trapdoor (the trapdoor serves as a secret key to compute the function backward). Consider a function f, a secret quantity s, a public quantity p, and an input x; then f is called a good candidate cryptographic one way trapdoor function if

y = f(p, x) is easy to compute, (1.2a)
x = f⁻¹(s, y) is easy to compute given the trapdoor s, (1.2b)
x = f⁻¹(y) is infeasible to compute without s, (1.2c)
s is infeasible to recover from p and pairs (x, y), (1.2d)

i.e., the computations in (1.2a) and (1.2b) have complexity in P, while those in (1.2c) and (1.2d) are in NP. Public key cryptography is based on one way trapdoor functions that are currently
designed based on the difficulty of solving an intractable problem, such as the IFP, without knowledge of the trapdoor element of the function. The public key defines an instance of the function and is used to encrypt messages, while the private key manifests the trapdoor to decrypt the message. Thus, the keys define the permission rules of each communicator: public key holders can encrypt and verify signatures, while a private key holder can decrypt and sign messages. Based on this principle, each party generates a pair of keys (k_pub, k_priv), where k_pub is made public and k_priv is kept private as part of the communication setup. Everyone can encrypt messages using the recipient's public key, which is stored in an easily accessible shared point of the communication (the cloud), but only the owner of the private key, which is kept secret by the recipient, can decrypt the messages. Perceptibly, this scheme solves the key exchange issue confronted in private key cryptography; not every two sides involved in communication need to go through the process of key agreement, which requires a prior agreement through a secured channel to establish secure communication. Let us now re-define public key cryptography more precisely to illustrate the use of one way trapdoor functions.

Definition 1.14 (Public key cryptography): A public key cryptosystem PK-system can be defined as follows:

PK-system = (M, C, P, R, K, E, D), where

1. M defines the plaintext space, and instances are denoted by m;
2. C defines the ciphertext space, and instances are denoted by c;
3. K is the key space, and instances are denoted by k;
4. P is the public key space, and instances are denoted by k_pub;
5. R is the private key space, and instances are denoted by k_priv;
6. E is the encryption function;
7. D is the decryption function.

It must be emphasized that k_pub ≠ k_priv (otherwise, the system is considered private key cryptography) and that E and D are family members of one way trapdoor functions such that D = E⁻¹, i.e., computing E⁻¹ requires a secret quantity to bypass the intractability of the inverse's computation. Each instance m of the plaintext space is mapped to an instance c of the ciphertext space using the encryption function and the public key of the recipient by computing c = E(m, k_pub). The decryption maps instances of the ciphertext space back to instances of the plaintext space using the recipient's private key: m = D(c, k_priv). Recall the definition of one way trapdoor functions: it is infeasible to invert them without knowledge of the trapdoor information, in this case the private key of the recipient. As will be demonstrated in chapter 3, such designs involve algebraic group theory, which plays a role in the security of cryptographic algorithms.

One way trapdoor functions were first benchmarked in the design of cryptosystems in 1978 with the publication of the RSA cryptosystem [RSA78]. The function of RSA relies on the intractability of the Integer Factorization problem (IFP). By way of illustration, consider RSA-768, a 768-bit RSA modulus n whose 232-digit decimal representation can be written as
n = 1230186684530117755130494958384962720772853569595334792197322452151726400507263657518745202199786469389956474942774063845925192557326303453731548268507917026122142913461670429214311602221240479274737794080665351419597459856902143413.

It is practically infeasible to factor n, while it takes polynomial time to multiply its factors p and q to produce n (cf. [Mon85] and [Kar63]), where p and q are both large prime numbers. However, knowing either p or q makes factoring a matter of dividing n by the known factor (the trapdoor). There are in fact many other cryptographic systems based on the intractability of the Integer Factorization problem (IFP). For example, algorithms such as RSA, Rabin's, Elliptic curve RSA, and residuosity-based cryptosystems are all based on the intractability of the IFP. It is also important to note that the IFP is correlated with the Discrete Logarithm problem (DLP) and the Discrete Logarithm problem over an elliptic curve group (ECDLP) (details in chapter 3). Thus, we would have to discard most of these cryptosystems once a practical integer factorization method is found. Nevertheless, the above number was factored as part of an RSA challenge involving teams from six institutions [Kle10]. In addition, new theoretical computing technology based on quantum mechanical phenomena has evolved; consequently, it has been demonstrated that RSA can be broken in polynomial time [Sho97]. This upsurge in cryptographic analysis has motivated researchers to discover novel functions that can be implemented on modern computers but whose purpose is to confound the power of quantum computing.

1.5.2 Digital Signature

A digital signature is a cryptographic scheme to guarantee the authenticity and integrity of sent messages. It enables the sender to attach a unique code derived from a
secret and the message, attached to the encrypted message; i.e., the code acts as an identification of the message. The digital signature function takes the message and a private key as the parameters needed to sign a message. The verification process is analogous to the digital signature except that it requires the corresponding public key of the signer of the message. This mechanism guarantees the integrity and authenticity of the message, providing evidence to the recipient that the message effectively comes from the given sender.

The underlying construction of the digital signature mechanism is based on Hash functions. Hash functions take a complete message and produce a small, unique, fixed-size digested form of the original message, and are classified as one way function families [ZZW06]. Hence, it is infeasible to derive the message from the digest but easy to verify the signature by computing the digest from the plaintext. From a technical point of view, denote the Hash function by H, the plaintext by m, and its corresponding ciphertext by c. The digital signature works in conjunction with the encryption-decryption algorithm. The sender first calculates the digest d = H(m) using the Hash function and signs it with the sender's private key. The sender then encrypts the message into ciphertext c using the encryption function and the recipient's public key. The digest, along with the ciphertext, is transmitted to the recipient. The verification process is the reverse of signing the message. The recipient decrypts the message into the plaintext m, then calculates the digest from m and compares the embedded digest to the calculated version. If they match, the signature is verified; thus the message's source is guaranteed, and the content has not been altered while in transit.
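The hash-then-sign flow described above can be sketched with a toy RSA-style signature over a SHA-256 digest. The tiny key below (p = 61, q = 53) is purely illustrative, not a parameter from the thesis, and is far too small for any real use:

```python
import hashlib

# Toy RSA parameters (illustrative only): n = 61 * 53 = 3233,
# and e * d = 17 * 2753 ≡ 1 (mod (61-1)(53-1) = 3120).
n, e, d = 3233, 17, 2753

def sign(message: bytes) -> int:
    """Sender: hash the message, then apply the private exponent d to the digest."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(digest, d, n)

def verify(message: bytes, signature: int) -> bool:
    """Recipient: recompute the digest and compare with the public-key operation."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == digest

sig = sign(b"attack at dawn")
assert verify(b"attack at dawn", sig)
# Verifying a different message against this signature fails
# (with overwhelming probability), since its digest differs.
```

Only the holder of d can produce a signature that the public exponent e validates, which is the asymmetry the section relies on.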
The combination of encryption, decryption and digital signature (Figure 1) ensures confidentiality and authentication. Moreover, it is reasonable to expect that producing a message digest preserves message integrity while keeping the digest small. Thus, it can be inferred that hashing algorithms are fast when compared to encryption-decryption algorithms. It is further assumed that digital signatures work because only signatures created by the appropriate private key validate properly with the corresponding public key. Moreover, a small change in the original message changes the whole digest; a single character change to the message results in a notable difference in the message digest. The latter characteristic provides security from two points of view. It is hard to guess the digest of a message even if a similar message with its corresponding digest is obtained by Eve. In addition, a change to a small segment of the message (e.g., negating a sentence) cannot be made without being noticed by the recipient of the message.

Figure 1: The flow of encrypting and signing a message, with signature verification on the recipient side.
1.6 Focus

Thus far, this thesis has been directed toward introducing and analyzing modern cryptographic protocols. This choice has influenced the scope of this thesis in the following ways:

- This thesis reviews the design principles of modern cryptography with a focus on the security of cryptographic primitives;
- This thesis studies theoretical attacks on the underlying basis of modern cryptographic protocols: the intractability of the Integer Factorization (IFP), Discrete Logarithm (DLP), Elliptic Curve Discrete Logarithm (ECDLP), Quadratic Residue (QRP), and Diffie-Hellman (DHP) problems, and other extensions of these problems, on classical computers;
- This thesis ascertains the theoretical correlation between the IFP, QRP, DLP, ECDLP, and DHP problems, such that an efficient solution to one of these problems yields a solution to the others, thus breaking the cryptosystems built on their intractability;
- This thesis briefly discusses the implications of quantum computing for IFP-, DLP- and ECDLP-based cryptography and why we should discard these protocols if quantum computing comes into practice;
- This thesis overviews candidate quantum-resistant one way functions based on graph theory.
The area of quantum-resistant cryptography has been the focus of recent literature. Shor's algorithm, a polynomial-time integer factorization algorithm based on quantum computing, has perpetuated the need to work on cryptographic protocols that are based on problems other than the IFP, DLP, etc. (i.e., problems that cannot be solved efficiently in polynomial time using the power of quantum computing). The challenge is to do this while practical quantum computing does not yet exist. In this thesis, I plan to consider quantum computing power as a measure of security, an upper bound on all possible computing that might be available to Eve, and to work toward designing a protocol that can be efficiently implemented on current computers but still confounds quantum computing technologies. The candidate quantum-resistant intractable problem is based on the Graph Isomorphism problem in conjunction with Hamiltonian cycles. This work, mainly in Chapter 4, is inspired by the findings on the quantum difficulty of solving the Graph Isomorphism problem over a symmetric group in [MRV08]; its potential to circumvent quantum computing power has been discussed in [KK07, CW05].

The remainder of this thesis is arranged as follows: Chapter 2 presents the classical cryptographic primitives and the types of mathematical functions used to build confidentiality and authentication boundaries, and discusses the ideal properties and design principles associated with these functions. These primitives are common in classical cryptographic algorithms but conceptually play a role in modern cryptographic algorithms. Chapter 3 presents classical, and briefly quantum-based, attacks and security analysis of today's leading public key protocols that are based on the IFP, DLP, and ECDLP and correlated computational problems.
Chapter 4 discusses the result of [MRV08] by investigating the possibility of introducing cryptographic primitives based on the Graph Isomorphism problem in conjunction with the Hamiltonian path problem.
2. Cryptographic Primitives

This chapter describes the building blocks of a cryptographic system and extends the discussion given in chapter 1 on one way functions used to implement the majority of cryptographic routines. These routines include Hash functions, block ciphers and random bit generators. Design principles and standard properties are discussed in the context of achieving the main cryptographic goals, which include authentication, integrity, and confidentiality.

2.1 An Overview of Hash Functions

The basis of cryptography is mapping a given readable input to an obscure version of the input. The most common mathematical technique is based on Hash functions, members of the one way function family introduced in Chapter 1. Hash functions originated in the field of computer science and have a broad range of applications such as file compression, data caches, search engines and Bloom filters². Hash functions have also been adopted in the field of cryptography, and the output of Hash functions has been given a variety of names in the cryptographic literature, commonly: hashcode, message digest, authenticator, seal and fingerprint [Pre03]. The first deployment of Hash functions in public key cryptography was demonstrated in [DH76]. However, Hash functions were well known as the basis of private key cryptographic algorithms; examples of such functions are MD4 and MD5, which were employed in practical software [Riv92]. Not long after the MD4 hash function was standardized, MD4 was weakened as a consequence of Dobbertin's paper [Dob98], where Dobbertin spotted a collision in the MD4 Hash function. The cryptographic Hash

² A Bloom filter is a probabilistic test of an element's membership in a given set.
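As a quick illustration (using Python's standard hashlib rather than any function from the thesis), a cryptographic hash maps inputs of any length to a short fixed-size digest, and a one-character change yields an unrelated digest:

```python
import hashlib

def digest_hex(message: bytes) -> str:
    """Fixed-size (256-bit) digest regardless of input length."""
    return hashlib.sha256(message).hexdigest()

d1 = digest_hex(b"pay Bob $10")
d2 = digest_hex(b"pay Bob $90")

assert len(d1) == len(d2) == 64             # 256 bits = 64 hex characters
assert d1 != d2                             # one changed character alters the digest
assert len(digest_hex(b"x" * 10**6)) == 64  # long inputs compress to the same size
```

The fixed output size and the drastic digest change under small input edits are the two properties the surrounding discussion keeps returning to.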
function candidates are supposed to satisfy certain properties to be considered secure; in particular, they must be collision resistant [Mer79].

Definition 2.1 (Collision resistant hash function): A collision resistant hash function is a one way function h : {0,1}* → {0,1}ⁿ mapping an input x to a digest y = h(x). It exhibits the following properties:

1. The input length must be elastic (arbitrary), whereas the output length n is a fixed number;
2. Computing y = h(x) has complexity class in P;
3. (Preimage resistance) Given y, computing an x such that h(x) = y is computationally infeasible;
4. (Second preimage resistance) Given x, finding x' ≠ x such that h(x') = h(x) is computationally infeasible;
5. (Collision resistance) Finding any pair x ≠ x' such that h(x) = h(x') is computationally infeasible;
6. The security of h must not depend on the secrecy of its description, and its description must be publicly known (Kerckhoffs' principle).

Remark: Collision resistance (5) implies second preimage resistance (4), while it does not guarantee preimage resistance (3) [RS04].

Property (3), preimage resistance, implies that the function is hard to reverse; computing the reverse of h is an NP-class problem, a non-polynomial problem or harder. In addition, properties (4), second preimage resistance, and (5), collision resistance, indicate that the function is collision resistant such that no two inputs can easily be mapped to the same digest output. If such inputs exist, then it should be hard to find them, and any random attempt at finding such values should succeed with probability converging to 0. The probabilistic assumption is due to the fact that no one knows whether one way functions exist, as their existence would separate the complexity classes such that P ≠ NP [Gol98]. Thus, Hash functions are based on strong assumptions and hence we only have
"candidate" functions. This fact has influenced our presentation of the above properties through the treatment of one-wayness as probabilistic, with the probability of inversion converging to 0. The terms collision resistant and collision free both refer to the same concept, and such functions are one way functions under certain conditions [Gib91].

Characteristically, cryptographic Hash functions must produce a diffused and confusing digest; a digest should in no way be similar to the actual input, typically a message, including any partial match between the message and the digest, and the message cannot be reliably guessed based on its digest. Likewise, cryptographic Hash functions must satisfy the avalanche property [Hor73]: a single bit change in the message flips approximately 50% of the bits of the digest. The length of the digest must also be large enough to protect against random attacks, where Eve tries to select a random message with the hope that such a change passes undetected. The probability of such success is given by 2⁻ⁿ. Failure to satisfy one of these properties is called a certification weakness and does not mean the Hash function is broken, but it provides enough reason to be concerned about the underlying design [Mir05].

2.1.1 Merkle-Damgård Construction

The Merkle-Damgård Hash function construction technique was first described in [Mer79] and further developed in [Dam89]; it has been widely adopted in known Hash functions including MD4 [Riv91, RFC1320], MD5 [Riv92, RFC1321], the SHA family designs [SHA08, FIPS180], and numerous other known Hash functions. Merkle-Damgård constructs a CRHF iteratively using a compression function on blocks of data of an arbitrary size.
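A minimal sketch of the Merkle-Damgård iteration follows. The toy compression function, block size, and IV below are illustrative stand-ins only; they have none of the security properties a real construction requires:

```python
BLOCK = 8  # block size in bytes (illustrative)

def toy_compress(state: int, block: bytes) -> int:
    """Stand-in compression function f: (state, fixed-size block) -> state.
    A real construction needs a collision-resistant f; this one is NOT secure."""
    return (state * 31 + int.from_bytes(block, "big")) % (1 << 64)

def md_hash(message: bytes, iv: int = 0x6A09E667F3BCC908) -> int:
    # Simplified MD-style padding: append a marker byte, zero-fill,
    # then append the original message length, so the total is a
    # multiple of BLOCK (a crude stand-in for MD-compliant padding).
    padded = message + b"\x80"
    padded += b"\x00" * ((-len(padded)) % BLOCK)
    padded += len(message).to_bytes(BLOCK, "big")
    state = iv
    for i in range(0, len(padded), BLOCK):
        state = toy_compress(state, padded[i:i + BLOCK])  # chain each block
    return state
```

Each round feeds the previous round's output together with the next data block into the compression function, which is exactly the chaining structure the following definitions formalize.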
Definition 2.2 (Compression function): A compression function is a variety of one way function which maps a given input to a unique and shorter output. For example,

f : {0,1}^(n+b) → {0,1}ⁿ, (2.1)

where f in (2.1) is a function that accepts only data of a fixed size. The aim of the Merkle-Damgård construction is to allow an arbitrary input size while ensuring the composed function still retains the property of collision resistance. The underlying structure of this construction is that a transformation using a collision resistant compression function with an appropriate padding scheme results in a collision resistant function [Dam89].

Remark: Compression functions here do not refer to functions used to compress data; they are opposed to such compression functions by the assumption that f is a non-invertible function.

Figure 2: Merkle-Damgård construction of a collision free Hash function using a compression function f, where the input of f in round i is the output of the previous round in addition to the data block.

Designing a collision resistant Hash function requires finding a sufficient candidate compression function and embedding it in the Merkle-Damgård process to ensure it is difficult for data to collide. Since compression functions only accept a static input size, Merkle-Damgård suggests dividing the input data into blocks of a fixed size b, which is described as follows:
x = x₁ ∥ x₂ ∥ ⋯ ∥ x_t, i.e., |xᵢ| = b for i = 1, …, t, (2.2)

and iteratively applies the compression function to x's blocks (2.2). The function takes a data block xᵢ along with the output of the previous round hᵢ₋₁ to produce a new block hᵢ until it covers all data blocks. The Merkle-Damgård design takes an additional parameter IV, which denotes a fixed constant initialization vector, and uses this parameter throughout the first round of the Merkle-Damgård construction. An illustration of the algorithm is given in Figure 2. Typically, a padding technique in the last round is necessary to ensure that the full data length is a multiple of the size accepted by the compression function. MD-compliant padding has been carefully recommended, while there exists a variety of good padding techniques, as proposed in [Sar09, Nan09, Yas08]. MD-compliant padding, as outlined in [GB08], is defined as follows:

Definition 2.3 (MD-compliant padding): MD-compliant padding is described by a padding function pad : X → Y, where X and Y are the input and output spaces respectively. Consider instances x and y of X; then the padding is called MD-compliant if it satisfies the following properties:

1. x is a prefix of pad(x);
2. If |x| = |y| then |pad(x)| = |pad(y)|;
3. If |x| ≠ |y| then the last block of pad(x) is different from the last block of pad(y).

The underlying assumption in recommending MD-compliant padding is that it ensures the MD Hash function is collision free if the compression function used in the Merkle-Damgård construction is also collision free. This is well maintained by properties (2) and (3), by which two different data blocks result in
two different padded data blocks. By way of illustration, consider x = 10111 and y = 101 for a 6-bit MD function. The padding cannot append 1 and 111 to x and y respectively; otherwise, the padding would introduce a collision by converting two different inputs to the same 6-bit value, both of which would then be input to the one way function and mapped to the same digest (properties (2) and (3)). In addition, property (1) ensures that no change is made to a given input of pad other than embedding bits into the input to increase its size to one accepted by the compression function f.

2.1.2 Generic Attacks on Hash Functions

Hash functions, regardless of their descriptions, are vulnerable to generic attacks, as opposed to specific attacks, which exploit a weakness in their design principles. This type of attack depends on the size of the message digest and the secret quantity. Roughly speaking, ideal Hash functions should only be vulnerable to generic attacks and should be used with a reasonable message digest size and a large random key such that generic attacks are practically infeasible using existing or future computing power. Generic attacks can be categorized into two attack types: (1) attacks against the collision free property and (2) attacks against the one wayness property. The number of calls to the Hash function, which is treated as a black box model, measures the complexity of these attacks. In the case of the collision free attack, an attacker aims to find two messages m and m' that map to the same hashcode, while neither m nor m' is known to Eve in advance. This attack requires a brute force search over about 2^(n/2) message possibilities before the attack succeeds, that is, finds two messages that map to the same digest [Mir05]. In contrast, an attack on the one wayness property, or preimage, requires the attacker to find a message that maps to a hashcode known by the attacker. Similarly,
an attack on the second-preimage property allows an attacker who has a message m to find a second message m' that has the same Hashcode. The latter two require on the order of 2^n attempts [Mir05]. The impact of successful attacks in these categories differs depending on the type of application. Since the main approach is brute force, or exhaustive search, and requires random attempts, some applications might only allow one chance to attempt calling the Hash function. A single attempt succeeds with probability 2^(-n), while if an attacker can carry out q attempts, the probability of success is about q/2^n, assuming the Hash function is treated as a black box model.3 2.1.2.1 The Birthday Attack The birthday attack is the most commonly used brute-force attack technique, with the success of attacks determined by finding the probability of a collision of a given Hash function. The birthday paradox, in terms of Hash functions, corresponds to a Hash function collision. Intuitively, an attacker uses this concept by generating two sets of messages A and B [Mir05]. The probability of finding instances a in A and b in B such that a and b map to the same digest and collide on some Hash function is given by P = 1 - e^(-q^2 / 2^n) (2.3), which is about 63% when q = 2^(n/2) [PGV92]. The quantity 2^n in (2.3) is the total number of possible digests for messages chosen for either set A or B. Hence, increasing the size of the domain, or the group of elements, that are inputs to the Hash function 3 A black box model is defined as an object that takes input and produces output without any knowledge of its internal transfer characteristics.
increases the difficulty of birthday attacks. Further developments have been proposed to increase the efficiency of birthday attacks, including [Ber07, Kir11]. The conclusion of multiple works in the cryptographic research literature is that if the collision-free property has to hold, the digest size should be at least 128 bits and should double the key length. Remark: One can visualize the problem with a one-set setup as follows. Let D be the set of all possible digests h resulting from a given Hash function H, that is, |D| = 2^n, and let S be a subset of D. The aim is to compute the probability of having repeated (collided) digests. If we start from the step in which we choose the elements of S, then after choosing the first element, a new unique digest is chosen with probability (1 - 1/2^n). After the second choice, the next element is unique with probability (1 - 2/2^n), and so on. Say we choose k elements, that is, |S| = k; then the probability of having k unique elements from D is the product of all these probabilities, P(unique) = (1 - 1/2^n)(1 - 2/2^n) ... (1 - (k-1)/2^n), and the probability of having repeated elements in S, hence that two digests collide, is 1 - P(unique). 2.1.2.2 The Length Extension Attack The length extension attack is common with Hash functions built on the Merkle-Damgård construction technique, such as MD5 and SHA-1. The vulnerability applies if the Hash function allows an inclusion of information such as the message and the length of the secret. It works as follows: the Hash function, as discussed above, operates on a fixed-size input, splits
the data into blocks, and pads the last block if the data is larger than the Hash function is designed to accept. Once the Hash function operates on all data blocks, it outputs the digest, which is simply the final internal state of the Hash function. This implies that we can continue the Hash operation even with zero knowledge of the input data. Thus, Hash functions truly operate not only on a data block but also on a padding string p. If we could determine the length of the data, then we can determine the length of p, which makes the total input length a multiple of b, where b is the block length accepted by the Hash function. That is, the padding is just the extra bits needed to make the size of the input a multiple of b. If the length of the secret is unknown, its value can be determined by exhaustively searching all possibilities. A successful attack involves feeding the Hash function with m', the attacker's data, along with the padding of the extension if needed. Comparing the resulting digests can leak information about the keys and extract a valid digest, which can then be used as a signature of another message. In addition, if we can design another Hash function that starts from the leaked digest as its state, then we can output a valid digest for any message of the form secret || m || padding || m' without the need to know the secret s. Other techniques can be used to fool webservers into calculating the digest of messages provided by Eve without the need to know the secret keys of the Hash function, as demonstrated by Thai Duong and Juliano Rizzo in 2009 against the Flickr API [DR2009]. This gives a generic description of the attack and how such an attack can be exploited in slightly different ways depending on the way an application using the Hash function works.
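The mechanics above can be made concrete with a toy Merkle-Damgård hash. Everything here (the compression function, the 8-byte block size, the one-byte length field) is invented for illustration; real functions such as MD5 or SHA-1 differ in the details but share the iterated structure, and hence the same weakness:

```python
# Toy Merkle-Damgard hash illustrating the length extension attack.
BLOCK = 8          # block size in bytes (toy choice)
IV = 0x6A09E667    # arbitrary fixed initialization vector

def compress(state: int, block: bytes) -> int:
    """Toy compression function: mixes one block into a 32-bit state."""
    for b in block:
        state = ((state * 33) ^ b) & 0xFFFFFFFF
    return state

def md_pad(msg_len: int) -> bytes:
    """MD-style padding: 0x80, zeros, then a one-byte length field."""
    pad = b"\x80"
    while (msg_len + len(pad) + 1) % BLOCK != 0:
        pad += b"\x00"
    return pad + bytes([msg_len % 256])

def md_hash(data: bytes, state: int = IV, prefix_len: int = 0) -> int:
    """Iterate compress() over the padded blocks (Merkle-Damgard)."""
    data = data + md_pad(prefix_len + len(data))
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

# The flawed scheme: a "MAC" computed as H(secret || msg).
secret = b"key12345"
msg = b"amount=100"
digest = md_hash(secret + msg)

# Eve knows only the digest, the total length, and her extension.
ext = b"&to=eve"
orig_len = len(secret) + len(msg)
glue = md_pad(orig_len)
# Resume hashing from the leaked digest, which IS the internal state.
forged = md_hash(ext, state=digest, prefix_len=orig_len + len(glue))

# The server would compute exactly this for the extended message:
legit = md_hash(secret + msg + glue + ext)
assert forged == legit
```

The point of the final assertion is that `forged` was computed from the digest alone, without ever touching `secret`.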
2.1.3 Hash Function Based Authentication Hash functions are the core of digital signatures, a widely used methodology in public key cryptography used to authenticate messages. The message authentication process guarantees the following: 1. The message comes from a known source; 2. The message has not been counterfeited while in transit. The sender takes his private key and message m to compute the digest h = H(m), encrypts the digest, and embeds it with the message. The recipient decrypts the embedded digest using the sender's public key, computes the digest of the received message, and compares the two. If they match, the message is verified; otherwise, the message has been either modified or sent from Eve. By way of illustration, we can now show the importance of the Hash function properties discussed in section 2.1 and discuss the essence of avoiding collisions. Consider the following scenario: Alice sends the message m = "Transfer $100 to Steve" to Bob over an unsecured communication channel. Before that, Alice computes the digest h = H("Transfer $100 to Steve"), encrypts h into a signature s, then embeds s into the packet, i.e., the transferred packet contains (m, s). Eve intercepts the message and can breach the security of this communication through one of the following cases: 1. If H is not collision free, Eve might find another message m' which hashes to the same value, such that H(m) = H(m'). Hence, the message is replaced by a counterfeited message such as "Transfer $1K to Eve", keeping the digest unchanged;
2. If H is not preimage resistant, Eve can intercept the transferred packet and obtain a copy of s. Since H is publicly known, it would be possible to disclose the signing quantity using s and H. If this is assumed, it follows that Eve can, without being detected, sign counterfeited messages in the future and conceal past and future messages; 3. If H is not second preimage resistant, Eve will search for any random message that hashes to h and replace the message sent to Bob by this message. In practice, preimage attacks are used frequently by Eve, as he/she aims to subvert the communication and read secured transferred data. Secondly, the second preimage attack is used to deny the usefulness of such communication. On the other hand, collision attacks are less practical: even if a collision can happen, producing a meaningful message that hashes to a given digest is much harder than merely finding a collision. 2.2 Block Ciphers A block cipher is a one-to-one function E that takes two parameters: a k-bit key K and an n-bit plaintext message m. The output of E is an n-bit ciphertext c that holds the properties of diffusion and confusion, i.e., c is in no way similar to m and cannot be traced back to find m if Eve holds a copy of c. The function E can be used to encrypt the message m into a ciphertext as follows: c = E(K, m). The function E has an inverse D, which can transform c back to its original form m: m = D(K, c).
The function E is publicly known and easily computable by anyone holding the key parameter K, with the assumption that Eve does not hold K. Therefore, Eve cannot decrypt the message. The key is generated at random and securely exchanged between the legitimate communication parties. Hence, Eve's first goal is to recover K, and the design of block ciphers should make this goal computationally infeasible. Block ciphers are more commonly used in private key cryptography, where two parties securely agree on and share a secret key. However, public key cryptography is in a sense a block encryption, and block ciphers are used there as tools rather than as a public key scheme. For example, RSA commonly encrypts blocks of data of as many bits as the size of the modulus, which can be 1024, 2048, or 4096 bits depending on the key size. Block ciphers have received considerably more attention from the cryptography research community than stream ciphers, which encipher data bit by bit. This is because stream ciphers (e.g., RC4, A5/1) are more suitable for bandwidth-limited applications (e.g., GSM) [CBP06], while current cryptographic applications require securing large data sets, which makes encrypting one bit at a time a computational overhead. 2.2.1 Modes of Operation Block ciphers accept inputs of a fixed size (e.g., 1024 bits in 1024-bit RSA). Modes of operation describe the way to handle data that is larger than the designed block cipher accepts; for example, 2000 bits of data for 1024-bit RSA is divided into two blocks of sizes 1024 and 976 + p, where p is the number of padded bits needed to make the second block equal 1024 bits. Dividing data to be transmitted over an unsecured channel is essential in public key cryptography. If RSA were designed to accept an arbitrary length of data, then it could be fed very large data, causing computational overhead and the use of large amounts of CPU,
which is undesirable in many computational environments, including webservers. Thus, dividing the data into blocks speeds up the processing of cryptographic protocols and provides a level of security (see the advantages below). Generally speaking, the ideal mode of operation should provide the following noticeable advantages: 1. The ability to differentiate ciphertexts resulting from encrypting the same plaintext, which protects against frequency analysis attacks. This is achieved through the process of mixing the ciphertext of one block with the succeeding plaintext data block; 2. The ability to support speeding up the process of encryption and reducing the CPU cycles needed to handle large data sets. Mostly, using smaller data blocks allows us to use a smaller group size (plaintext and ciphertext spaces); 3. Making it impossible to perform a packet replay attack.4 This is due to the fact that the ciphertext is now a mixture of multiple data blocks, each intertwined with the others. A simple change to the ciphertext will eventually cause the plaintext as a whole to change, since each bit of a data block is an input to the encryption process of the succeeding block; 4. The ability to add more confusion and diffusion to the ciphertext from the process of incorporating all plaintext blocks into the final ciphertext through multiple rounds of encryption. Different modes have been proposed in the cryptographic literature, and the NIST recommendation includes Electronic codebook (ECB), Cipher block chaining (CBC), Counter (CTR), Cipher feedback (CFB), and Output feedback (OFB) [MD01]. The 4 Packet replay is a form of attack in which transmitted data is maliciously repeated, interchanged, or deleted while in transit.
security of these modes of operation depends on the underlying block cipher as well as on the way the mode processes the data for encryption. In particular, each message is divided into blocks of data, and the last block is padded to ensure its size is acceptable to the underlying block cipher. The ECB mode processes the data blocks independently, one at a time. This can increase the vulnerability to frequency-analysis-based attacks. However, it can still be useful in certain situations in which computing power, memory, or transmission mediums are limited, and it is commonly adopted with stream ciphers. The other modes, CBC, CTR, CFB, and OFB, are similar to each other, but all follow the technique of mixing the ciphertext resulting from one block into the encryption of the succeeding data block to ensure a "confusing" mixture that resists frequency-analysis-like attacks. 2.2.2 Block Cipher Cryptanalysis Cryptanalysis is an essential stage in the design of block ciphers because it aims to find potential weaknesses before Eve can find them, once block ciphers are integrated into real-life applications. Although cryptographers have found novel techniques to evaluate block ciphers, cryptanalysis is still a fast-moving field, and no one technique can guarantee the security of a block cipher. Moreover, the label of a secure block cipher is granted to block ciphers whose potential weaknesses can be exploited with no less complexity than brute-force attacks [Sch00]. Given this point, a block cipher designed to resist 2^128 brute-force attempts cannot be considered secure if some attack succeeds with noticeably fewer attempts. Cryptanalysis thus concentrates on finding shortcuts, utilizing the structure of a block cipher and exchanged data (i.e., ciphertexts).
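Before turning to attack scenarios, the contrast drawn above between ECB's independent blocks and the chaining modes can be sketched with a toy block cipher. The cipher below is a made-up stand-in (a keyed byte transformation, far too weak for real use); any block cipher exhibits the same contrast between the two modes:

```python
# ECB vs. CBC over a toy 4-byte block cipher.
BLOCK = 4

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for a real block cipher such as AES.
    return bytes(((b + key[i % len(key)]) * 7) % 256
                 for i, b in enumerate(block))

def ecb(key: bytes, data: bytes) -> bytes:
    # Each block is encrypted independently of all the others.
    return b"".join(toy_encrypt(key, data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK))

def cbc(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Each plaintext block is XORed with the previous ciphertext block.
    out, prev = b"", iv
    for i in range(0, len(data), BLOCK):
        prev = toy_encrypt(key, xor(data[i:i + BLOCK], prev))
        out += prev
    return out

key, iv = b"k3y!", b"\x01\x02\x03\x04"
pt = b"SAMESAMESAMESAME"          # four identical plaintext blocks

c_ecb = ecb(key, pt)
c_cbc = cbc(key, iv, pt)
ecb_blocks = {c_ecb[i:i + BLOCK] for i in range(0, len(c_ecb), BLOCK)}
cbc_blocks = {c_cbc[i:i + BLOCK] for i in range(0, len(c_cbc), BLOCK)}
assert len(ecb_blocks) == 1       # ECB leaks the repeated plaintext
assert len(cbc_blocks) == 4       # CBC hides it
```

Under ECB the four identical plaintext blocks produce four identical ciphertext blocks, exactly the pattern a frequency analyst looks for; under CBC the chaining makes all four differ.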
2.2.2.1 Attack Scenarios It is generally accepted that Eve has full knowledge of the block cipher description, following Kerckhoffs's principle. Admittedly, Eve's task is to recover parts of the plaintext or, better yet, the key being used to produce ciphertexts. Varying the limitation on the knowledge available to Eve leads to different attack scenarios. Attack scenarios are classified based on what information, other than the description of the block cipher, Eve can obtain, and to what extent Eve can cause communication interference using this information. Ciphertext only. Eve can obtain the ciphertext c through the unsecured communication channel and, with the knowledge of E, must solve c = E(K, m) for m. But computing m is hard and requires enormous computational power and time without the trapdoor information, equivalently, the secret key K. Thus, depending on the description of E and the mode of operation used, Eve's best start is to conduct frequency analysis on c to recover m. Known plaintext. Eve obtains a pair of plaintext m and the corresponding ciphertext c and knows the cipher and mode used to generate them. The aim of Eve is to recover the secret key K to be able to decrypt every other pair of plaintexts and ciphertexts. It obviously should be hard for Eve to obtain such a pair; however, Eve may intelligently guess messages from the context of the communication, as Eve easily collects ciphertexts from the communication channel (e.g., automatic email responses). Chosen plaintext. Eve is capable of treating the cryptographic function E as a black box model and feeding it with a message m to obtain the corresponding ciphertext c. Eve then collects a set of pairs of the form (m, c). The best that can be done with these pairs is to conduct a differential cryptanalysis, investigating the difference between ciphertexts as
the message changes, to establish non-randomized behaviors and recover the key (details in the next section). This is practically possible if the block cipher is implemented in hardware with no access rules, or even at the software layer. Chosen ciphertext. This is similar to the chosen plaintext technique, but in this case, Eve has a way to trick the decryption controls by giving it a ciphertext of his or her choice to obtain the corresponding plaintext, similar to the length extension attacks described in the previous section. Adaptive chosen plaintext or ciphertext. This is used to mount one of the above attacks by obtaining the encryption or decryption key from a series of data blocks. In this technique, the choice of each successor block depends on the result of the previous one. 2.2.2.2 Differential Cryptanalysis Differential cryptanalysis is a methodology of chosen plaintext attack, first made public in 1990 by Eli Biham and Adi Shamir to present an attack on DES-like cryptosystems [BS91]. Although differential cryptanalysis was introduced for targeting DES and iterative symmetric block ciphers, it has been applied to numerous block ciphers and is now considered a strong cryptanalysis technique applicable to all block ciphers. In the following discussion, we give a generic description of differential cryptanalysis assuming the block cipher is a black box model whose inputs and outputs both lie in the language {0,1}^n for some fixed integer n. In the broadest sense, differential cryptanalysis studies how the output of a block cipher differs as the input changes, in the same sense as the adaptive chosen plaintext
attack. It traces the differences with the aim of discovering where the block cipher exhibits systematic behaviors, and it exploits such behaviors to recover the secret key. Several differentials narrow the probability of what the key might be until it is eventually recovered. The fundamental concept in differential cryptanalysis is measuring the difference between two values. The difference is calculated by XORing two particular values and is often denoted by Δ. The probability that a difference in the input gives rise to a particular differential output is often referred to as a characteristic [Swe12]. Definition 2.4 (Characteristic): A characteristic is composed of two differentials, described by Δm and Δc: the differential Δm in an input gives rise to a differential Δc in the output. Symbolically, Δm -> Δc. In most cases, the characteristic does not give certainty. Hence, differential cryptanalysis speaks to the probability that the differential in an input will result in a differential in the output. Consider a random block cipher with n-bit input and n-bit output. Let m and m' be two inputs and c and c' be their corresponding outputs. The input difference is given by Δm = m XOR m'. In a similar manner, the output difference is given by Δc = c XOR c'.
Typically, for any given n-bit block cipher, the probability of a given output difference arising from a given input difference is 2^(-n). Yet the correlation between Δm and Δc can give clues about the key and about non-random structure of the block cipher, raising the probability of identifying the key being used. 2.2.2.3 Linear Cryptanalysis Linear cryptanalysis is a methodology of establishing a probabilistic linear relationship between known plaintexts and their corresponding ciphertexts in order to solve for an unknown variable, commonly the secret key. Linear cryptanalysis is similar to differential cryptanalysis in the sense that both techniques don't give certainty, but instead give statistical probabilities that can eventually give a clue about the key. By way of illustration, linear equations obtained from known plaintexts (Eve knows c and by default knows m) can be symbolized as follows: c_i = m_i XOR k_i (2.4) This equation can be solved to recover the key bits when plaintext-ciphertext pairs are known. It is important to note that equation (2.4) considers an encryption scheme similar to the one-time pad, where the plaintext is XORed with the key to produce a ciphertext (c = m XOR k), which can be rewritten as (k = m XOR c). Depending on the encryption scheme, one can construct the linear relationship between plaintext, ciphertext, and key bits to recover the key bits when plaintext-ciphertext pairs are known. In modern cryptosystems, linear encryption is avoided; they mostly rely on encrypting blocks of data represented in numerical formats to map them to other values.
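The point of equation (2.4), that a purely linear (XOR) scheme surrenders its key to a single known plaintext, can be checked directly (the key and messages below are arbitrary examples):

```python
# Equation (2.4): when c_i = m_i XOR k_i, one known
# plaintext/ciphertext pair solves for the key, k_i = m_i XOR c_i.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = b"secretky"                        # unknown to Eve
known_m = b"hello ev"                    # Eve knows this plaintext...
known_c = xor_bytes(known_m, key)        # ...and intercepts this ciphertext

recovered_key = xor_bytes(known_m, known_c)   # k = m XOR c
assert recovered_key == key

# Every later ciphertext under the same key is now readable:
c2 = xor_bytes(b"attack!!", key)
assert xor_bytes(c2, recovered_key) == b"attack!!"
```

This is exactly why, as the paragraph above notes, modern cryptosystems avoid linear encryption: a real block cipher must make the relation between plaintext, ciphertext, and key highly nonlinear.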
2.5 Random Bit Generators Cryptographic systems should be built in a way that resists pattern and frequency analysis. However, these strong algorithms involve secret quantities (i.e., keys and, in some cases, an initialization vector IV) that must be generated at random. Hence, Eve is assumed to have no capability of predicting generated values even if all possible information has been obtained. That is to say, a security criterion of a cryptographic algorithm includes testing its randomness in terms of the values generated and non-predictable behaviors. Hence, predictable randomness generated by ineffective random number generators may enable Eve to make guesses on the key and compromise the keys. Recall the following cryptographic assumption: if there is a vulnerability against a cryptographic algorithm with complexity less than brute force, the algorithm is considered weak [Sch00]. However, the latter assumption does not always hold true for modern designs, as will be demonstrated in Chapter 3. True randomness must keep the probability of a successful brute-force guess at a constant 2^(-n) for an n-bit string. This implies that Eve has no shortcut over trying all possible values of the string, since there are exactly 2^n n-bit strings. If the probability is higher, then an attack with complexity less than brute force yields breakability and raises concerns about either the randomness or the encryption scheme. In addition, this probability should stay fixed even when Eve obtains different values during different rounds of the algorithm. If Eve gets such values, there should be no indication of subsequent values, either by pattern or statistical analysis, that narrows down the possibilities; there should be a low probability of guessing or determining the key.
In the literature, random number generators are classified into two classes: true random number generators (TRNG) and pseudo-random number generators (PRNG) [EF12]. The difference is that TRNGs rely on an unpredictable physical source that is outside human control, while PRNGs depend on the initial state of the generator, which includes a random seed, and use numeric or logical operations to produce randomness. Thus, the random outputs become completely predictable once this state is known. However, PRNGs are still more practical for the following reasons: they are much easier to construct in digital hardware, they run faster, and their outputs remain unpredictable as long as the internal state is unknown [EF12]. The current standards from NIST recommend several PRNGs, while no TRNG generators are found in the list [RF12]. A typical PRNG is known as a linear congruential pseudo-random number generator (LCG). The recursion relation defines the LCG as X_{n+1} = (a * X_n + c) mod m, for n >= 0, (2.6) where X_0 is the seed. The selection of the seed X_0, multiplier a, shift c, and modulus m in equation (2.6) drastically affects the statistical properties [CCS94]. Remark. Random number generators can be theoretically tested and experimentally confirmed using the adaptive chi-square test [RSS04], besides other known tests such as Kolmogorov-Smirnov and serial correlation. Yet passing these tests does not guarantee that the numbers are truly randomized; on the other hand, failing these tests casts doubt on the generator being truly random.
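A minimal LCG following equation (2.6), here with the classic Park-Miller parameters a = 16807, c = 0, m = 2^31 - 1 (an illustrative choice; any full-period parameters behave the same), also shows why a PRNG's output is completely predictable once its internal state leaks:

```python
# Linear congruential generator of equation (2.6):
#   X_{n+1} = (a * X_n + c) mod m
def lcg(seed: int, a: int = 16807, c: int = 0, m: int = 2**31 - 1):
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

gen = lcg(seed=42)
outputs = [next(gen) for _ in range(5)]

# The internal state of an LCG is just its last output, so anyone
# who observes a single value predicts the entire future stream.
eve = lcg(seed=outputs[1])               # Eve resumes from a leaked value
assert [next(eve) for _ in range(3)] == outputs[2:5]
```

This is the practical content of the remark above: statistical tests can reject a bad generator, but even a statistically well-behaved LCG is cryptographically worthless once any state leaks.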
3. The Underlying Intractabilities of Modern Cryptosystems In this chapter, a discussion of the underlying intractable problems that build the basis of modern cryptosystems' security will be given. This discussion focuses more on the hardness of these problems, and thus the security of the most common cryptosystems, rather than their correctness. Intractable problems are integrated into the design of modern cryptosystems such that only legitimate users holding a secured quantity can solve the problem in time bounded by a polynomial n^c for some constant c independent of the input n. Although there are many examples of well-secured cryptosystems, our discussion will focus on cryptosystems based on the Integer Factorization Problem (IFP), the Discrete Logarithm Problem (DLP), and the Elliptic Curve Logarithm Problem (ECLP). 3.1 The Integer Factorization Problem (IFP) The IFP is considered one of the most intractable problems in both computer science and mathematics and has been intensively studied by the research communities of both fields. It is the basis of the most widely deployed cryptosystem, RSA, and is considered the first intractable problem adopted into the design of cryptosystems. Definition 3.1 (Integer Factorization Problem, IFP): Given a large composite number n, the problem is to find its constituent prime factors, i.e., n = p1^e1 * p2^e2 * ... * pk^ek. The difficulty of the IFP increases rapidly as the number to be factored increases, due to the amount of computational power that is required to solve the problem. Conversely, it takes polynomial time to multiply a set of prime numbers p1, ..., pk to produce a large composite number n. Thus, it is a problem that has been well adopted to
serve as the basis of cryptographic Hash functions,5 and it is the basis of several cryptosystems such as RSA and Rabin's cryptosystem. Consequently, breaking one of these cryptosystems by finding a way to efficiently factor a large composite number would eliminate all IFP-based cryptosystems. The IFP has two forms, each based on the decomposed non-trivial divisors of an integer n, i.e., multiplying the non-trivial divisors together produces n. These divisors can be either composite or prime numbers. In the case of composite divisors, searching for all other divisors is relatively easy once one divisor is found: dividing n by one of these divisors reduces the number, and more divisions can further reduce it, thereby reducing the computational power necessary to factor it. Therefore, the fewer divisors n has, the more computing power and operations are needed to reduce it. Hence, one can compose n from only two factors by multiplying n = p * q, in which p and q are prime numbers: non-reducible factors that can only be divided by 1 and themselves. This establishes the importance of choosing prime numbers to constitute a large composite number, and since these numbers are expected to be large, a certificate of their primality is essential (details in section 3.1.2). Factorization algorithms are classified into two categories, mainly based on running time dependency. The first category is the general-purpose factoring algorithms, whose complexity depends on the size of n, the number to be factored. The second class is the special-purpose factoring algorithms, whose complexity depends on the size of the factors p and q of n [Yan13]. Table 2 lists examples of factorization algorithms, among which the number field sieve algorithm is considered the fastest and 5 Cryptographic Hash functions are often used in the context of this paper to refer to one-way trapdoor Hash functions.
it was notably used in factoring several actual RSA keys listed as challenges by RSA Laboratories [RSAlabs].

Table 2: Common factorization algorithms with their corresponding running time complexity
Special purpose | General purpose
Trial division: O(n^(1/2)) | Number field sieve: exp(((64/9)^(1/3) + o(1)) (ln n)^(1/3) (ln ln n)^(2/3))
Pollard's "rho": O(n^(1/4)) | Multiple polynomial QS: exp((1 + o(1)) (ln n)^(1/2) (ln ln n)^(1/2))

3.1.1 Complexity Class of the IFP In terms of complexity classification, most of the cryptographic literature believes that the IFP lies in the NP complexity class. The IFP is not known to be NP-complete; its decision version, reformulated as whether n has a prime factor p below a given bound, can be verified efficiently (see details below on primality certificates). In addition, no one knows if it is NP-hard. In fact, no one has proven that the IFP cannot be solved by an algorithm running in time bounded by a polynomial with respect to the length of the input. 3.1.2 Primality Certificates In section 3.1, we have seen that the IFP is hard only when a given large composite number decomposes into large prime factors. Thus, building an instance of the IFP requires generating large prime numbers. Since prime numbers are randomly distributed, and are believed to be infinite, it is essential to ensure the generated random numbers are actually primes. If a certificate of the primality of a given number cannot be trusted, then a serious threat is imposed on the communication session, as the composite number becomes easily factorable.
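The two categories of Table 2 can be illustrated with their two simplest members, trial division and Pollard's rho (the toy modulus 10403 = 101 * 103 is chosen for illustration; real RSA moduli are hundreds of digits long):

```python
# Trial division (running time tied to the size of n, ~O(sqrt(n)))
# next to Pollard's rho (expected time tied to the size of the
# smallest factor, ~O(n^(1/4))).
import math

def trial_division(n: int) -> int:
    """Return the smallest nontrivial factor of a composite n."""
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return d
    return n                          # n itself is prime

def pollard_rho(n: int) -> int:
    """Floyd-cycle variant of Pollard's rho; returns a factor of n."""
    if n % 2 == 0:
        return 2
    x = y = 2
    d = 1
    while d == 1:
        x = (x * x + 1) % n           # "tortoise" takes one step
        y = (y * y + 1) % n           # "hare" takes two steps
        y = (y * y + 1) % n
        d = math.gcd(abs(x - y), n)
    return d                          # nontrivial factor (or n: retry)

n = 10403                             # = 101 * 103, a toy RSA-style modulus
f = pollard_rho(n)
assert n % f == 0 and 1 < f < n
assert trial_division(n) == 101
```

Pollard's rho finds the factor after only a handful of iterations here because the smallest factor is tiny; for a modulus built from two large primes, both methods above become infeasible, which is exactly the hardness RSA relies on.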
Definition 3.2 (Primality certificate): A primality certificate is an assertion and proof of a given number being prime. Primality testing varies in the nature of the proof; some tests are probabilistic and others are deterministic. In general, probabilistic tests result in some degree of certainty that the number is not composite, i.e., there is a high probability that the number is prime. The extraordinary discoveries of primality testing algorithms running in polynomial time are the primary reason IFP-based cryptosystems (i.e., RSA) are practical and can be trusted. Generally speaking, IFP-based cryptosystems start with randomly generated large numbers and search for the nearest prime number to constitute the encryption and decryption keys; hence, primality testing is essential in the key generation process. The probabilistic approach is primarily used because it is convenient due to its efficiency (polynomial time), which was not the case for deterministic algorithms, at least before the publication of the AKS algorithm in 2004 [AKS04]. The AKS algorithm does indeed prove that the primality testing problem is in the complexity class P; however, it is still not practical and fails to compete with non-deterministic algorithms in terms of efficiency in practice (refer to section 3.1.2.4). 3.1.2.1 IFP vs. Primality Testing The natural structure of the IFP classifies it as a search problem, while primality testing is a decision problem. In many cases, both problems are associated together in many applications, such as cryptographic protocols. However, they are algorithmically different. Primality testing exposes the given integer to congruence equalities and properties that are believed to be satisfied by either composite or prime numbers. By way of illustration, one can demonstrate the compositeness of a given integer by examining the
last digit of the number. For instance, a number ending in 0, 2, 4, 6, or 8 is even and thus divisible by 2, and a number ending in 0 or 5 is divisible by 5. This simple test certifies the compositeness of such numbers with zero knowledge of their constituting factors. In a similar manner, non-deterministic algorithms test a given number against congruences that resulted from number theory discoveries, which are believed to hold for only prime or only composite numbers. Remark: Strong primality tests (ones in which the certificate can be trusted) provide more confidence in the hardness of solving the IFP; thus, IFP-based cryptosystems are more secure. 3.1.2.2 Fermat's Primality Test Fermat's primality test is a probabilistic test based on a simple result from number theory known as Fermat's little theorem. Theorem 3.1: Given a prime integer p, then for every positive integer a not divisible by p, the number a^(p-1) - 1 is divisible by p; equivalently, a^(p-1) divided by p has a remainder equal to 1. Symbolically, a^(p-1) ≡ 1 (mod p). (3.1) Remark: The algorithm is probabilistic and tests the primality of p by choosing as many a's as possible. The accuracy of the algorithm is mainly dependent on the number of times the test is invoked with different values of a. Proof: Let S = {1, 2, ..., p-1} be the set of numbers that are coprime to the prime number p; S forms a multiplicative group modulo p. Based on the fact that
a is coprime to p, multiplication by a permutes S; that is, for every a in S we have {a*1 mod p, a*2 mod p, ..., a*(p-1) mod p} = S, since S is closed under multiplication modulo p. Hence, a^(p-1) * (1 * 2 * ... * (p-1)) ≡ 1 * 2 * ... * (p-1) (mod p). Canceling the common product (p-1)! from both sides yields Fermat's little theorem congruence a^(p-1) ≡ 1 (mod p). (3.2) If the congruence (3.2) is satisfied, then we have an indication that the number p is probably prime. In contrast, any a with a^(p-1) not congruent to 1 (mod p) settles the algorithm's result: p is assuredly a composite number, and a is called a Fermat witness. Every other a exposing the compositeness of p is also given this title, while an a satisfying the congruence for a composite p is called a Fermat liar. Primality testing of a large number p follows the algorithm described in Table 3.

Table 3: Pseudocode of Fermat-based primality test
Input: p, k
Output: true if p is probably prime | otherwise false
1. Repeat k times
2.   pick a random a in [1, p-1]
3.   if a^(p-1) mod p != 1, return false
4. return true

Although the algorithm is probabilistic, it provides a convenient proof of a number's primality based on two points of view. First, we have control on the
probability of correctness by increasing the parameter k, the number of times the given number is tested against different values of a. In terms of complexity, the algorithm's running time is polynomial in the size of the input. The downside of Fermat's primality test is that it fails in testing Carmichael numbers.6 It is believed that there exists an infinite list of composite integers p such that a^(p-1) ≡ 1 (mod p) is satisfied for every a coprime to p. Moreover, the congruence a^(p-1) ≡ 1 (mod p) can be satisfied for many values of a, which gives us a high probability that p is prime, but p could still fail the test for some other value of a. The first downside has been addressed in the next primality test, the Miller-Rabin primality test. The second issue applies to all non-deterministic algorithms and is resolved in AKS, which is classified as a deterministic algorithm; unfortunately, disappointing experimental results on the algorithm stand behind its unpopularity in practice (see details in section 3.1.2.4). 3.1.2.3 Miller-Rabin Primality Test Due to the fact that the cardinality of the Carmichael numbers is believed to be infinite [Pin08], there are sufficient concerns raised about Fermat's primality test. Consequently, primality tests should detect Carmichael numbers, as demonstrated in the Miller-Rabin algorithm. The Miller-Rabin algorithm was first proposed by G. L. Miller [Mil76] and later modified by M. O. Rabin [Rab80]. Their approach is more practical than Fermat's primality test as it considers the case of Carmichael numbers, which are undetected by Fermat's test and 6 Carmichael numbers are composite numbers satisfying the congruence of Fermat's theorem. They are fairly rare: there are only seven such numbers less than 10,000. These numbers are 561, 1105, 1729, 2465, 2821, 6601, and 8911 [Pin08].
pose a serious threat to the cryptosystem if primality is checked with Fermat's test alone. The Miller-Rabin algorithm, in addition to performing Fermat's test, exploits the fact that composite numbers can be found to exhibit the following congruency with x ≢ ±1 (mod p):

x² ≡ 1 (mod p), i.e., (x − 1)(x + 1) ≡ 0 (mod p).   (3.3)

If p is prime and satisfies the congruency (3.3), then p is a divisor of (x − 1)(x + 1); hence, by Euclid's lemma, p divides at least one of the factors x − 1 or x + 1, i.e., x ≡ ±1 (mod p). Finding a nontrivial square root x of 1 modulo p therefore gives an assertion that p is not prime.

Table 4: Pseudocode of AKS primality test
Input: p
Output: true if p is prime | otherwise false
1. if p is a perfect power
2.   return false
3. pick the smallest r such that the multiplicative order of p modulo r is large
4. if p has a factor less than r
5.   return false
6. for a = 1 to ⌊√φ(r) · log p⌋
7.   if not (X + a)^p ≡ X^p + a (mod X^r − 1, p)
8.     return false
9. return true

The accuracy of the Miller-Rabin test is high due to the fact that, for a composite number p, the number of witnesses to the compositeness of p is at least (p − 1)/2 (a proof can be found in [CLRS01]). The run time is polynomial: O(k log p) arithmetic operations and O(k log³ p) bit operations [CLRS01].

Table 5: Miller-Rabin vs. AKS practicality comparison
Test         | Time
Miller-Rabin | 0.3 seconds
AKS          | 37 weeks (estimate)
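The Miller-Rabin test can be sketched as follows; the decomposition p − 1 = 2^s · d with d odd, and the search for a nontrivial square root of 1 via the congruency (3.3), follow the description above, while the function name and default trial count are our own.

```python
import random

def miller_rabin(p, k=20):
    """Miller-Rabin test: write p - 1 = 2^s * d with d odd, then look
    for a nontrivial square root of 1 modulo p (congruency (3.3))."""
    if p < 4:
        return p in (2, 3)
    if p % 2 == 0:
        return False
    s, d = 0, p - 1
    while d % 2 == 0:           # factor p - 1 = 2^s * d
        s += 1
        d //= 2
    for _ in range(k):
        a = random.randrange(2, p - 1)
        x = pow(a, d, p)
        if x in (1, p - 1):
            continue            # this base reveals nothing
        for _ in range(s - 1):
            x = pow(x, 2, p)
            if x == p - 1:
                break
        else:
            return False        # a witnesses compositeness
    return True

# Unlike Fermat's test, Carmichael numbers are detected:
print(miller_rabin(561))        # False (with overwhelming probability)
print(miller_rabin(104729))     # True: 104729 is prime
```

Since at least (p − 1)/2 of the bases are witnesses for a composite p, the error probability drops exponentially in k, matching the accuracy claim above.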
3.1.2.3 AKS Primality Test

Agrawal, Kayal and Saxena first proposed the breakthrough algorithm (AKS) in primality testing in 2002-04 [AKS02]. Unlike previous primality tests, their approach is deterministic and runs in polynomial time. The basis of this approach is the following lemma.

Lemma 3.1: the number p is prime iff

(X + a)^p ≡ X^p + a (mod p)   (3.6)

for a positive integer a such that gcd(a, p) = 1, with X an intermediate variable (an indeterminate). This lemma is based on Fermat's little theorem discussed in section 3.1.2.1 [AKS02]. Fortunately, the lemma has been proven in [AKS02], but the left-hand side of the congruency (3.6) has p + 1 coefficients, which indeed requires exponential running time to evaluate. Hence, the number of coefficients should be reduced to decrease the number of operations needed to test the congruency (3.6) against a given integer. This reduction can be done by simply evaluating both sides of (3.6) modulo a polynomial of the form X^r − 1 for a small integer r [AKS02]. Hence, the congruency (3.6) can be rewritten as

(X + a)^p ≡ X^p + a (mod X^r − 1, p).   (3.7)

The congruency (3.7) adds an improvement to the computation: the degree of the polynomials can be reduced to be as low as the chosen value of r. The complexity of the AKS algorithm is dependent on the choice of r; fortunately, it is provable that a good choice of r needs only to be polylogarithmic with respect to p, which makes the running time of AKS polynomial in log p (about Õ(log¹² p) in the original analysis). In practice, AKS is slow and unsuitable for practical applications. The exponent of the running time of the algorithm is considered large and makes the computation slow regardless of its classification as a polynomial-time
algorithm. In addition, it outputs either true or false based on the primality of its input, and there is no better way, other than re-running AKS, to re-check that the algorithm did not run into an internal hardware error resulting in misleading output. This is unlike the Miller-Rabin algorithm, which outputs primality and compositeness certificates that can be checked efficiently to ensure the correctness of the output without the need to re-run the whole algorithm on the same input. Conclusively, AKS is important as it proves that the problem of certifying prime numbers is in the complexity class P, which had remained an unsolved problem in computational number theory for decades. Table 5 illustrates an experimental comparison between the Miller-Rabin and AKS algorithms, testing a large prime candidate over 100 trials on a 1 GHz machine [Bre06]. Richard P. Brent attributes the slowness and practical unsuitability of AKS partly to the high probability of hardware errors over such long computations; however, there is no doubt as to its theoretical correctness [Bre06].

3.1.3 IFP based Public key Cryptosystems

The breakthrough of the first practical public key cryptosystem (RSA) by Rivest, Shamir and Adleman in 1977 is considered the foundation of an entire generation of technology security products, as well as of cryptographic protocols which follow concepts similar to those discovered in the design of RSA (e.g., employing the IFP as a one-way trapdoor function). The proposed cryptosystem, commonly known as RSA, is based on the intractability of the IFP. The previous discussion of the IFP, of primality testing, and of their correlation in forming one-way functions should be kept in mind when discussing the design of RSA as well as of other IFP-based cryptosystems such as the Rabin cryptosystem.
3.1.3.1 RSA

RSA [RSA78] is based on the difficulty of solving the IFP. The RSA algorithm operates in three steps: key generation, encryption, and decryption. Recall the definition of public key cryptography schemes (refer to section 1.2), where these schemes use a public and a private key for encryption and decryption respectively. The public key is made public to everyone and should conceal all information about the private key, which in turn must be kept secret from anyone wanting to decrypt messages.

RSA set up: consider a small communication channel consisting of two end parties, Alice and Bob, with the assumption that Eve is an active listener on the communication channel. Alice generates the key by choosing, uniformly at random, two large prime numbers p and q. Alice then computes the modulus n, i.e., n = pq. Alice then computes Euler's totient⁷ by applying the formula φ(n) = (p − 1)(q − 1). Alice then chooses the public exponent, denoted by e, to be coprime to φ(n) (i.e., satisfying 1 < e < φ(n) and gcd(e, φ(n)) = 1). The reason for requiring e to be coprime to φ(n) is to ensure that e has a multiplicative inverse in the ring of integers modulo φ(n). The multiplicative inverse of e, or private key, denoted by d, is computed by using the following formula:

e · d ≡ 1 (mod φ(n)).   (3.8)

The above equation implies d is an inverse of e. Alice then publishes (n, e) to the public and keeps d secret. In the general case, each communicating party generates a different triple (n, e, d).

⁷ Euler's totient φ(n) is the number of positive integers less than n that are coprime to n.
RSA message exchange: assume the key pair (n, e) is transmitted through a public communication channel to Bob. Bob may send a secured message m to Alice by encrypting it in the following way: Bob computes the ciphertext, denoted by c, via the equation c = m^e mod n. The scheme uses the modulus n; hence, the message circulates through the domain [0, n − 1]. In other words, the encryption equation maps every instance of m to a unique instance of c. For this to be a one-to-one mapping, m must satisfy m < n (m is enciphered into numerical format); otherwise, the pigeonhole principle ensures that two instances of m will be mapped to the same ciphertext c. In case m ≥ n, the message must be partitioned into blocks m_i < n. Once Alice receives c, she can decrypt the message by using the triple (p, q, d) and computing m = c^d mod n, reversing the mapping Bob did in the encryption step.

RSA digital signature: a digital signature s on a message m is computed according to the equation s = m^d mod n, where d is the signer's private key (Bob's key in our scenario). The recipient, Alice, verifies the signature by computing m = s^e mod n.

Security remark: it is believed that the security of RSA is equivalent to factoring a large composite integer, the public modulus n. A modulus of 2048 bits is trusted to resist both computational and algorithmic supremacy [RSAlabs, BR11]. However, the equivalence of RSA security to the hardness of factoring has not yet been proven. On one hand, strong theoretical evidence has been offered that breaking RSA is equivalent to the difficulty of solving the IFP [AM09, Bro08]. In contrast, analysis conducted by D. Boneh of Stanford University and R. Venkatesan of Microsoft Research shows that breaking RSA cannot
be equivalent to factoring [BV98]. Therefore, throughout this paper, discussing the factorization algorithms is equivalent to discussing a threat to the RSA algorithm and the other IFP-based cryptographic protocols.

Breaking RSA by factoring the modulus n: the private key, as well as the public key, is constructed via the prime constituents of the modulus n, which is made public; hence, Eve has the value of n. In order for Eve to recover the private key, Eve needs to factor n into its prime constituents p and q to compute Euler's totient φ(n). The Euler's totient formula as defined in the set up is φ(n) = (p − 1)(q − 1), and since the public exponent e is known to everyone, Eve can recover the private key d from

e · d ≡ 1 (mod φ(n)), i.e., d = (1 + kφ(n))/e for some integer k.   (3.9)

3.1.3.2 Rabin

Michael O. Rabin proposed another cryptographic protocol based on the intractability of the IFP in 1979 [Rab79]. Unlike RSA, for which it is conceivable a decryption algorithm might exist without involving the factorization of the modulus n, breaking the Rabin cryptosystem has been proven to be as difficult as solving the IFP [Rab79]. The Rabin method also differs in the exponent of encryption, in that it uses e = 2 instead of the RSA exponent e, as demonstrated in the following description.

Rabin set up: consider a small communication channel consisting of two end parties, Alice and Bob, with the presence of Eve. Alice generates the keys by n = pq, where p and q are uniformly distributed, large random prime numbers. The random primes p and q must also satisfy the congruency p ≡ q ≡ 3 (mod 4). Rabin believes the latter
condition eases the computation of the square roots in the congruency (3.10). Alice publishes n to the public. In the general case, each communicating party generates a different key n.

Rabin message exchange: every other communicating party, Bob for instance, can encrypt the message m into the ciphertext c by computing c = m² mod n. Alice can decrypt the ciphertext by solving the following system of congruencies:

x² ≡ c (mod p)
x² ≡ c (mod q)   (3.10)

Solving the system of congruencies results in four solutions, with one of them being the original message m [Yan13].

Security remark: Michael O. Rabin states a theorem regarding this scheme: if there exists an algorithm to solve (3.10) and find one of the solutions in a given number of steps, then there is a factorization algorithm which requires essentially the same number of steps. Rabin stated a proof in [Rab79], which indicates that inversion of the function in (3.10) is at least as hard as factoring the modulus n.

3.1.4 IFP based Cryptosystems Attacks (Weak Primes)

In view of the assumption that breaking IFP-based cryptosystems is equivalent to the difficulty of finding an efficient algorithm to factor a large composite number into its prime constituents, the research community turned to finding special cases in which the IFP is easy to solve. There exist certain properties of prime numbers which, if satisfied, allow their product to be efficiently factored. By way of illustration, consider p and q chosen in such a way that p = q. Then factoring n is a matter of finding its square root √n. In addition, certain factorization algorithms are efficient if p and q have some specific properties.
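Before turning to the weak-prime attacks, the RSA operations described above, together with the key-recovery attack of equation (3.9), can be tied together in a toy sketch. The tiny primes and all variable names are ours; real deployments use 2048-bit moduli, for which the factoring step below is of course infeasible.

```python
# Toy RSA round trip with the textbook equations above.
p, q = 61, 53                # Alice's secret primes
n = p * q                    # public modulus, n = 3233
phi = (p - 1) * (q - 1)      # Euler's totient, phi = 3120
e = 17                       # public exponent, gcd(e, phi) = 1
d = pow(e, -1, phi)          # private key: e*d ≡ 1 (mod phi), eq. (3.8)

m = 65                       # plaintext, must satisfy m < n
c = pow(m, e, n)             # encryption: c = m^e mod n
assert pow(c, d, n) == m     # decryption: m = c^d mod n

s = pow(m, d, n)             # signature: s = m^d mod n
assert pow(s, e, n) == m     # verification: m = s^e mod n

# Eve's attack: factor n (trial division is feasible only for toy n),
# recompute phi, and recover d exactly as in (3.9).
f = next(i for i in range(2, n) if n % i == 0)
phi_eve = (f - 1) * (n // f - 1)
assert pow(e, -1, phi_eve) == d
print("recovered private key:", pow(e, -1, phi_eve))
```

The modular inverse via `pow(e, -1, phi)` requires Python 3.8 or later; on older versions an extended-Euclid helper would be needed.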
3.1.4.1 Twin Primes Forming the Modulus

Twin primes are two prime numbers p and q such that q = p + 2. The twin prime conjecture states that there are infinitely many twin primes [Guy94]. This conjecture makes it possible to choose two twin primes p and q to form the modulus n. This can form a weakness in IFP-based cryptosystems because the modulus can be easily factored as follows: consider the twin primes p and q = p + 2 chosen to compute the modulus n = pq. By the definition of twin primes, they can be rewritten as x − 1 and x + 1 with x = p + 1, which yields n = (x − 1)(x + 1) = x² − 1. The factors of n can be found by computing x = √(n + 1), giving p = x − 1 and q = x + 1.

3.1.4.2 Fermat's Factoring of Factors with Small Difference

Cryptanalysis of RSA conducted by de Weger [Weg02] proves that the Fermat factoring algorithm can instantaneously and efficiently factor large composite numbers, such as an RSA modulus n, if the difference Δ = q − p is small, approximately Δ < n^(1/4) [Weg02]. Assume p and q have the same bitsize; hence, the bitsize of n is twice the bitsize of p [SZ01]. This assumption yields that Δ is at most on the order of √n. The findings of de Weger were based on the property n = ((p + q)/2)² − ((q − p)/2)² (see the proof in [Weg02]). In Fermat's factoring algorithm, we try to find two integers x and y such that n = x² − y². If we succeed, then we set p = x − y and q = x + y. The values of x and y can be found by simply starting the search from x = ⌈√n⌉, incrementing x by 1 and testing whether x² − n is a perfect square y², until n = x² − y². The complexity is dependent on the value of Δ, and de Weger finds
that the number of candidate values of x grows on the order of Δ²/√n, up to a constant; that is, the number of tries is small whenever the primes are close. The American National Standards Institute (ANSI) suggests that p and q should differ within the first 100 significant bits [ANSI, Wag02].

3.1.4.3 Pollard's Factoring when the Greatest Factor of p − 1 is Small

Let the prime numbers p and q constitute the RSA public key n = pq. Then p − 1 and q − 1 should be checked so that their greatest prime factor is larger than a sufficiently large bound. Otherwise, Pollard's p − 1 algorithm [Pol74] can efficiently factor n by applying the following procedure: choose a random a ∈ [2, n − 1] and a positive integer k, where k is defined to be a number divisible by many prime powers; hence, the larger k is, the more likely the attack will succeed. We compute d = gcd(a^k − 1 mod n, n) for increasing k until we find 1 < d < n; then d is a factor of n. Otherwise we repeat the process with a different choice of a. The complexity of this attack grows with the size of k, and there is a tradeoff between the attack's success and the complexity of the algorithm (parallel to the size of k).

3.1.5 IFP based Cryptosystems Attacks (NFS Factorization)

3.1.5.1 The Number Field Sieve

The Number Field Sieve (NFS) factorization algorithm is considered the most polished factorization algorithm, both theoretically and practically. It has been used to factor the majority of the RSA keys published as part of the RSA Factoring Challenge [RSAlabs]. The NFS has pushed the capability of factoring to integers having more than one hundred digits; e.g., RSA-768, a 232-digit number, was factored using the General Number
Field Sieve (GNFS); no other algorithm has matched this capability to date. Moreover, the Special Number Field Sieve (SNFS) has factored larger numbers, up to 1061 bits [Gre12], which carry a special form, and it is precisely this special form that makes them an order of magnitude easier to factor.

John Pollard invented the NFS algorithm in 1988. The first recognized achievement of the algorithm was its capability to factor numbers of the form

2^(2^k) + 1,   (3.11)

where equation (3.11) is the form of the Fermat numbers, e.g., the ninth Fermat number F₉ = 2^512 + 1, which has 155 decimal digits. In this section, we are more interested in the counterpart of the algorithm that works on general forms of composite numbers, the GNFS. The efficiency of special-form factorization algorithms such as the SNFS may not pose a risk, as numbers of the forms they are capable of factoring can be eliminated at the stage of generating the random primes. Yet the SNFS runs slightly faster than the GNFS and experimentally has been capable of factoring larger numbers, as shown in Table 6.

Table 6: NFS algorithms latest records
NFS class | Latest record
SNFS      | 1061 bits
GNFS      | 768 bits

The main idea of the GNFS goes back to the finding of Fermat and Legendre that x² ≡ y² (mod n) with x ≢ ±y (mod n) implies that gcd(x − y, n) gives a non-trivial factor of n. In short, the aim of the algorithm is to reduce a given composite
integer n to a congruency of squares x² ≡ y² (mod n), which is sufficient to find a non-trivial factor via gcd(x − y, n). The algorithm is based on the following four steps: polynomial selection, sieving, the matrix step, and the square-roots step. The remainder of this section gives a brief description of these steps as applied to breaking RSA-768 [EPFL10], with the aim of showing the effort and resources needed to factor the latest record of factorization, RSA-768.

3.1.5.2 Step 1: Polynomial Selection

The Number Field Sieve starts with choosing two irreducible polynomials f(x) and g(x) with a common root m satisfying the congruency f(m) ≡ g(m) ≡ 0 (mod n), where n is the number to be factored. Let d = degree(f) and e = degree(g); then the two functions are used to define the set of relations S, given by coprime pairs of integers (a, b) such that the integer values b^d f(a/b) and b^e g(a/b) are smooth with respect to some bound B, i.e., none of their prime factors are larger than B. Experimental analysis for RSA-768 further suggested a slight relaxation of the smoothness definition to be more flexible, allowing a few of the prime factors to lie between B and a larger bound while the remaining factors are at most B [EPFL10].

3.1.5.3 Step 2: Sieving

The sieving process is an extension of the previous step, polynomial selection. In this step, we need to find the relations: coprime pairs of integers (a, b) such that b^d f(a/b) and b^e g(a/b) are simultaneously smooth. The way of sieving, as it appears in [EPFL10], follows the line sieving approach. Let the pair (a, b) be
considered with b fixed. A prime p divides the polynomial value at (a, b) if a ≡ rb (mod p) for a root r of the corresponding polynomial modulo p (i.e., for a fixed b there exist roots r with f(r) ≡ 0 (mod p)). That is, per value of b we can mark, for each such prime p, all relevant values of a given by the relation a = rb + tp for some integer t. In short, the sieving step returns a large number of smooth pairs of the form (a, b). A total of 27,797,115,920 relations were needed in the factorization of RSA-768.

3.1.5.4 Step 3: The Matrix

The matrix step takes the relations, or pairs of the form (a, b), and constructs a very large, sparse matrix. The role of this step is to find linear dependencies among these pairs, which can be found using the Wiedemann algorithm. Unlike traditional algorithms, Wiedemann is favorable and feasible for solving sparse matrices, as opposed to its counterpart algorithms (e.g., Gaussian elimination). In practice, the Wiedemann algorithm takes advantage of the fact that 64-bit machines can process blocked multiplication of 64 different vectors at once, and of the ability to run several of the algorithm's steps in a distributed manner.

3.1.5.5 Step 4: The Square Roots

Subsequent to the matrix step, and after successfully finding the linear dependencies, we can construct

γ² = ∏ (a − bα)  and  δ² = ∏ (a − bβ),   (3.12)

where α and β in (3.12) are roots of f and g respectively and the products run over the pairs in a dependency. Since there exist ring homomorphisms φ_f and φ_g, sending α and β to m, such
that φ_f(α) ≡ m (mod n) and φ_g(β) ≡ m (mod n), we have φ_f(γ²) ≡ φ_g(δ²) (mod n). The square roots γ and δ of γ² and δ² respectively can be obtained using the method described in [Ngu98], and if we let x = φ_f(γ) and y = φ_g(δ), then x² ≡ y² (mod n). A non-trivial factor of n can then be found as gcd(x − y, n).

3.1.6 Statistics of Useless Public Keys

A team of mathematicians from EPFL, a Swiss federal institute of technology, published a disconcerting observation on public key infrastructures (e.g., X.509) which use a public key cryptosystem (e.g., RSA) for their digital signatures [LHABKW12]. The study focused on investigating the randomness properties of 11.4M actual public keys used to secure web mail, online banking and other cryptographic applications. They found that 26,965 keys provided no security; that is, 2 out of 1000 (0.2%) of these keys were useless. In addition, 6.4M of these keys were RSA moduli, with the rest distributed into ElGamal and DSA keys. Of the 6.4M RSA keys, 0.27M, or 4%, were shared among unrelated parties, and 1.1% of them were duplicated thousands of times. On the other hand, tests of the compositeness or primality of these moduli show rare faults in the construction of these keys; thus, there are no concerns in regard to computational properties, and most of the keys have met the standard choice of the prime constituents of the RSA modulus. The bottom line of the concern is that duplication may allow each user to attempt decrypting the messages of another user using the same modulus, or to factor the victim's public key using one of his own prime factors, since the possibility of sharing the same modulus, or a prime constituent, may be around 4%. In our point of view, given that the RSA modulus size follows the standards of either 1024 or 2048 bits, there are at most 2^1024 or 2^2048 possible moduli respectively. Let
us consider the case when the modulus is 1024 bits; then the size of each of its prime factors is roughly 512 bits, that is, a 512-bit number must be chosen to build a 1024-bit modulus. Given a 512-bit space, the largest number that can be represented is 2^512. By the asymptotic law of the distribution of prime numbers, the number of primes less than or equal to a number x is approximated by x/ln x. That is, for a 1024-bit modulus, there are approximately 2^512/ln(2^512) − k ≈ 2^503 prime numbers that can constitute the RSA modulus of 1024 bits, where k represents the prime numbers eliminated because of the special-case properties that make factorization easy. The number of users is in the billions, based on the latest statistical data released by the International Telecommunication Union (ITU) [ITU13]. Regardless of how many keys are generated per user, the probability of duplicates still converges to 0, since the space of candidate primes is extremely large. This indicates that there may still exist an issue with the random number generators in use [LHABKW12].

3.1.7 Threats Resistance by Incrementing the Key Size

Moore's law states that computational power doubles every 18 months. The law is based on a historical observation of the number of transistors on integrated circuits since 1971. The increase of computational power raises considerations about the size of the keys used in such cryptosystems, i.e., we may search for the RSA modulus factors faster than before. Most cryptosystems, e.g., RSA, are upgraded by increasing the key size in parallel with the increase of computational power. The drawback of such security upgrades is their impact on the complexity of these cryptosystems, as the increase of the key size
influences the primality tests on its factors and the algorithm's procedures themselves. However, the latter might not be of great concern until an algorithmic breakthrough on integer factorization emerges, since computational power increases in parallel with the upgraded key sizes.

3.1.8 Current Milestone of the IFP

Despite the fact that it has not yet been proven whether the RSA-like cryptosystems are only threatened by the development of an efficient factorization algorithm, factoring the modulus seems to be the easiest way to break such cryptosystems. In the past years, progress in factorization algorithms has been remarkable. For example, RSA-120 was factored using the multiple polynomial quadratic sieve (QS) [DDL94] and RSA-140 by the number field sieve algorithm [SBA99]. Subsequently, RSA-155, RSA-160, RSA-576, RSA-640 and RSA-768 were factored [RSAlabs]. The period of time between the factorizations of RSA-576 and RSA-768 is a decade, and the factorization of RSA-768 was estimated to be several thousand times harder than that of RSA-576. Thus, it is not infeasible to see the factorization of RSA-1024, which is estimated to be one thousand times harder than that of RSA-768 [EPFL10]. In addition, larger numbers that carry a special form have been factored; a complete illustration of the factorization progress is given in Figure 2. Presently, a 2048-bit modulus is recommended by both NIST and RSA Laboratories and is believed to provide a protection lifetime extended to 2030 [RSAlabs, BR11]. This recommendation reflects the state of the IFP and also dictates the ideal choice of the composite integer sizes used in other cryptosystems based on the IFP (e.g., Rabin's cryptosystem). The history of these types of problems over the last decades emphasizes how hard it is to break these types of cryptosystems. For instance, a 125-digit number
was estimated to require 40 quadrillion years to be factored if a modular multiplication could be achieved in nanoseconds [YJ06]. Unfortunately, this estimate failed to consider the possibility of progress in computing power and algorithmic development. The IFP is thus not predictable, and such assumptions have conflicted with reality in the past.

Figure 2: Integer factorization progress (key size in bits versus year, showing general-form and special-form NFS records), where the current target demonstrates the currently recommended key size of 2048 bits.

3.2 The Quadratic Residue Problem (QRP)

The quadratic residue problem (QRP) has been considered in modern cryptosystem design due to its hardness over a multiplicative group. It is the basis of the Blum-Blum-Shub cryptographic pseudo-random number generator [BBS86] and of the Goldwasser-Micali cryptosystem [GM84]. The security of these cryptosystems is equivalent to solving the QRP, and it turns out to be closely related to solving the IFP, as will be shown later in section 3.2.2.
Table 7: The congruency x² ≡ a (mod 7) has the solutions a = 1, 2, 4, and thus 4 is a quadratic residue mod 7. Note that 0 is not considered a quadratic residue since gcd(0, 7) ≠ 1. To confirm this using the Legendre symbol, let a = 4 and p = 7; by Definition 3.4, we have 4^((7−1)/2) mod 7 = 4³ mod 7 = 1.

x        | 0 | 1 | 2 | 3 | 4 | 5 | 6
x² mod 7 | 0 | 1 | 4 | 2 | 2 | 4 | 1

Definition 3.3 (Quadratic residue problem, QRP): given two integers a and n with gcd(a, n) = 1, decide if there exists an integer x such that

x² ≡ a (mod n).   (3.13)

If the congruency (3.13) has a solution, then a is called a quadratic residue mod n, and the set of all such residues is denoted by QR_n. In contrast, the integers that do not satisfy (3.13) are called quadratic non-residues, and their set is denoted by QNR_n [Yan13]. Symbolically, QR_n = {a ∈ Z_n* : there exists x with x² ≡ a (mod n)}. The traditional solution, or brute-force method, searches over all possible values of x ∈ Z_n.

Security remark: the QRP, mainly the congruency (3.13), becomes intractable when n is a large composite number (e.g., an RSA number) [GM84]. The problem is believed to be equivalent to the IFP in the sense that factoring n yields a solution to the congruency (3.13) [GM84]. This implies that any threat to RSA yields a threat to QRP-based cryptosystems (e.g., the Goldwasser-Micali cryptosystem [GM84]).
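The brute-force method mentioned in Definition 3.3 simply squares every x in Z_n; the following sketch (the helper name is ours) reproduces Table 7 for n = 7.

```python
from math import gcd

def quadratic_residues(n):
    """QR_n: units a of Z_n with x^2 ≡ a (mod n) for some x
    (Definition 3.3). Brute force over all x; feasible only for tiny n."""
    squares = {(x * x) % n for x in range(1, n)}
    return {a for a in squares if gcd(a, n) == 1}

# Reproducing Table 7: the squares modulo 7 are 0 1 4 2 2 4 1,
# and discarding 0 (not a unit) leaves QR_7 = {1, 2, 4}.
print(sorted(quadratic_residues(7)))   # [1, 2, 4]
```

For a toy composite modulus the same brute force still works, which is exactly what stops scaling once n is an RSA-sized number.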
3.2.1 Solving the QRP for a Prime Modulus

The QRP can be efficiently solved once the modulus in the given congruency (3.13) is an odd prime number. The solution relies on the Legendre symbol, which is based on the following theorem [Lev96]:

Theorem 3.4: if p is an odd prime number, then for an arbitrary integer a with gcd(a, p) = 1,

a^((p−1)/2) ≡ ±1 (mod p),

according to a being a quadratic or non-quadratic residue of p.

The Legendre symbol (a/p) is a simple phrasing of the above theorem and is defined as follows:

Definition 3.4 (Legendre symbol): (a/p) is a number-theoretic function, computed as a^((p−1)/2) mod p, which is defined to take the values +1, −1 or 0 depending on whether a is a quadratic residue modulo the odd prime p. Hence,

(a/p) = a^((p−1)/2) mod p = +1 if a ∈ QR_p; −1 if a ∈ QNR_p; 0 if p divides a.

Proof: following the convention of Fermat's little theorem and the resulting congruency (3.1), the Legendre symbol equals 1 only if a is a quadratic residue modulo the prime p; that is, if a ≡ x² (mod p), then a^((p−1)/2) ≡ x^(p−1) ≡ 1 (mod p). In case it outputs zero, then p divides a, and thus a^((p−1)/2) ≡ 0 (mod p). Otherwise, the output −1 is an indication that a is a non-residue modulo p, i.e., a ∈ QNR_p.
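Definition 3.4 translates directly into code via Euler's criterion; the function name is ours, and the only subtlety is mapping the residue p − 1 back to −1.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, computed by Euler's
    criterion: a^((p-1)/2) mod p, mapped onto {+1, -1, 0}."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

# 4 = 2^2 is a residue mod 7, 3 is a non-residue, and (0/7) = 0:
print(legendre(4, 7), legendre(3, 7), legendre(0, 7))  # 1 -1 0
```

A single modular exponentiation decides quadratic residuosity modulo a prime, which is the efficiency claim this subsection rests on.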
The Legendre symbol provides evidence as to whether a quadratic congruency, e.g., the congruency (3.13), has a solution or not (see the illustration in Table 7).

Theorem 3.5: the Legendre symbol has the following properties [Lev96]:

1. (ab/p) = (a/p)(b/p), which implies ab ∈ QR_p if either both a, b ∈ QR_p or both a, b ∈ QNR_p;
2. if a ≡ b (mod p), then (a/p) = (b/p);
3. (a²/p) = 1 if p does not divide a;
4. (−1/p) = (−1)^((p−1)/2). Thus, −1 is a residue if p ≡ 1 (mod 4) but not if p ≡ 3 (mod 4).

Proof: property (1) can be obtained by substituting a with ab in the Legendre symbol given in Definition 3.4, such that (ab/p) = (ab)^((p−1)/2) mod p. Algebraically, this implies (ab)^((p−1)/2) ≡ a^((p−1)/2) · b^((p−1)/2) (mod p). Since both (a/p) and (b/p) can take either the value +1 or −1, their product must be 1 for ab to be a quadratic residue; then both (a/p) and (b/p) must be either +1 or −1, which is equivalent to saying that both a and b are quadratic residues or both are non-residues. The other properties (2-4) can be trivially proven following the same convention as the proof of property (1).
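The multiplicativity in property (1) of Theorem 3.5, and the p ≡ 3 (mod 4) case of property (4), can be checked numerically; the snippet recomputes the Legendre symbol inline (via Euler's criterion, as in Definition 3.4) so that it is self-contained, and the exhaustive check over a small prime is only an illustration, not a proof.

```python
def legendre(a, p):
    # Euler's criterion, as in Definition 3.4
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

p = 23
# Property (1): (ab/p) = (a/p)(b/p) for all units a, b of Z_p.
ok = all(legendre(a * b, p) == legendre(a, p) * legendre(b, p)
         for a in range(1, p) for b in range(1, p))
print(ok)                   # True

# Property (4): since 23 ≡ 3 (mod 4), -1 ≡ 22 is a non-residue.
print(legendre(p - 1, p))   # -1
```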
Theorem 3.6: let the modulus of the congruency (3.13) be an odd prime integer p; then there are exactly (p − 1)/2 quadratic residues of p.

Proof: the problem involves the set of numbers {1, 2, …, p − 1}, a cyclic group under multiplication modulo p. For any given integer x in the group, the element −x ≡ p − x has the same square as x. Hence the squares 1², 2², …, ((p − 1)/2)² give all the quadratic residues modulo p, and the remaining elements are quadratic non-residues. Following the convention of Theorem 3.1, we can establish the following characteristics:

1. if a is a residue modulo a prime p, then the inverse a^(−1) is also a residue;
2. if a is a non-residue modulo a prime p, then the inverse a^(−1) is also a non-residue;
3. |QR_p| = |QNR_p| = (p − 1)/2.

Remark: these luxury findings are applicable only in the case where the modulus of the QRP is a prime integer.

3.2.2 QRP Reduction to the IFP

The reduction is based on the fact that solving the QRP for a prime modulus is easy and can be computed efficiently, with run time polynomial in log p, using the algorithm given in [BS96]. It is intractable when the modulus is a large composite number (e.g., an RSA key); thus, factoring divides the problem into simultaneous congruencies, one for each prime factor of n. More precisely, if the congruency (3.13) is
taken modulo a composite integer n with prime factors p and q, then by the Chinese Remainder Theorem we can solve the following congruencies simultaneously:

x² ≡ a (mod p), x² ≡ a (mod q);

i.e., a is called a quadratic residue of n only if it is a quadratic residue modulo every prime factor of n, i.e., a ∈ QR_p and a ∈ QR_q.

Theorem 3.6: a ∈ QR_p and a ∈ QR_q if and only if a ∈ QR_n, where n = pq for prime integers p and q.

Proof: let us say the prime factors p and q of n are known. The QRP over n given in Definition 3.3 can be rewritten as

x² ≡ a (mod p)   (3.11)
x² ≡ a (mod q)   (3.12)

Since both p and q are primes, gcd(p, q) = 1. By Bezout's lemma, there exist two integers u and v such that

up + vq = 1.   (3.13)

Let x_p and x_q be solutions of (3.11) and (3.12) respectively, and combine them as x = x_q · up + x_p · vq. Substituting (3.13) into this combination shows x ≡ x_p (mod p) and x ≡ x_q (mod q), and therefore

x² ≡ a (mod p) and x² ≡ a (mod q).   (3.14)

By (3.14), both p and q divide x² − a; since p and q are distinct primes, their product n = pq divides x² − a, and this implies x² ≡ a (mod n). That is, if a is a quadratic residue modulo both p and q, then x² ≡ a (mod n) has a solution; conversely, any solution of x² ≡ a (mod n) reduces modulo p and modulo q to our first reduction in (3.11) and (3.12). In short, a is a quadratic residue of n if and only if it is a quadratic residue of each of its prime factors
and, given the ease of solving the case of prime moduli, factoring n yields an efficient solution to the QRP.

3.2.3 Complexity Class of the QRP

We have shown that the quadratic residue problem can be solved efficiently if the modulus is a prime number; that is, QRP ∈ P when the modulus is prime. However, the problem becomes equivalent to the IFP when the modulus is a composite integer; thus, QRP ∈ NP, but it is not known whether it is NP-hard.

3.3 The Discrete Logarithm Problem (DLP)

The discrete logarithm problem (DLP) is derived from the ordinary logarithm x = log_g(y), where solving the logarithm with respect to x over a cyclic group is believed to be hard. Resembling the IFP, its difficulty has been exploited in the design of several cryptosystems, such as the ElGamal cryptosystem and the Diffie-Hellman key exchange. The precise definition of the problem is as follows.

Definition 3.5 (Discrete logarithm problem, DLP): let G be a multiplicative group and ⟨g⟩ a cyclic subgroup of G generated by g. Given y ∈ ⟨g⟩, find an integer x such that g^x = y, i.e., x = log_g(y).

The nature of the DLP, being easy to compute in one direction and infeasible in the other, makes it analogous to the IFP, where multiplication is easy but factorization is not. The first consideration of the DLP as a candidate cryptographic one-way function goes back to the paper of Diffie and Hellman [DH76].
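To make Definition 3.5 concrete, the brute-force discrete logarithm in Z_p* can be sketched as follows; the function name and the toy parameters are ours, and the O(p) exhaustive search, exponential in the bit length of p, is precisely why large p makes the DLP hard while the forward direction stays cheap.

```python
def discrete_log(g, y, p):
    """Solve g^x ≡ y (mod p) by exhaustive search over x
    (Definition 3.5). Feasible only for tiny p: O(p) multiplications."""
    acc = 1
    for x in range(p - 1):
        if acc == y:
            return x
        acc = (acc * g) % p
    return None          # y is not in the subgroup generated by g

# 3 is a generator of Z_17*. The forward direction is one fast
# exponentiation; the inverse direction is a full search.
p, g, x = 17, 3, 12
y = pow(g, x, p)
print(discrete_log(g, y, p))  # 12
```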
Remark: it is important to emphasize that, unlike the QRP, the DLP on a group Z_p*, where p is a large prime integer, is considered hard, and no efficient algorithm has been found to date.

3.3.1 DLP based Cryptosystems Attacks (Algorithmic)

The difficulty of the DLP is dependent on the group involved in constructing an instance of the problem. Precisely, it is very easy (polynomial time) in the additive group Z_n, hard (sub-exponential) in Z_p*, and even harder (exponential) in elliptic curve groups. The best known algorithm to solve the DLP in Z_p* is the NFS variant, which runs subexponentially in exp((c + o(1)) (ln p)^(1/3) (ln ln p)^(2/3)) [Gor93]. It is similar to the NFS version for the IFP discussed in section 3.1.5.1; thus, we will consider another sophisticated algorithm, the index calculus. In fact, the DLP is reducible to the IFP, and thus there is an algorithmic association between the two problems, i.e., algorithms for solving the IFP can be applied toward a solution of the DLP (details in section 3.3.3).

3.3.1.1 The Index Calculus

The index calculus was first attributed to Western and Miller [WM68], and the general-purpose subexponential algorithm is due to Adleman [Adl79]. The algorithm is considered the most powerful technique to solve the DLP. It is based on the idea that if we are able to find the logarithms of small and independent elements of a group G, then we should be able to find the logarithm of any element y ∈ G. This idea rests on the assumption that most elements can be expressed in terms of small, independent elements of G whose logarithms are easily computable.
The description of the algorithm, following the conventions of [MOV10], is as follows. First, generate a factor base S = {p_1, p_2, ..., p_t}, a subset of G, where the p_i are small prime numbers. The ideal choice of S is one for which many elements of G can be expressed as a product of elements from S. Next, collect linear relations by repeatedly selecting a random integer k, 0 ≤ k ≤ n − 1 (where n is the order of G), and attempting to write g^k as a product of elements from S. That is, g^k is represented as

    g^k ≡ p_1^{c_1} p_2^{c_2} ... p_t^{c_t} (mod p).

After collecting t + c relations of the above form, for a small positive integer c, take logarithms of both sides to get

    k ≡ c_1 log_g p_1 + ... + c_t log_g p_t (mod n),

which yields a linear system of equations in the unknowns log_g p_i, for 1 ≤ i ≤ t. The last stage of the algorithm involves selecting another random integer k' and attempting to write h · g^{k'} as a product of elements from S. Symbolically,

    h · g^{k'} ≡ p_1^{d_1} p_2^{d_2} ... p_t^{d_t} (mod p).

If such an expression is found, then taking logarithms of both sides yields

    log_g h ≡ d_1 log_g p_1 + ... + d_t log_g p_t − k' (mod n),

which is the solution. The expected running time of this variant of the index calculus, specifically Adleman's algorithm, is exp(O((log p log log p)^{1/2})).
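The relation-collection stage described above can be sketched in a few lines. The toy run below uses p = 107, g = 2, and the factor base {2, 3, 5, 7} (illustrative values only) and stops once t + 2 relations are found; real implementations choose k at random, and solving the resulting linear system modulo p − 1 is omitted here.

```python
# Relation collection for the index calculus over Z_107* (toy parameters).
p, g = 107, 2
base = [2, 3, 5, 7]

def smooth_factor(m, base):
    """Return exponent vector [c_1..c_t] if m factors over base, else None."""
    exps = []
    for q in base:
        e = 0
        while m % q == 0:
            m //= q
            e += 1
        exps.append(e)
    return exps if m == 1 else None

relations = []
for k in range(1, p - 1):        # real runs pick k at random
    vec = smooth_factor(pow(g, k, p), base)
    if vec is not None:
        relations.append((k, vec))
    if len(relations) >= len(base) + 2:    # t + c relations, c = 2
        break

for k, vec in relations:
    print(k, vec)   # each line: k with g^k = 2^c1 * 3^c2 * 5^c3 * 7^c4 (mod p)
```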
3.3.2 DLP-based Public Key Cryptosystems

The DLP has been adopted extensively in cryptographic systems, as it plays the role of a one-way function. Several cryptosystems were built on the DLP by embedding a trapdoor in the function. Let us recall the specification of the problem in terms of group theory. Consider the group Z_p^*, where the prime p specifies the group. From a cryptographic perspective, the groups used are cyclic, such that there is a primitive root, or generator, g with the property that every element h ∈ Z_p^* can be written as h ≡ g^x (mod p). Finding the exponent x of a given element allows full control over the group (i.e., computing its operation through element inverses). Based on this idea, several cryptosystems were introduced, including ElGamal and the Diffie-Hellman key exchange.

3.3.2.1 ElGamal

The ElGamal cryptosystem is attributed to Taher ElGamal and was first introduced in 1984 [Elg84]. It is currently integrated into the GnuPG software package to secure emails, file directories, and disk partitions. The main advantage of ElGamal encryption is that it is randomized: it encrypts a plaintext by randomly choosing a ciphertext from a set of ciphertexts corresponding to the message under the same key (i.e., a one-to-many correspondence at random). Randomized encryptions achieve better cryptographic security than their deterministic counterparts by increasing the size of the message space.

ElGamal set-up. The description of the algorithm starts with the scenario where Alice communicates with her peer Bob. Alice first generates a multiplicative cyclic group Z_p^* from a randomly chosen large prime p, with a generator g. Alice then selects an
integer a, 1 ≤ a ≤ p − 2, and computes A ≡ g^a (mod p). Alice publishes (p, g, A) to the public and keeps a secret (the decryption key).

ElGamal message exchange. Once Bob obtains Alice's public key, he can encrypt any plaintext for Alice. More simply, Bob represents the plaintext M as an integer in the range 0 ≤ M ≤ p − 1, splitting longer messages into blocks. Computing the ciphertext is relatively simple. Bob chooses a random integer k, 1 ≤ k ≤ p − 2, computes c_1 ≡ g^k (mod p) and c_2 ≡ M · A^k (mod p), and sends the ciphertext as the tuple (c_1, c_2). Alice can recover the plaintext from (c_1, c_2) through the equation M ≡ c_2 · (c_1^a)^{-1} (mod p).

Security remark. The security of the ElGamal cryptosystem relies on the difficulty of finding the decryption key a from the published key A. In the current state of the theory, there is no known way to break the ElGamal cryptosystem without solving the discrete logarithm problem. That is, Eve can obtain p, g, A, c_1, and c_2, and must recover a from A ≡ g^a (mod p) over the multiplicative group Z_p^*.

3.3.2.2 Diffie-Hellman-Merkle Key Exchange (DHM)

DHM is attributed to Whitfield Diffie and Martin Hellman and was first introduced in 1976 [DH76]. DHM establishes a key agreement over an unsecured communication channel without prior familiarity between the communicating parties. It is based on the infeasibility of solving the discrete logarithm problem, and it was the first step toward the revolutionary invention of RSA.

Key agreement set-up. First, the communicating parties, Alice and Bob, agree on a multiplicative group modulo a prime p and a generator g. The agreement on these values is based on the fact that (p, g) is made public; hence, Alice or Bob can start generating
them and send them to each other. Alice and Bob choose secret integers a and b, respectively, with 1 ≤ a, b ≤ p − 2, and each computes their exponential A ≡ g^a (mod p) and B ≡ g^b (mod p). Alice and Bob exchange the computed values A and B and each computes the shared key K ≡ g^{ab} ≡ B^a ≡ A^b (mod p). From a cryptographic perspective, Eve is assumed to have obtained all data exchanged during the DHM procedure. Thus, Eve has only p, g, A, and B. The challenge placed on Eve is to find g^{ab} given only A and B. The only known way to solve such a problem is to find a or b separately from A ≡ g^a (mod p) or B ≡ g^b (mod p). The bottom line is that one must solve exactly the DLP; this is the intractability of the DLP.

3.3.3 DLP Reduction to IFP and Vice Versa

In a fashion similar to the reduction of the QRP to the IFP described in the previous section, one breaks the DLP over a composite modulus down into several instances of the DLP, one for each prime factor, and then uses the Chinese remainder theorem to extract a solution to the simultaneous congruences. In addition, solving the DLP efficiently yields a method to factor large composite integers, as described in [Bac84]. Similarly, an efficient factorization algorithm, together with an efficient DLP algorithm over prime moduli, yields an efficient solution to the DLP over composite moduli [Bac84]. That is, unlike the QRP, the DLP is still hard even when the problem is constructed over a prime modulus (e.g., Z_p^*).

3.4 Elliptic Curve Discrete Logarithm Problem (ECDLP)

Elliptic curves form an algebraic structure, an abelian variety with a commutative addition operation on its points, which is constructed geometrically. Most of the cryptosystems discussed in the previous sections can be reformulated in the elliptic curve setting, and it was
first proposed by Neal Koblitz [Kob87] and Victor S. Miller [Mil85]. The underlying intuition of the ECDLP is to find the logarithm of a given point with respect to a base point, a publicly known point on the same elliptic curve. This, in addition to the fact that no known algorithm solves the ECDLP in fewer than roughly √n group operations (for a group of order n), made it suitable as the basis of several cryptosystems.

3.4.1 An Overview of Elliptic Curves

An elliptic curve is the set of pairs (x, y) satisfying a non-singular curve equation over a given finite field, commonly F_p, defined by

    y^2 ≡ f(x) (mod p),

where f(x) is a square-free cubic (or quartic) polynomial over the field, with discriminant Δ ≠ 0, and p > 3 a prime. The curve also contains an identity point O, a point at infinity, and thus the curve is defined as E(F_p) = {(x, y) : y^2 ≡ f(x) (mod p)} ∪ {O}.

Example: the Weierstrass equation form is y^2 = x^3 + ax + b, and its discriminant can be computed as Δ = −16(4a^3 + 27b^2).

The main reason for the discriminant condition Δ ≠ 0 is that the elliptic curve must be non-singular, with no self-intersections or repeated roots. The structure of an elliptic curve can be converted into a group, which can serve as a cryptographic tool. Furthermore, the addition law breaks down if repeated roots exist; when Δ ≠ 0, the addition law makes the set of points an abelian group that becomes practical for cryptographic purposes.
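As a concrete check of these definitions, the following sketch enumerates the points of a toy curve and verifies the non-singularity condition. The curve y^2 = x^3 + 2x + 2 over F_17 is a common textbook example; all values here are illustrative.

```python
# Enumerate E: y^2 = x^3 + ax + b over F_p and check Delta != 0 (mod p).
a, b, p = 2, 2, 17
disc = (-16 * (4 * a**3 + 27 * b**2)) % p
assert disc != 0                       # non-singular: no repeated roots

points = [(x, y) for x in range(p) for y in range(p)
          if (y * y - (x**3 + a * x + b)) % p == 0]
order = len(points) + 1                # + 1 for the point at infinity O
print(order)                           # → 19
```

The order 19 lies within p + 1 ± 2√p, i.e., 18 ± 8.25, consistent with the bound on the group order stated later in this section.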
3.4.1.1 Point Addition Operation

The addition law on elliptic curve points exhibits the properties of an abelian group. Let us start with the fundamental elliptic curve properties, which will be the basis for defining the addition operation on elliptic curves and the group over that operation.

1. Given a point P ∈ E, there is a tangent line L at P which intersects E at a second point.
2. Given two points P, Q ∈ E, there is a line L connecting P and Q which intersects E at a third point R.
3. For every point P = (x, y) ∈ E, there is a reflection point −P = (x, −y) ∈ E.
4. The point at infinity O arises when L intersects E at two points and at no third point; hence L is vertical and P + (−P) = O.

In other words, every EC point (or pair of points) has an associated point defined by the intersection of the corresponding line with the curve. If no additional intersection with E results, the intersection is taken to be the point at infinity, denoted O, which is the identity point. The addition operation is carried out in two cases.

Case 1: Distinct points (P ≠ Q). Let us say we have two points P = (x_1, y_1) and Q = (x_2, y_2). The result of adding P and Q is the reflection of a third point R generated by drawing a line through both points. That is, the line connecting P and Q intersects E at a third point R = (x_3, y_3'), and the sum is the reflection P + Q = (x_3, −y_3'), where, with slope λ = (y_2 − y_1)/(x_2 − x_1),

    x_3 = λ^2 − x_1 − x_2,   y_3' = y_1 + λ(x_3 − x_1).
Case 2: Point doubling (P + P = 2P). Adding a point P = (x_1, y_1) to itself is accomplished by drawing the tangent line L at P, such that L intersects E at a second point; the reflection of that point is 2P.

The aim is to draw a line L and find the points at which E intersects L. Let us say the line L is represented in slope-intercept form y = λx + ν, where λ is the slope, defined as

    λ = (y_2 − y_1)/(x_2 − x_1)   if P ≠ Q,
    λ = (3x_1^2 + a)/(2y_1)       if P = Q.

Remark: note that λ is the slope of the tangent in the case P = Q, and the slope of the connecting line when P ≠ Q.

For example, consider an elliptic curve given by the Weierstrass equation y^2 = x^3 + ax + b. The points at which L intersects E can be obtained by substituting y = λx + ν of L into y^2 of E, so that

    (λx + ν)^2 = x^3 + ax + b.

In fact, we already know two solutions of the above equation, x_1 and x_2, from the points to be summed, P = (x_1, y_1) and Q = (x_2, y_2). Solving the above cubic equation yields the third solution [Sil06],

    x_3 = λ^2 − x_1 − x_2,

and y_3' is computed from the line formula, y_3' = λx_3 + ν. That is, the line L intersects E at a third point (x_3, y_3'), and by the definition of addition, P + Q = (x_3, −y_3').
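The two cases above translate directly into code. The sketch below implements the addition law over F_p; the curve y^2 = x^3 + 2x + 2 over F_17 and the point (5, 1) are illustrative textbook-sized values.

```python
# Elliptic curve point addition over F_p; None represents the point at
# infinity O. Modular inverses use Python 3.8+ three-argument pow.
def ec_add(P, Q, a, p):
    if P is None: return Q                       # O is the identity
    if Q is None: return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                              # P + (-P) = O
    if P == Q:                                   # Case 2: tangent slope
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:                                        # Case 1: chord slope
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p              # reflect the third point
    return (x3, y3)

a, p = 2, 17                                     # curve y^2 = x^3 + 2x + 2
P = (5, 1)
print(ec_add(P, P, a, p))                        # 2P → (6, 3)
print(ec_add((6, 3), P, a, p))                   # 3P → (10, 6)
```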
Remark: Multiplication of a point by an integer is done by repeatedly adding the point to itself along the elliptic curve.

3.4.1.2 Defining the Elliptic Curve Group

Theorem 3.7: The addition law on an elliptic curve E has the following properties for all P, Q, R ∈ E:
1. P + O = O + P = P (identity),
2. P + (−P) = O (inverse),
3. (P + Q) + R = P + (Q + R) (associativity),
4. P + Q = Q + P (commutativity).

Figure 4: (a) P + Q where P ≠ Q. (b) Point doubling. (c) P + (−P) = O, on the curve defined by y^2 = x^3 + ax + b.

In other words, the addition operation of an elliptic curve group exhibits a commutative group structure. The identity was identified in the previous section, and the inverse of a point is its reflection point, where adding them always yields the point at infinity. The associative and commutative properties can be recognized from the fact that a line
can intersect E at no more than three points. In addition, the number of points of an elliptic curve group is finite and bounded, by Hasse's theorem.

Theorem 3.8 (Hasse): For an elliptic curve E over F_p, the number of points on E is bounded by

    |#E(F_p) − (p + 1)| ≤ 2√p.

We defined the addition of elliptic curve points to derive the way these points are manipulated in constructing a trapdoor function based on the ECDLP.

3.4.2 Elliptic Curve Discrete Logarithm Problem

Definition 3.6 (ECDLP): Given an elliptic curve E over a finite field F_p and two points P, Q ∈ E, the ECDLP is the problem of finding an integer n such that Q = nP. The integer n is denoted by n = log_P Q, the elliptic curve discrete logarithm of Q with respect to P. That is, the point Q is equivalent to the sum P + P + ... + P (n times).

It appears that ECDLP-based cryptographic protocols provide more security than those based on the classical discrete logarithm problem. That is, with fewer bits, say 160 bits, we can achieve security equivalent to that provided by RSA-1024 [NSA_Elliptics]. In addition, algorithms based on the classical DLP, such as the index calculus, are not applicable to the ECDLP, since the points of an elliptic curve cannot be factored.

3.4.2.1 Converting Characters into Elliptic Curve Points

The general setting of ECC is to embed plaintexts M as points on an elliptic curve E over a finite field F_p. As with any cryptographic algorithm, characters are
first encoded in numerical format to allow the mathematical computations of the cryptographic algorithm. Thus, this influences the lower bound on the choice of p for the finite field F_p. Let us say the English letters are encoded in their ASCII format, which ranges from 0 to 127; then p should be chosen so that the elliptic curve has at least 127 points (refer to Hasse's theorem). For instance, let m be a character in numerical format. To map m into E, we represent the point (m, y), where y is a root of y^2 ≡ m^3 + am + b (mod p). If m is larger than the largest possible coordinate value, then m cannot serve as the x-coordinate of a point on E over F_p. The bottom line: the choice of p should offer a range of possible values for every character to represent a point.

Let us say we have a plaintext M to be represented as an EC point or set of points. The message is first divided into smaller blocks m in numerical format. The idea is to map each block m to a candidate x-coordinate and adjust it until it meets the elliptic curve. Symbolically, one tries the values x = mk + j for j = 0, 1, ..., k − 1 (for a small fixed integer k) until x^3 + ax + b is a square modulo p, and hence until a point (x, y) ∈ E is found. Up to this point we have only mapped the plaintext into E, and no security is yet provided, since each data block can be recovered from the x-coordinate of a point by computing ⌊x/k⌋. The second objective is to hide the point on E, that is, to build an instance of the ECDLP described in Definition 3.6. Let the message be mapped to a point P_M ∈ E, and let there exist an integer n that is a secret key agreed upon by all legitimate parties (Alice and Bob). Then multiplying the point P_M by n produces a different point
Q_M, such that Q_M = nP_M, where P_M is the message point and n the shared secret. If Q_M is transmitted over an unsecured public channel, Eve obtains a copy of Q_M, for which Eve needs to know the secret n to recover the message. This is believed to be one of the hardest problems in the mathematical and cryptographic literature. Many of the cryptosystems described previously, such as RSA, the DHM key exchange, and ElGamal, can be reproduced in this setting. The advantage of such a convention is the security equivalence achieved with a much smaller key size (i.e., 256-bit ECC is equivalent to 3072-bit RSA). In the next section, we consider an illustration of the DHM key exchange over elliptic curves.

3.4.3 Example: DHM Key Exchange over Elliptic Curves

Alice and Bob first publicly agree upon a prime integer p to define the finite field F_p, an elliptic curve E, and a base point P ∈ E. Subsequently, and analogously to the traditional procedure, they go through the following steps:

    Alice                                 Bob
    Choose private key a                  Choose private key b
    Compute public key A = aP             Compute public key B = bP
    Compute shared key K = aB = abP       Compute shared key K = bA = abP

(Eve observes only p, E, P, A, and B.) The security of this construction is based on the hardness of solving the ECDLP. Eve obtains A and B, both computed as the product of some secret integer with the publicly known base point P. With the knowledge of A such that A = aP, Eve needs to find the integer a, which is an instance of the ECDLP. On the
other hand, Alice computes aB, the shared key, using her secret key a with the point B sent from Bob. Thus, Alice computes aB = a(bP) = abP, which is equal to the version Bob computes, namely bA = b(aP) = abP. It should be stressed that the choice of E is substantial, as there exist several curve families for which the ECDLP can be solved efficiently (e.g., supersingular curves).

3.4.4 ECDLP Attacks

The discrete logarithm problem is the security basis of many cryptosystems, and yet the best algorithm threatening those cryptosystems, the index calculus, runs in subexponential time. In contrast, the discrete logarithm over an elliptic curve group (ECDLP) has been shown, initially in [Mil85] and further in [SS98], to resist index calculus attacks, from both a theoretical and a practical perspective. That is, generalizing the index calculus to the ECDLP yields a complexity no more efficient than a classical brute-force attack. However, there are special cases, listed in [SS98], in which such techniques apply directly. We will not discuss them in detail, but it is worth mentioning the cases in which the ECDLP is easy to solve:

1. Supersingular curves, where the number of points is #E(F_p) = p + 1. The ECDLP is then polynomially reducible to the DLP over the multiplicative group of a finite extension field [MOV93].
2. Curves where #E(F_p) has only small prime factors, for which the Pohlig-Hellman algorithm [PH78] solves the problem in time bounded by the square root of the largest prime factor q of the group order, i.e., O(√q).
3. Curves where #E(F_p) = p, for which the ECDLP can be reduced to simple addition in F_p by lifting the curve modulo p^2 [Sma97].
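The exchange of section 3.4.3 can be run end to end with a double-and-add scalar multiplication. The sketch below is self-contained, re-implementing the addition law given earlier; the curve y^2 = x^3 + 2x + 2 over F_17, the base point (5, 1), and the tiny private keys are illustrative only.

```python
# A toy run of the elliptic curve DHM exchange over F_17.
def ec_add(P, Q, a, p):
    if P is None: return Q                   # None is the point at infinity
    if Q is None: return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(n, P, a, p):
    """Compute nP by double-and-add."""
    R = None
    while n:
        if n & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        n >>= 1
    return R

a, p, G = 2, 17, (5, 1)             # public parameters and base point
ka, kb = 3, 9                       # Alice's and Bob's private keys (toy)
A, B = ec_mul(ka, G, a, p), ec_mul(kb, G, a, p)
assert ec_mul(ka, B, a, p) == ec_mul(kb, A, a, p)   # both derive (ka*kb)G
```

Eve sees G, A = 3G, and B = 9G; recovering 3 or 9 from them is trivial here, but for a cryptographic-size curve it is exactly an ECDLP instance.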
The absence of even a subexponential algorithm for the ECDLP, analogous to the index calculus for the classical DLP, raised interest in using the elliptic curve construction to build analogues of the current cryptosystems such as the Diffie-Hellman key exchange and RSA. That is, the ECDLP is considered a much harder problem, and thus more secure, compared with the classical DLP, IFP, and QRP. The most noticeable result regarding the security of cryptosystems based on the intractability of the ECDLP is that if we can solve the Elliptic Curve Lifting Problem (ECLP), then we can efficiently solve the ECDLP [Sil99, KCH99]. Not only that, but it would also implicate the IFP- and classical-DLP-based cryptosystems.

3.4.4.1 The Elliptic Curve Lifting Problem (ECLP)

Let us say we have an elliptic curve E(F_p) over a finite field F_p, where p is a prime number, and let S, T ∈ E(F_p). The ECDLP relies on the difficulty of finding an integer m such that T = mS. In [Sil99], Joseph H. Silverman suggests that in order to use the traditional index calculus algorithm, one needs to lift the elliptic curve E(F_p) to an elliptic curve E(Q) over the rationals, and then lift a distinct set of points S_1, ..., S_r ∈ E(F_p) to points of E(Q); the relations among these lifts are used to solve the ECDLP. The ECLP can be defined more precisely as follows:

Definition 3.8 (Elliptic Curve Lifting): Given an elliptic curve E(F_p), a lifting of E(F_p) is another elliptic curve E(Q), over the rationals, satisfying the congruence E(Q) ≡ E(F_p) (mod p), i.e., the coefficients of E(Q) reduce modulo p to those of E(F_p). Similarly, if E(F_p) contains the set of points S_1, ..., S_r, and E(Q) contains S_1', ..., S_r', then every S_i' is congruent to S_i modulo p, for i = 1, ..., r.
Definition 3.9 (Elliptic Curve Lifting Problem, ECLP): Given an elliptic curve E(F_p) with a set of points S_1, ..., S_r ∈ E(F_p), the problem is to find a lift (E(Q), {S_i'}) with S_i' ∈ E(Q) and S_i' ≡ S_i (mod p) for i = 1, ..., r.

This problem is believed to be hard, probably harder than the original ECDLP, and all attempts have failed, as illustrated in the survey given in [Sil07]. However, its worth rests on the remarkable results found in [Sil99, KCH99], which establish a reduction relating the ECLP to the classical DLP and IFP. That is, if an efficient solution to the ECLP is found, it will yield an efficient solution to both the IFP and the DLP; a self-contained description is given in [Sil99, KCH99].

3.5 Other Problems and Trivial Reductions

3.5.1 Diffie-Hellman Problem (DHP)

The Diffie-Hellman problem is an extension of the discrete logarithm problem. Let g be a generator of a multiplicative group G, and let a and b be two randomly chosen integers. The DHP then has two forms, a computational Diffie-Hellman problem (CDHP) and a decisional Diffie-Hellman problem (DDHP), defined as follows:

Definition 3.11 (CDHP): Given the group generator g and the elements g^a and g^b, the task is to find the value of g^{ab}.

Definition 3.12 (DDHP): Given the group generator g and the elements g^a, g^b, and g^c, the task is to decide whether g^c = g^{ab}.

It should be noted that the product of the given values is g^a · g^b = g^{a+b} ≠ g^{ab}. It suffices to find either a or b to solve the CDHP, which is believed to be hard. In the next section, we show how the CDHP is reduced to the DLP, and how the DDHP relates to the CDHP.
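The distinction in the note above can be checked directly on toy values (p = 1019 and the exponents are illustrative): multiplying the public values yields g^{a+b}, not the Diffie-Hellman value g^{ab}.

```python
# In Z_p*, g^a * g^b = g^(a+b), which is NOT the CDHP target g^(ab).
p, g = 1019, 2
a, b = 15, 22
A, B = pow(g, a, p), pow(g, b, p)
assert A * B % p == pow(g, a + b, p)    # the product gives g^(a+b)
assert A * B % p != pow(g, a * b, p)    # ... and misses g^(ab)
```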
3.5.1.1 Computational DHP Reduction to the DLP

Given the group generator g with two elements g^a, g^b ∈ G, one can construct an instance of the DLP using either g^a or g^b. Let us use A = g^a: the DLP instance is to find the x satisfying g^x ≡ A. If such an x is found, then the CDHP is solvable by raising g^b to the power x to obtain g^{ab}, through the sequence (g^b)^x = g^{bx} = g^{ab}.

3.5.1.2 Computational DHP and the Decisional DHP

In a similar manner, the only known method to decide an instance (g^a, g^b, g^c) is to first find either a or b, compute g^{ab}, and then compare it with g^c, answering true if they match and false otherwise. Thus a solver for the CDHP decides the DDHP, while no reduction is known in the other direction, from the CDHP to the DDHP.

3.5.2 Quadratic Residue Certificate Problem and the IFP

Let us recall the quadratic residue problem discussed in section 3.2: given a and n, decide whether there exists an integer x such that x^2 ≡ a (mod n). We have seen that the easiest solution to this problem is through factoring the modulus and solving the subproblems for each of its prime factors using the Legendre symbol. This solution is equivalent, in a sense, to primality testing, where the verification, or decision, is made based on a certain congruence: if it is satisfied, then the answer indicates that such an integer x exists, and otherwise that it does not. The quadratic residue certificate problem complicates the solution by asking for the exact value of x. If it is found, we can factor n, based on the fact that whenever x^2 ≡ y^2 (mod n) with x ≢ ±y (mod n), gcd(x − y, n) is a non-trivial factor of n.
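That closing fact admits a three-line demonstration; n = 21 and the square roots below are toy values.

```python
# Two square roots of the same residue mod n that are not negatives of
# each other reveal a factor of n via a gcd.
import math

n = 21                      # toy modulus, n = 3 * 7
x, y = 2, 5                 # 2^2 = 5^2 = 4 (mod 21), yet 5 != +-2 (mod 21)
assert pow(x, 2, n) == pow(y, 2, n)
factor = math.gcd(y - x, n)
print(factor, n // factor)  # → 3 7
```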
3.6 The Implication of Quantum Computing

Shor's algorithm [Sho97], a quantum polynomial-time factorization and discrete logarithm algorithm, and the subsequent elliptic curve discrete logarithm algorithm [PZ03], implicate the current cryptographic protocols based on the IFP, DLP, and ECDLP. In the remainder of this chapter, we briefly overview these algorithms and describe the essentials necessary to understand where the power of quantum computing comes from, in addition to the quantum strategies behind these algorithms. We also delve into the limitations of quantum computing, which are the basis of the next chapter. Readers interested in an in-depth discussion of quantum computing are referred to [YM08, EP00, EHH01]. Furthermore, a detailed explanation and a functional point of view of these algorithms can be found in [Sho97, PZ03], which are the references for the following discussion.

3.6.1 The Basics of Quantum Computing

Quantum computing is based on the theory of quantum mechanics, which deals with physical phenomena at microscopic scales [EP00]. The idea of quantum computing is attributed to [Man80] and [Fey82], who differentiated quantum computing from classical computing by arguing that quantum systems cannot be efficiently simulated by a classical computer. Although quantum computation cannot be simulated efficiently via classical computing techniques, it shows similarities to non-deterministic and probabilistic Turing machines, in which one can prescribe more than one action to be performed at one state of a computation. This parallel nature of quantum computation, or quantum parallelism, is based on the quantum mechanical phenomena of superposition and entanglement [EP00].
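Superposition and measurement can be mimicked classically, at exponential cost, by tracking all 2^n amplitudes of an n-qubit register; the minimal sketch below (3 qubits, uniform amplitudes) illustrates why a measurement yields a single basis state.

```python
# Classical state-vector sketch: an n-qubit register is a vector of 2^n
# complex amplitudes; a uniform superposition weighs all basis states equally.
import random

n = 3
dim = 2 ** n
amps = [1 / dim ** 0.5] * dim          # uniform superposition over |000>..|111>
probs = [abs(a) ** 2 for a in amps]
assert abs(sum(probs) - 1) < 1e-9      # amplitudes are normalized

outcome = random.choices(range(dim), weights=probs)[0]
print(format(outcome, f"0{n}b"))       # measurement collapses to ONE state
```

The exponential size of `amps` is precisely the obstruction to classical simulation noted above: each added qubit doubles the vector.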
Definition 3.13 (Superposition): The basis of the massive power of quantum computing; superposition enables a system to exist in all of its possible states simultaneously. Once the system is observed, however, all of its states collapse into a single state [EHH01]. Specifically, quantum computing uses qubits instead of bits.

Definition 3.14 (Qubit): An analogue of the computational unit of classical computers, with basic states 0 and 1, which may also exist in a superposition of both.

Definition 3.15 (Entanglement): The property of two or more quantum registers being referenced to each other; that is, a change in the first register automatically changes the state of the other referenced registers.

In particular, a 2-qubit quantum computer can be in four states simultaneously, represented by 00, 01, 10, and 11. Likewise, a 3-qubit quantum computer can be in eight states simultaneously, represented by 000, 001, 010, 011, 100, 101, 110, and 111. In general, an n-qubit register can be in 2^n states simultaneously; it can be thought of as representing the binary expansions of the integers from 0 to 2^n − 1. Thus the quantum computing power doubles with each additional qubit. Since an n-qubit quantum computer can exist in 2^n states simultaneously, it can perform a computation on all register states at once. Thus, a 3-qubit quantum register can store a single number from 0 to 7, a combination of such numbers, or all numbers representable in binary length 3. The superposition considers all the possibilities, and as such a 3-qubit register may look like (refer to [EHH08])

    a_0|000> + a_1|001> + a_2|010> + a_3|011> + a_4|100> + a_5|101> + a_6|110> + a_7|111>.

These are the foremost properties of quantum computers; there are many more details, but we will limit ourselves to these properties as the basis of our discussion on
Shor's algorithm for integer factorization, which was followed by polynomial-time algorithms for the DLP and ECDLP.

3.6.2 Quantum Fourier Transform

The promise of quantum computing allows such a transform over 2^n amplitudes to be accomplished with a number of quantum operations polynomial in n, in contrast to the classical discrete Fourier transform (DFT), which operates on all 2^n values explicitly [HH00, IBM01]. Regarding the correlation between the QFT and intractable problems, the QFT permits solving hidden subgroup problems in an efficient manner [Sho97]. Thus, problems such as the IFP should be reduced to a hidden subgroup problem [HH00]. The latter approach is the key step in giving quantum algorithms an exponential advantage in processing time over classical ones.

3.6.3 Quantum Attack: IFP

The quantum factorization algorithm, known as Shor's algorithm, is a probabilistic integer factorization algorithm attributed to Peter Shor [Sho97]. It takes a composite integer N and outputs its prime factors in time and space polynomial in log N. Classically, in contrast, the best known factorization algorithm (the GNFS) runs in time of the form exp(O((log N)^{1/3} (log log N)^{2/3})). This running time becomes intractable as N increases. Shor's algorithm is considered a breakthrough in the cryptographic arena, as it implies that algorithms based on the IFP are easily breakable given a quantum computer with sufficiently many qubits. In contrast, classical computers using the best known factorization algorithms require subexponential time, and the complexity increases as the number of bits of N increases. The underlying structure of
Shor's algorithm is based on modular arithmetic, quantum parallelism, and the quantum Fourier transform [Sho97].

3.6.3.1 IFP Reduction to a Hidden Subgroup Problem (HSP)

Shor's algorithm relies on the power of quantum computing to solve a hidden subgroup problem. Thus, the IFP is not solved directly; it is instead reduced to the problem of finding the order of an element in the multiplicative group modulo N, that is, finding the least integer r such that a^r ≡ 1 (mod N), which is an instance of the HSP. The reduction, following the convention given in [Sho97], is described as follows: given an efficient algorithm for finding the order of an element, choose a random a with gcd(a, N) = 1, find its order r, and compute gcd(a^{r/2} − 1, N). Since gcd(a^{r/2} − 1, N) is the largest number dividing both a^{r/2} − 1 and N, and (a^{r/2} − 1)(a^{r/2} + 1) = a^r − 1 ≡ 0 (mod N), gcd(a^{r/2} ± 1, N) is a non-trivial divisor of N unless r is odd or a^{r/2} ≡ −1 (mod N). Computing gcd(a^{r/2} − 1, N) takes polynomial time using the Euclidean algorithm, and the remaining task is to find the order r, for which quantum computing solves the hidden subgroup problem in polynomial time.

3.6.3.2 Finding a Group's Order and Factorization

The algorithm, as introduced by Shor [Sho97], starts by choosing an integer q, a power of 2, satisfying the inequality N^2 ≤ q < 2N^2. Thereafter, the algorithm randomly chooses an integer a relatively prime to N, so that gcd(a, N) = 1, which ends the classical part of the algorithm. The quantum part of the algorithm, which finds the order, initializes two entangled quantum registers, input and output. The input register
will be loaded with a superposition of the representations of the integers 0, 1, ..., q − 1. On the other hand, the output register is loaded with a^k mod N for each value k of the input register. It should be noted that the input register should hold log2 q qubits and the output register log2 N qubits, to sufficiently hold their corresponding values. This yields an output register holding integers of the form a^k mod N for 0 ≤ k < q, and measuring the output register collapses it into one number. The interesting part is that the compatible values remaining in the input register all carry equal probability; thus, we need to filter the list and peak the probability of the values that will best serve in finding the solution. A unitary transformation, the QFT, peaks the probability of the desired integers, those that are multiples of q/r, where r is the desired period. Hence, we no longer have an equal superposition of states: the QFT has raised the probability of values in the input register near multiples of q/r. Let us denote the measured value by v. Shor's result states that v is obtained with probability dependent on q; the higher the value of q, the more accurate the result.

The factors of N: from the above procedure we find the period r of the function f(k) = a^k mod N, and, as stated in the reduction section, factors of N can be obtained by computing gcd(a^{r/2} ± 1, N) for some integer a relatively prime to N. In case of failure, we repeat the steps with a different value of a. Common failure scenarios of Shor's algorithm result in the factors 1 and N: the period r may be odd, the congruence a^{r/2} ≡ −1 (mod N) may hold for the found value of r, or the QFT measurement may return 0, making the remaining calculations impossible.
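The classical wrapper of the reduction can be re-enacted end to end on N = 15 (the number factored in IBM's experiment, discussed below), with the order found by brute force in place of the quantum subroutine; all values are illustrative.

```python
# Classical re-enactment of the order-finding reduction for N = 15.
import math

def order(a, N):
    """Least r with a^r = 1 (mod N), by brute force (the quantum step)."""
    r, x = 1, a % N
    while x != 1:
        x = (x * a) % N
        r += 1
    return r

N, a = 15, 7
assert math.gcd(a, N) == 1
r = order(a, N)                        # 7^4 = 2401 = 1 (mod 15), so r = 4
assert r % 2 == 0 and pow(a, r // 2, N) != N - 1   # no failure case
p = math.gcd(pow(a, r // 2) - 1, N)    # gcd(48, 15) = 3
q = math.gcd(pow(a, r // 2) + 1, N)    # gcd(50, 15) = 5
print(p, q)                            # → 3 5
```

Choosing a = 14 instead would hit a failure case (14^{r/2} ≡ −1 mod 15), illustrating why the algorithm retries with a fresh random a.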
3.6.4 Other Quantum Attacks: QRP, DLP and ECDLP

QRP. It was shown in section 3.2.3 that an efficient solution to the IFP yields a solution to the QRP. Thus, one can factor the modulus of the QRP using Shor's algorithm and then classically solve the problem using the Legendre symbol for each prime factor.

DLP. In the same paper [Sho97], Shor introduces an algorithm to solve the DLP with only two modular exponentiations and two quantum Fourier transforms.

ECDLP. J. Proos and C. Zalka introduced the extension of Shor's algorithm for solving the ECDLP over GF(p) [PZ03]. Its advantage over Shor's integer factorization and discrete logarithm algorithms is that it requires about 1000 qubits to break 160-bit EC keys, the security-wise equivalent of RSA-1024, which in turn requires a 2000-qubit quantum computer [PZ03]. The implementation of the algorithm considers an elliptic curve over GF(p) for some large prime p [PZ03].

Remark: Due to the theoretical relations between the IFP, DLP, and ECDLP, efficient quantum algorithms for each problem could be introduced as successors of Shor's algorithm. That is, the DLP could be solved in the same paper as the IFP algorithm, and five years later the ECDLP could be solved efficiently. This is because these problems all reduce to a hidden subgroup problem, which a quantum Fourier transform permits solving in polynomial time.

3.6.5 Current State of Practical Quantum Attacks on the IFP

Using seven spin nuclei in a molecule as quantum bits, an experimental demonstration of the algorithm on a quantum computer was conducted by a team from IBM to factor 15 and prove the practical success of Shor's algorithm [IBM01]. For
a practical break of the current RSA-1024, a quantum computer with 2000 qubits is required [PZ03]. More precisely, the best optimization of Shor's algorithm is attributed to S. Beauregard, and requires roughly 2n + 3 qubits for an n-bit modulus [Bea03]. Meanwhile, quantum computers have made remarkable progress, especially with the first commercial quantum computer, a 512-qubit chipset, released by the Canadian company D-Wave [D-wave].

3.6.6 On the Limitations of Quantum Computing

The quantum-based attacks discussed in this chapter implicate the assumptions that form the basis of modern public key cryptographic protocols. The demonstrated ability of polynomial-time quantum algorithms to solve the IFP, DLP, and ECDLP makes it clear that the existence of quantum computing will render most of the current cryptographic protocols obsolete. Yet there are several cryptographic protocols that are believed to resist quantum-based attacks. In the meantime, it is hard to decide whether quantum computing is capable of solving every problem (e.g., NP-hard problems). Quantum algorithms are still in their early ages, and it is hard to assert whether quantum machines are capable of solving other intractable problems not correlated with the IFP, DLP, or ECDLP. However, the current literature contends that it is unlikely that quantum computers can solve the so-called NP-hard problems [RC09]. Clearly, more effort needs to be placed on such problems to ensure that public key schemes can resist the highest unexpected computing power, which would have the capability of making standard public key protocols obsolete. The purported security of such schemes is based on the fact that no one has been able to extend Shor's algorithm to them, nor find another way to break them. This does not imply that these protocols will remain secure in the future, but it shall provide a level of confidence in the current state of quantum computing. It should be emphasized that there
is a distinct difference between quantum-based and quantum-resistant cryptography. The objective of both is to confound quantum-based attacks, but quantum-resistant cryptography can be deployed on classical computers, while quantum-based cryptography is totally based on quantum mechanics. Instances of these quantum-resistant cryptographic protocols are surveyed in [RC09].
4. Intractable Graph Problems

In this chapter, two intractable problems from graph theory are discussed with a cryptographic mindset. These problems are the Graph Isomorphism and the Hamiltonian Path problems, which we believe do not exhibit the theoretical relations shown in Chapter 3 and are at least as hard as the Integer Factorization problem. Unlike the Integer Factorization problem, the general cases of these problems remain unsolved even under the promise of quantum computing. This fact motivates us to consider these problems and to provide evidentiary reasons for their suitability in the design of cryptographic primitives.

4.1 Preliminaries

In a mathematician's terminology, a simple graph G = (V, E) is a set of vertices V coupled with a set of edges E containing unordered pairs, or 2-subsets, of elements of V, i.e., E is a subset of {{u, v} : u, v in V, u not equal to v}. If the edge set is instead restricted to ordered pairs, then the graph is called a directed graph, and an edge (u, v) differs from (v, u). In graphical terminology, circles represent the vertices of a graph and are connected by lines, the edges. We often denote by n = |V(G)| the number of vertices, or order, of a given graph G, and by |E(G)| the size of G. The degree of a given vertex v is the size of the set of its neighbors, defined to be the set of vertices u for which {u, v} is an edge. Depending on the connectivity of the graph, we can take a walk, or path, starting from a vertex through a sequence of vertices, where each vertex represents a step in the path; a path is often denoted by a sequence of vertices (v₁, v₂, ..., v_k). Edges of a graph can be either weighted or non-weighted; in the weighted case, there exists a label w(u, v) written on each edge. In terms of data
structure, any given graph can be represented by an adjacency matrix. That is, the adjacency matrix A(G) of a graph G with V(G) = {v₁, ..., v_n} is an n x n matrix for which A_ij = 1 if {v_i, v_j} is an edge of G and 0 otherwise. In case the graph is weighted, each edge is labeled with a weight w(v_i, v_j), and the matrix entry A_ij for an edge {v_i, v_j} is the label on the edge from v_i to v_j. The adjacency matrix provides us with favorable efficiency advantages and ease of computation. For instance, the degree, or number of neighbors, of a vertex v_i is the number of non-zero entries in the i-th row of A(G). It also supplies us with many other matrix operations and properties, as opposed to other data structures such as the adjacency list.

4.2 Graph Isomorphism Problem (GIP)

In graph theory, an isomorphism between two graphs G₁ and G₂ is defined by an edge-preserving bijection φ : V(G₁) → V(G₂) such that {u, v} is an edge of G₁ iff {φ(u), φ(v)} is an edge of G₂. More precisely, assume two graphs G₁ = (V₁, E₁) and G₂ = (V₂, E₂) with corresponding adjacency matrices A₁ and A₂ such that |V₁| = |V₂| = n. The isomorphism of G₁ and G₂ can be defined as follows:

Definition 4.1 (Graph Isomorphism): The graph G₁ is isomorphic to G₂, denoted by G₁ ≅ G₂, iff one of the following equivalent conditions is satisfied:
1. There exists a permutation σ from the symmetric group S_n such that for every edge, {u, v} is in E₁ iff {σ(u), σ(v)} is in E₂.
2. There exists a permutation matrix P such that A₂ = P A₁ Pᵀ; hence (A₁)_ij = 1 iff (A₂)_{σ(i)σ(j)} = 1.
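Condition 2 of Definition 4.1 lends itself to a direct computational check. The following is a minimal Python sketch, assuming vertices indexed 0..n-1 and adjacency matrices stored as nested lists (an implementation convention of this sketch, not the thesis' notation); it covers both the verification of a claimed permutation and the exhaustive search over S_n:

```python
from itertools import permutations

def relabel(adj, sigma):
    """Condition 2 of Definition 4.1 as an index remap:
    out[sigma[i]][sigma[j]] = adj[i][j], i.e. A2 = P A1 P^T."""
    n = len(adj)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[sigma[i]][sigma[j]] = adj[i][j]
    return out

def find_isomorphism(a1, a2):
    """Search version of the GIP: scan all n! permutations of S_n
    for one mapping G1 onto G2; None certifies non-isomorphism."""
    if len(a1) != len(a2):
        return None
    for sigma in permutations(range(len(a1))):
        if relabel(a1, sigma) == a2:
            return list(sigma)
    return None
```

For example, the path on vertices 0-1-2 and the same path re-centered at vertex 0 are related by sigma = [1, 0, 2]; the exhaustive loop is exactly the n!-step brute force whose cost is quantified in the sequel.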
The hardness of the graph isomorphism problem stems from the lack of an efficient algorithm, which forces a search over all elements of the symmetric group S_n, containing n! distinct permutations of n elements (refer to Claim 4.1). These elements can be represented as relabelings of the list of vertices of a given graph G.

Claim 4.1: Given a graph G of order n, there exist n! possible permutations of the vertex labels of G.

Proof: Assume a recursive function f that generates all possible permutations of the vertex list of a graph of order n. In the first call of f, we have n possible choices. Once one vertex is chosen, the possibilities are reduced to n - 1. In the second call, f chooses the next vertex, leaving n - 2 possible choices for the next call. The function keeps recursing until the base case is reached, where the base case is a list of choices with only one vertex. Overall, the total number of possible permutations, or arrangements, of the list of vertices is n(n - 1)(n - 2) ... 2 * 1 = n!.

The trivial solution to the GIP is to try all possible permutation matrices P to find whether there exists a P such that A₂ = P A₁ Pᵀ. The number of steps grows factorially with the number of vertices of the given graphs. The brute-force technique becomes even more intense and infeasible when considering a generalization of the GIP, the subgraph isomorphism problem (sGIP). The sGIP is proven to lie in the complexity class NP-complete, and both the GIP and sGIP are defined as follows:
Definition 4.2 (Graph Isomorphism Problem, GIP): Given two graphs G₁ and G₂, decide if G₁ ≅ G₂.

Definition 4.3 (Subgraph Isomorphism Problem, sGIP): Given two graphs G₁ and G₂, decide if G₁ ≅ sub(G₂), where sub(G₂) is another graph derived by taking a subset of the vertices and edges of G₂.

Unlike the GIP, the brute-force technique on the sGIP must, for each order k from 1 up to the order of the larger graph, generate the candidate subgraphs of order k and test each for isomorphism, multiplying the k! permutation checks accordingly. These problems are classified as decision problems, and thus the translation of each into its search version is considered at least as hard as the decisional version, as stated in Theorem 1.1 in Section 1.3.1. The search instance of both the GIP and the sGIP is to find, given two graphs, the permutation matrix that proves the isomorphism.

4.2.1 Complexity of the GIP and sGIP

The complexity class of the GIP remains unknown: despite much effort, it is not believed to be in the class P, and it has not been proven to be NP-complete. This is similar to the case of the IFP, and the historical record provides evident reasons to trust the hardness of such problems and to consider them for cryptographic applications. On the other hand, the sGIP has been proven equivalent to the well-known NP-complete Clique problem, which asks whether there exists a complete subgraph of a given order in a given graph: the sGIP captures it by taking the first graph to be a complete graph and testing for its subgraph isomorphism in the second, larger graph. There are special classes of graphs for which there
exists a polynomial-time algorithm, including planar graphs [DLN09], trees [Kel57], interval graphs [KG79], and graphs of bounded degree or genus [JK03]. Undoubtedly, these solutions do not impose any certain evidence of the simplicity of the general GIP, just as weak prime numbers do not for the IFP. Thus, such instances can be eliminated in a fashion similar to the weak primes demonstrated in Chapter 3, for which efficient factorization algorithms exist.

4.3 Hamiltonian Path Problem (HPP)

A Hamiltonian path is a path that visits every vertex of a graph exactly once. That is, given a graph G with a starting vertex u₁, the path traverses from u₁ to one of its neighbors, and onward until every vertex of G is visited. Denote the Hamiltonian path HP = (u₁, u₂, ..., u_n), an ordered sequence of the vertices forming the path, and let u_i be the vertex at the i-th position. Then {u_i, u_{i+1}} is an edge of G for every 1 ≤ i < n, and u_i differs from u_j for i not equal to j. The Hamiltonian path problem is defined as follows:

Definition 4.4 (Hamiltonian Path Problem, HPP): Given a graph G, decide if there exists a path that traverses all vertices in V(G) exactly once.

The problem has another variant, the Hamiltonian Cycle problem. It is similar, except that the path must connect back to the starting vertex: for a Hamiltonian path (u₁, ..., u_n), there is an extra edge {u_n, u₁} that converts the path into a cycle.

4.3.1 Complexity of the HPP

Both problems are closely tied to the well-known NP-complete Traveling Salesman problem, which generalizes them, and the HPP is itself NP-complete [GJ79].
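As a concrete illustration of what deciding the HPP by exhaustion entails, here is a minimal Python sketch (adjacency matrices as nested lists of 0/1 entries, an assumption of this illustration):

```python
from itertools import permutations

def has_hamiltonian_path(adj):
    """Decide the HPP by exhausting all n! orderings of the vertices:
    an ordering is a Hamiltonian path exactly when every pair of
    consecutive vertices in it is adjacent in the graph."""
    n = len(adj)
    return any(all(adj[p[i]][p[i + 1]] for i in range(n - 1))
               for p in permutations(range(n)))
```

The path graph on three vertices trivially contains a Hamiltonian path, while the star K_{1,3} does not, since its leaves are pairwise non-adjacent; both are decided by the same n!-step loop whose cost is analyzed next.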
The trivial solution to the HPP is to brute-force all possibilities, which number n!, where n is the number of vertices of the graph.

Claim 4.2: Given a graph G of order n, there exist at most n! possible Hamiltonian paths.

Proof: Denote HP = (u₁, ..., u_n). It is easy to see that HP can be permuted as an element of the symmetric group S_n, where the elements permuted are the vertex labels of the given graph G. As seen in Claim 4.1, there are n! possible permutations of HP, and with each permutation, one needs to check that {u_i, u_{i+1}} is an edge of G for every 1 ≤ i < n.

Claim 4.2 is not limited to complete graphs. This claim underlies the hardness of the Hamiltonian path problem, which is originally a decision problem: decide whether a given graph contains a Hamiltonian path or not. In order to decide, one needs to take the list of vertices (v₁, ..., v_n) of the given graph and generate all possible permutations; for each permutation, one checks whether it forms a Hamiltonian path.

4.4 One-wayness

4.4.1 The GIP into a One-way Function

One-way functions form the basis of modern public key cryptographic protocols. As discussed in Chapter 1, a one-way function is a function that is easy to compute for every given input, yet computationally infeasible to invert. It is apparent that the GIP may form a one-way function as follows. Let Alice be a legitimate user, who uniformly at random generates a graph G of order n. Since the order of G is n, there exists a symmetric group S_n with n! distinct
permutations of the elements of V(G). Alice may generate isomorphic graphs by simply relabeling the given graph with a random permutation matrix P, computing A' = P A Pᵀ. If both G and P are considered secret quantities of the function, and Eve obtains only G', then it is infeasible for her to recover G. Alternatively, let both G and G' be made public, so that the secret quantity is the permutation matrix P. The latter case is an instance of the GIP: Eve needs to discover the isomorphism between G and G', i.e., the permutation matrix P, among n! distinct possibilities. In addition, the complexity of generating A' is measured by two matrix multiplications, P A and (P A) Pᵀ. The best known general matrix multiplication algorithm runs in O(n^ω) for ω roughly 2.373 [Wil11]. Yet the above multiplication is a special case: multiplying by a permutation matrix on the left merely permutes the rows of A, and on the right the columns. The complexity depends only on the number of rows and columns, and as discussed in Section 4.1, each row and column corresponds to a vertex of the graph; hence the work is bounded by the order of the graph, O(n²) entries in total. Due to this trivial operation and the large order of the symmetric group, a brute-force attack on the GIP for a graph with |V(G)| = 300 is comparable to a brute-force attack on an RSA-2048 key: 300! has 615 decimal digits, while an RSA-2048 modulus has up to 617.

4.4.2 The HPP into a One-way Function

The HPP can form a one-way function if certain care is taken during the construction phase of a graph. The idea is first to form a Hamiltonian path as a disconnected graph and store the information in a matrix or, for storage advantage, in a vector or list. The next step is to draw the remaining edges of the graph at random to form a graph G with adjacency matrix A. Let G be known to everyone, including Eve, while the legitimate user, Alice, alone knows the path. If the path is considered a secret quantity, that is, a key or plaintext, the graph can be thought of as a ciphertext. For Eve to disclose this information, she needs to discover the Hamiltonian path in G, which is believed to be hard, as it is an instance of the search version of the HPP. On the other hand, constructing the graph took Alice an amount of time bounded by the number of edges, and space bounded by the size of the adjacency matrix. This satisfies the property of a one-way function: it is easy for a legitimate user to create a Hamiltonian path, but computationally infeasible for Eve to reverse the computation.

4.5 Hiding Secrets in Graphs

It has been shown that both problems, the GIP and the HPP, can possibly form the basis of new cryptographic primitives. Yet, for cryptographic purposes, the one-wayness must involve a trapdoor element such that the function can be reversed by legitimate users who obtain the trapdoor. Moreover, a key exchange methodology based on the hardness of these problems remains unsolved. In the following, we describe only a methodology to hide secrets in graphs, with the assumption that an oracle exists where stated. The methodology is based on the GIP in conjunction with the HPP.

4.5.1 Step 1: Constructing a Hamiltonian Path

Let M be the message to be enciphered, partitioned into data blocks m₁, ..., m_k with proper padding on the k-th data block. The next step is to represent these data blocks as a Hamiltonian
path. This can be achieved by constructing a disconnected graph on the vertices v₁, ..., v_{k+1}. The objective of starting with a disconnected graph is to facilitate the construction of a Hamiltonian path: the path is built by adding an edge {v_i, v_{i+1}} labeled with the data block m_i for each 1 ≤ i ≤ k. For example, consider the message UCDENVER to be enciphered into data blocks of one letter each. A typical numerical representation assigns 10 to the letter 'A' and increments by 1 up to 'Z', denoted by the integer 35. Hence, our message "UCDENVER" can be rewritten as 30, 12, 13, 14, 23, 31, 14, 27. The graph then contains 9 vertices, as illustrated in Figure 5.

Figure 5: The message "UCDENVER" enciphered into numerical format and laid out as a Hamiltonian path, following the description of Section 4.5.1.

4.5.2 Step 2: Diffusion with Random Edges

After Step 1, the message is in plain sight; nothing hides it. We have simply constructed a Hamiltonian path that represents the message. The objective of this step is to add a level of diffusion to the graph and hide the Hamiltonian path carrying the message. This can be addressed by adding more edges between the vertices. Once the graph is observed by Eve, it should leave her in doubt about the number of Hamiltonian
paths that exist in the graph, forcing her to take the trivial route of generating all possible paths by permuting all of the vertices. Figure 6 illustrates an example of the graph resulting from this step, applied to the first graph in Figure 5.

Figure 6: Adding more random edges to the graph diffuses the Hamiltonian path representing the message.

4.5.3 Step 3: Confusion with the GIP

The graph resulting from the previous step (Figure 6) adds a level of obscuration: the message is now one Hamiltonian path among many possible paths. However, the aim of this prototype is to serve as a public key scheme in which the description of the algorithm is public knowledge (refer to Kerckhoffs' principle in Chapter 1). Hence, Eve knows that the path starts from the vertex v₁ and traverses sequentially through v₂, ..., v_{k+1} to recover the message. In order to address this, we can generate a graph G' isomorphic to G and thereby retain the Hamiltonian path of the message in a permuted order.

Theorem 4.1: Let HP = (u₁, ..., u_n) be a Hamiltonian path on G. Then for every isomorphic graph G' with G ≅ G' under a permutation σ, the sequence (σ(u₁), ..., σ(u_n)) is a Hamiltonian path on G'.
Proof: Based on the definition of graph isomorphism, an isomorphic graph is obtained by an edge-preserving bijection such that {u, v} is an edge of G iff {σ(u), σ(v)} is an edge of G', for some permutation σ. Hence, (u₁, ..., u_n) is a Hamiltonian path on G iff (σ(u₁), ..., σ(u_n)) is a Hamiltonian path on G'.

Security Remark: The hidden message is now a Hamiltonian path on G'. For someone to disclose the message without knowledge of the secret quantity, the isomorphism σ between G and G', one needs to do one of the following. First, generate all Hamiltonian paths of G', one of which represents the message. Second, find the permutation that maps G to G' (the GIP) and recover the message from the graph by traversing the Hamiltonian path (σ(v₁), ..., σ(v_{k+1})). In both cases, the task is believed to be hard, as the number of possibilities grows to n!, where n is the order of G.

This step can be illustrated by randomly generating a permutation σ to serve as a bijective mapping of the graph's vertices and computing A' = P A Pᵀ for the graph of Figure 6 generated in Step 2. That is, every vertex is relabeled under the random permutation σ; for example, the vertex v₁ is mapped to σ(v₁). The result is a diffused graph with a hidden Hamiltonian path, the secret message, as demonstrated in Figure 7.
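The three steps of Section 4.5 can be sketched end to end. The following hypothetical Python prototype assumes one letter per data block with the A = 10 .. Z = 35 encoding, a ciphertext represented as a dictionary of weighted edges, and the secret permutation given as a list sigma with sigma[i] the image of vertex i (all conventions of this sketch, not fixed by the thesis):

```python
import random

ALPHA = {chr(ord('A') + i): 10 + i for i in range(26)}  # 'A' -> 10 ... 'Z' -> 35

def encipher(message, sigma, extra_edges=8, seed=None):
    """Step 1: lay the data blocks on the path 0-1-...-k.
    Step 2: add random decoy edges for diffusion.
    Step 3: relabel every vertex with the secret permutation sigma.
    The ciphertext is the weighted edge set {frozenset{u, v}: weight}."""
    rng = random.Random(seed)
    blocks = [ALPHA[c] for c in message]           # one letter per data block
    n = len(blocks) + 1                            # k blocks need k + 1 vertices
    edges = {frozenset((i, i + 1)): w for i, w in enumerate(blocks)}
    while len(edges) < len(blocks) + extra_edges:  # assumes extra_edges leaves room
        u, v = rng.sample(range(n), 2)
        edges.setdefault(frozenset((u, v)), rng.randint(10, 35))
    return {frozenset(sigma[x] for x in e): w for e, w in edges.items()}

def decipher(cipher, sigma, k):
    """Alice reads the weights along the hidden path sigma(0), ..., sigma(k)."""
    blocks = [cipher[frozenset((sigma[i], sigma[i + 1]))] for i in range(k)]
    return ''.join(chr(ord('A') + w - 10) for w in blocks)
```

Note that different random seeds yield different ciphertext graphs for the same message and permutation, which is one face of the non-deterministic behavior discussed later in the security analysis.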
Figure 7: The message 30, 12, 13, 14, 23, 31, 14, 27 was initially enciphered into the Hamiltonian path (v₁, ..., v₉) and then hashed to the Hamiltonian path (σ(v₁), ..., σ(v₉)).

4.6 Evaluation and Security Analysis

The current state of the proposed approach may serve in a variety of applications. These applications exclude the primary objective of public key schemes, namely communication over an insecure channel. The main reason for this limitation is the need to first establish a key agreement scheme based on the Graph Isomorphism problem, the Hamiltonian path problem, or both, so as to retain the security assumptions based on their intractability. Meanwhile, individuals may use this scheme to encipher files and protect passwords, under the assumption that legitimate users generate the permutation σ at random and keep it secured against unauthorized access. In the remainder of this chapter, we discuss the security of this scheme and evaluate its computational complexity with respect to these applicable applications; the same analysis should apply if a key agreement scheme based on the same methodology is introduced in the future.
4.6.1 Computational Analysis and the One-wayness Property

The computational complexity of this scheme satisfies the requirements for the design of hash functions discussed in Chapter 1. A hash function should be easily computable in one direction and infeasible in the other. That is, a legitimate user calls the introduced function with a message and a random permutation as inputs to hash the message into a ciphertext. The legitimate user first splits the message into data blocks and builds a graph in time proportional to the number of vertices, or equivalently the number of data blocks, i.e., O(n). The next step involves adding a random set of edges to the graph, followed by permuting all vertices, which amounts to O(n²). From an implementation perspective, these operations are trivial and take a very short time and few CPU bursts, as the work is mainly write operations in memory. In addition, the number of vertices is suggested to be 300 to obtain security equivalent to RSA-2048 (details in the next section). On the other hand, Eve obtains a ciphertext graph containing the secret message hidden in a random Hamiltonian path. Eve's goal is to discover the Hamiltonian paths of G', or better, to discover the secret permutation σ. In both cases, Eve's best option is to find all possible Hamiltonian paths in the enciphered graph. However, two obstacles challenge Eve in recovering the message. First, there are up to n! possible Hamiltonian paths in the graph, which poses a computational infeasibility on Eve's capability and resources, as the number of possible Hamiltonian paths grows exponentially with respect to n. This imbalance of computation between legitimate users and Eve is the basis of this scheme's security. Second, assuming Eve could discover many Hamiltonian paths, her next challenge is to choose
amongst the many Hamiltonian paths found. That is, this hash function has a sense of a non-deterministic property, defined here to mean that a message may be enciphered into multiple ciphertexts (a one-to-many relation). A comparable property exists in the ElGamal cryptosystem, where it provides a security advantage over the alternative, RSA. That is, the function satisfies the one-wayness property: given the ciphertext alone, recovering the message without the secret permutation σ is infeasible. Additional properties along these lines would hold in case a public/private key pair is established.

4.6.2 Frequency Analysis

Frequency analysis is an infeasible attack on the proposed scheme, and is even harder than directly searching for a Hamiltonian path. This is due to the fact that Eve first needs to conduct a frequency analysis of plaintext messages to estimate the occurrence of all possible words that can be generated from the alphabet. This can be reformulated as conducting a frequency analysis on all unordered subsets of an alphabet of c characters, of which there are 2^c, and even more ordered ones.

4.6.3 Linear and Differential Analysis

Linear cryptanalysis is not applicable to this scheme, as the scheme's structure is deprived of linearity. Substitution differential cryptanalysis, however, is applicable, and fortunately, it appears to be at least as hard as the graph isomorphism problem.
Let us consider the proposed function treated as a black-box model, where Eve is capable of feeding it inputs (messages) and obtaining the corresponding ciphertexts. The aim is to discover the secret quantity of the model, the permutation σ. Assume Eve generates a list of messages m₁, m₂, ...; Eve only observes each input graph G and the corresponding isomorphic graph G' such that A' = P A Pᵀ. For Eve to make sense of the difference between each feed to the black-box model, Eve needs to know the isomorphism between G and G'.

4.6.4 Collision Resistance

The description of the proposed function assumes the way a single message is encrypted. It is hard to analyze the collision resistance of the function with respect to a single message. This is because the collision resistance property, the infeasibility of finding two messages m and m' with m not equal to m' but h(m) = h(m'), needs an illustration of encrypting an elastic number of messages, which was not considered in the initial description in Section 4.5. However, there is a valuable property in the relation between a message and a Hamiltonian path. Since each path is a unique sequence of vertices, and each path represents a message, each message corresponds to a unique enciphered sequence. That is, the probability of two different messages hashing to the same digest is exactly 0. Since the collision resistance property is satisfied, this implies that the second preimage resistance property
holds as well: given m, no m' differing from m with h(m') = h(m) exists. Thus far, we know that in order to maintain the collision resistance property, we need to represent each message by a unique path. This can possibly be achieved in two ways, depending on the application scenario:

Case 1: All messages are ready to be encrypted at once. This is a simple case in which we follow the same conventions described in Section 4.5, with the messages separated by a distinguishing block value, such as the integer 99.

Case 2: Some messages are ready and encrypted at time t₁, and a new message arrives at time t₂. In this case, we can expand both the graph and the permutation matrix to accommodate the number of data blocks needed to represent the new message. Subsequently, we continue the previously created Hamiltonian path to connect the newly created vertices resulting from the expansion of the graph, followed by re-permuting the new graph using the secret permutation σ. This method gives the graph a dynamic behavior, as it is expected to expand in parallel with the number of messages. A second, possibly more efficient way is to re-use the first Hamiltonian path for every subsequent message: the new data blocks are embedded into the weights of the Hamiltonian path from the previous round. In the latter scenario, a revision of the ciphertext is necessary, since all messages are enciphered into the same Hamiltonian path. One suggestion is to introduce a message identifier to each encrypted message, so that a message is represented by a tuple of identifier and ciphertext. Moreover, this imposes another security concern, as the increase in the length of edge weights releases information on the Hamiltonian path. To counter this, one needs to simultaneously imitate the process on every non-Hamiltonian-path edge, increasing its length in the same manner with random values. Figure 8 illustrates the second case of encrypting multiple messages without re-generating a different graph for each message.

Figure 8: The message UCDENVER (30, 12, 13, 14, 23, 31, 14, 27) was encrypted at time t₁. The new message MRHUSAIN (22, 27, 17, 30, 28, 10, 18, 23) arrives at time t₂ and is encrypted onto the same Hamiltonian path.

4.6.5 Avalanche Property

The avalanche property states that a hash function should produce digests differing in approximately 50% of their bits for two inputs differing by one bit. The structure of this scheme works differently in its sense of hashing a given input: the study of the avalanche property is to be conducted with dependence on the permutation σ used for the bijective mapping. In order to preserve the avalanche property, and have every given input hash to a completely different value, one needs to carefully choose the permutation σ. The ideal choice is a σ that maps each vertex to a different vertex, i.e., σ(v) differs from v for every v (a derangement permutation). This ensures σ(v_i) differs from v_i along the path, and thus the Hamiltonian paths HP and HP' = σ(HP) differ by
100%. This restriction on the permutation does not significantly limit the count of possible permutations, which still grows exponentially. For example, a permutation that cyclically shifts every vertex label is a derangement permutation, while any permutation fixing some vertex, say σ(v₁) = v₁, is not. The restriction ensures that every vertex is mapped to a different vertex, and thus poses more challenge in guessing the secret permutation. Consider the example demonstrated in Figure 7: the Hamiltonian path mapping there is obtained by a strong, derangement permutation. A mapping that fixes some vertices, by contrast, discloses portions of the secret permutation to Eve: from the ciphertext, Eve may heuristically guess the images of the neighbors of the fixed vertices, since, by construction, consecutive path vertices are neighbors. In that case, an attacker could recover portions of the path in far less time than an exhaustive search. The number of derangement permutations D_n nevertheless remains close to n!: in the limit, D_n / n! tends to 1/e.
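The derangement requirement is cheap to test, and the count D_n follows a standard recurrence. A minimal sketch (the function names are conventions of this sketch):

```python
import math

def is_derangement(sigma):
    """The 'strong permutation' requirement: sigma fixes no vertex."""
    return all(s != i for i, s in enumerate(sigma))

def count_derangements(n):
    """D(n) by the recurrence D(n) = (n - 1) * (D(n-1) + D(n-2)),
    with D(0) = 1 and D(1) = 0; the ratio D(n)/n! tends to 1/e."""
    if n == 0:
        return 1
    prev, cur = 1, 0          # D(0), D(1)
    for k in range(2, n + 1):
        prev, cur = cur, (k - 1) * (prev + cur)
    return cur
```

For instance, D(4) = 9 of the 24 permutations of four vertices are derangements, and already at n = 10 the ratio D(n)/n! agrees with 1/e to three decimal places, so the restriction costs well under two bits of key space.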
4.6.6 Parameter Choices and Specifications

The parameters and specifications may vary depending on the details of the design. At the current state of the prototype, we find that permutations must satisfy the strong property and uniqueness discussed in the previous section. In addition, the basis of security depends on the Graph Isomorphism and Hamiltonian path problems, and thus the graph must be large enough to confound random and brute-force attacks. That is, we propose a few elementary requirements.

4.6.6.1 The Graph's Order

Parameters such as the order and degree of the graph play a role in the presumed difficulty of the Graph Isomorphism as well as the Hamiltonian path problem, which form the basis of the proposed prototype. The order of the graph determines the domain in which random and brute-force attacks succeed. The number of possibilities for both the Graph Isomorphism and the Hamiltonian path is n!, where n is the number of vertices of the graph. The number of vertices should be no less than 300 to achieve strength equivalent to RSA-2048. The approximation of key sizes is illustrated in Figure 9.
Figure 9: Comparison between the RSA key sizes and the number of vertices required to achieve equivalent resistance to brute-force and random attacks. The plotted data are:

  RSA (bits):      120  140  155  160  576  640  768  1024  2048  4096
  RSA (decimals):   36   42   47   48  173  193  231   308   616  1233
  |V(G)|:           32   36   39   40  107  117  135   170   300   535

4.6.6.2 The Graph's Size

The size of the graph, or the number of edges, is bounded by n(n - 1)/2 for a graph with n vertices, under the assumption that no self-loops exist. The edges are added at random and should be as many as possible to increase the difficulty of finding a Hamiltonian path, and thus of recovering the message by Eve. There is no upper restriction on the number of edges; however, each vertex must have a degree greater than 2. If a given vertex has degree 1, then the single edge connecting the vertex to the graph is surely part of the Hamiltonian path, by the definition of a Hamiltonian path. In addition, a vertex of degree 2 assures us that the vertex is an interior point of a portion of the Hamiltonian path, and both of its edges are parts of the path. This allows Eve to reconstruct portions of the path and reduce the difficulty of
finding the Hamiltonian path when many vertices of degree at most 2 exist. An illustration of this point is given in Figure 11.

Figure 11: The graph on the left is obtained by Eve, who finds that some vertices do not satisfy the vertex degree requirement and deduces that the bolded edges must be part of the Hamiltonian path. Having initially recovered 3 out of the 8 edges of the secret Hamiltonian path, she brute-forces the remaining edges to obtain the Hamiltonian path in the graph on the right.

Remark: Edge labels are not restricted to be unique. A single graph can have multiple edges with the same label, since the secret is mainly the sequence of vertices. The vertex labels, on the other hand, must be unique: no two vertices in the graph share a label, which eliminates confusion when traversing the secret path.

4.6.7 Discussion and Further Work

Cryptosystems include various parts, such as key agreement and encryption and decryption procedures, to obscure messages. The proposed scheme only addresses the encryption/decryption part, which is commonly based on a hash function. This Hamiltonian path and Graph Isomorphism hash function seems a strong candidate for future designs due to the difficulty of the underlying problems even in the presence of quantum computing [EH99, MRV08, KK07, CW05]. The current state of this prototype still needs to address two problems. First, a key agreement scheme for permutations, so that it can serve as a public key cryptography scheme. Second, it is not yet clear whether it is possible to encrypt multiple messages that arrive in different time frames; obviously, each message can be encrypted using a different graph, but this may pose efficiency concerns from generating multiple graphs. Furthermore, the current structure of this problem can be modeled within the symmetric group rather than graph theory. This is especially true for implementation, where one does not need to create vertex objects to implement this scheme. Yet the consideration of graph theory is justified from two points of view. First, relating the scheme to well-known problems that have been investigated by researchers of both quantum and classical algorithms provides high confidence in its security. Second, treating it as a graph-theoretical problem may increase the domain of our capability to embed further graph-theoretical constructions, such as digital signatures and key agreement through graph colorings, as observed in [KK03].
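The order recommendation of Section 4.6.6.1 can be checked numerically. The sketch below uses a bit-capacity rule, the smallest n with log2(n!) at least the RSA modulus length, which is an assumption of this check; Figure 9 instead pairs decimal digit counts, and the two rules agree to within a vertex or two at the RSA-2048 level:

```python
import math

def min_order_for_bits(bits):
    """Smallest graph order n whose n! search space has at least 2^bits
    elements, using log2(n!) = lgamma(n + 1) / ln 2."""
    n = 1
    while math.lgamma(n + 1) / math.log(2) < bits:
        n += 1
    return n

# The digit comparison behind pairing |V(G)| = 300 with RSA-2048:
fact_digits = len(str(math.factorial(300)))  # decimal digits of 300!
rsa_digits = len(str(1 << 2048))             # decimal digits of 2^2048
```

Under this rule, min_order_for_bits(2048) lands within one vertex of the 300 proposed above, and the digit counts of 300! and 2^2048 (615 and 617) confirm that the two search spaces are of comparable magnitude.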
REFERENCES

[Adl79] Adleman, Leonard. A subexponential algorithm for the discrete logarithm problem with applications to cryptography. In Foundations of Computer Science, 20th Annual Symposium on, pp. 55-60. IEEE, 1979.

[AKS02] Agrawal, Manindra, Neeraj Kayal, and Nitin Saxena. PRIMES is in P. Annals of Mathematics (2002): 781-793.

[AM09] Aggarwal, Divesh, and Ueli Maurer. Breaking RSA generically is equivalent to factoring. In Advances in Cryptology, EUROCRYPT 2009, pp. 36-53. Springer Berlin Heidelberg, 2009.

[Bac84] Bach, Eric. Discrete Logarithms and Factoring. Berkeley: Computer Science Division, University of California, 1984.

[BBS86] Blum, Lenore, Manuel Blum, and Mike Shub. A simple unpredictable pseudo-random number generator. SIAM Journal on Computing 15, no. 2 (1986): 364-383.

[Bea03] Beauregard, Stephane. Circuit for Shor's algorithm using 2n+3 qubits. arXiv preprint quant-ph/0205095 (2003).

[Ber07] Bernstein, Daniel J. Better price-performance ratios for generalized birthday attacks. In Workshop Record of SHARCS, vol. 7, p. 160. 2007.

[Bon98] Boneh, Dan, and Ramarathnam Venkatesan. Breaking RSA may be easier than factoring. Preprint 252 (1998).

[BR11] Barker, Elaine, and Allen Roginsky. Transitions: Recommendation for transitioning the use of cryptographic algorithms and key lengths. NIST Special Publication 800-131A (2011).

[Bre06] Brent, R. P. Uncertainty can be better than certainty: some algorithms for primality testing. 2006.

[Bro08] Brown, Daniel R. L. Breaking RSA may be as difficult as factoring. IACR Cryptology ePrint Archive 2005 (2005): 380.

[Car02] Carella, Nelson A. On the theory of integers factorization. (2002).

[CBP06] De Canniere, Christophe, Alex Biryukov, and Bart Preneel. An
121 introduction to block cipher cryptanalysis. Proceedings of the IEEE 94, no. 2 (2006): 346 356. [CLRS01] Leiserson, Charles E., Ronald L. Rivest, and Clifford Stein. Introduction to algorithms Edited by Thomas H. Cormen. The MIT press, 2001. [CMF12] Conrad, Eric, Seth Misenar, and Joshua Feldman. CISSP study guide Ac cess Online via Elsevier, 2012. [CSQ07] Collard, Baudoin, F X. Standaert, and Jean Jacques Quisquater. "Improving the time complexity of matsui's linear cryptanalysis." In Information Security and Cryptology ICISC 2007 pp. 77 88. Springer Berlin Heidelberg, 2007. [CW05] Childs, Andrew M., and Pawel Wocjan. "On the quantum hardness of solving isomorphism problems as nonabelian hidden shift problems." arXiv preprint quant ph/0510185 (2005). [Dam89] DamgÂŒrd, Ivan Bjerre A design principle for hash functions. In Advances in Cryptology CRYPTO'89 Proceedings pp. 416 427. Springer New York, 1990. [DDL94] Denny, Thomas, Bruce Dodson, Arjen K. Lenstra, and Mark S. Manasse. On the factorization of RSA 120. In Advances in Cr yptology CRYPTO'93 pp. 166 174. Springer Berlin Heidelberg, 1994. [DH76] Diffie, Whitfield, and Martin Hellman. New directions in cryptography. Information Theory, IEEE Transactions on 22, no. 6 (1976): 644 654. [DLN09] Datta, Samir, Nutan Limaye, Prajakta Nimbhorkar, Thomas Thierauf, and Fabian Wagner. Planar graph isomorphism is in log space. In Computational Complexity, 2009. CCC'09. 24th Annual IEEE Conference on pp. 203 214. IEEE, 2009. [Dob98] Dobbertin, Hans. Cryptanalysis of MD4. In Fast Software Encryption pp. 53 69. Springer Berlin Heidelberg, 1996. [EF12] Campbell, J., and R. J. Easter. Annex C: Approved Random Number Generators for FIPS PUB 140 2, 2012. [EH01] Ekert, Artur, P. M. Hayden, and Hitoshi Inamori "Basic concepts in quantum computation." In Coherent atomic matter waves pp. 661 701. Springer Berlin Heidelberg, 2001. [EH99] Ettinger, Mark, and Peter Hoyer. A quantum observable for the graph isomorphism problem. 
arXiv preprint quant ph/9901029 (19 99).
[Elg84] ElGamal, Taher. "A public key cryptosystem and a signature scheme based on discrete logarithms." IEEE Transactions on Information Theory 31, no. 4 (1985): 469-472.
[EP00] Rieffel, Eleanor, and Wolfgang Polak. "An introduction to quantum computing for non-physicists." ACM Computing Surveys 32, no. 3 (2000): 300-335.
[EPFL10] Kleinjung, Thorsten, Kazumaro Aoki, Jens Franke, Arjen K. Lenstra, Emmanuel Thomé, Joppe W. Bos, Pierrick Gaudry, et al. "Factorization of a 768-bit RSA modulus." In Advances in Cryptology - CRYPTO 2010, pp. 333-350. Springer Berlin Heidelberg, 2010.
[Fey82] Feynman, Richard P. "Simulating physics with computers." International Journal of Theoretical Physics 21, no. 6 (1982): 467-488.
[FS86] Feigenbaum, Joan, and Alejandro A. Schäffer. "Recognizing composite graphs is equivalent to testing graph isomorphism." SIAM Journal on Computing 15, no. 2 (1986): 619-627.
[FSK12] Ferguson, Niels, Bruce Schneier, and Tadayoshi Kohno. Cryptography Engineering: Design Principles and Practical Applications. John Wiley & Sons, 2012.
[GB08] Bellare, Mihir, and Shafi Goldwasser. "Lecture notes on cryptography." (2008).
[Gib91] Gibson, J. K. "Discrete logarithm hash function that is collision free and one way." IEE Proceedings E (Computers and Digital Techniques) 138, no. 6 (1991): 407-410.
[GJ79] Garey, Michael R., and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. (1979).
[GM84] Goldwasser, Shafi, and Silvio Micali. "Probabilistic encryption." Journal of Computer and System Sciences 28, no. 2 (1984): 270-299.
[Gol08] Goldreich, Oded. "Computational complexity: a conceptual perspective." ACM SIGACT News 39, no. 3 (2008): 35-39.
[Gol98] Goldreich, Oded. Foundations of Cryptography. (1998).
[Gor93] Gordon, Daniel M. "Discrete logarithms in GF(p) using the number field sieve." SIAM Journal on Discrete Mathematics 6, no. 1 (1993): 124-138.
[Gre12] Childers, Greg. "Factorization of a 1061-bit number by the Special Number Field Sieve." IACR Cryptology ePrint Archive 2012 (2012): 444.
[HH00] Hales, Lisa, and Sean Hallgren. "An improved quantum Fourier transform algorithm and applications." In 41st Annual Symposium on Foundations of Computer Science, pp. 515-525. IEEE, 2000.
[HIK11] Hammack, Richard, Wilfried Imrich, and Sandi Klavžar. Handbook of Product Graphs. Taylor & Francis US, 2011.
[Hor73] Feistel, Horst. "Cryptography and computer privacy." Scientific American 228 (1973): 15-23.
[IBM01] Vandersypen, Lieven M. K., Matthias Steffen, Gregory Breyta, Costantino S. Yannoni, Mark H. Sherwood, and Isaac L. Chuang. "Experimental realization of Shor's quantum factoring algorithm using nuclear magnetic resonance." Nature 414, no. 6866 (2001): 883-887.
[ITU13] "The World in 2013." ITU. Web. 30 Sept. 2013. <http://www.itu.int/en/ITU-D/Statistics/Documents/facts/ICTFactsFigures2013.pdf>
[JK03] Jenner, Birgit, Johannes Köbler, Pierre McKenzie, and Jacobo Torán. "Completeness results for graph isomorphism." Journal of Computer and System Sciences 66, no. 3 (2003): 549-566.
[Kar63] Karatsuba, A., and Yu. Ofman. "Multiplication of many-digital numbers by automatic computers." Proceedings of the USSR Academy of Sciences 145 (1962): 293-294. Translated in Physics-Doklady 7 (1963): 595-596.
[KCH99] Kim, Hwan Joon, Jung Hee Cheon, and Sang Geun Hahn. "Elliptic curve lifting problem and its applications." Proceedings of the Japan Academy, Series A, Mathematical Sciences 75, no. 9 (1999): 166-169.
[Kel57] Kelly, Paul J. "A congruence theorem for trees." Pacific Journal of Mathematics 7 (1957): 961-968.
[Ker78] Kerckhoffs, Auguste. La Cryptographie Militaire. University Microfilms, 1978.
[KG79] Lueker, George S., and Kellogg S. Booth. "A linear time algorithm for deciding interval graph isomorphism." Journal of the ACM 26, no. 2 (1979): 183-195.
[Kir11] Kirchner, Paul. "Improved Generalized Birthday Attack." IACR Cryptology ePrint Archive 2011 (2011): 377.
[KK03] Kulesza, Kamil, and Zbigniew Kotulski. "Secret sharing for n-colorable graphs with application to public key cryptography." arXiv preprint cs/0310053 (2003).
[KK07] Kashefi, Elham, and Iordanis Kerenidis. "Statistical Zero Knowledge and quantum one-way functions." Theoretical Computer Science 378, no. 1 (2007): 101-116.
[Kle10] Kleinjung, Thorsten, Kazumaro Aoki, Jens Franke, Arjen K. Lenstra, Emmanuel Thomé, Joppe W. Bos, Pierrick Gaudry, et al. "Factorization of a 768-bit RSA modulus." In Advances in Cryptology - CRYPTO 2010, pp. 333-350. Springer Berlin Heidelberg, 2010.
[Kob87] Koblitz, Neal. "Elliptic curve cryptosystems." Mathematics of Computation 48, no. 177 (1987): 203-209.
[Lev96] LeVeque, William J. Fundamentals of Number Theory. Courier Dover Publications, 1996.
[Lewis_Book] Lewis, F. D. Essentials of Theoretical Computer Science. University of Kentucky.
[LHABKW12] Lenstra, Arjen K., James P. Hughes, Maxime Augier, Joppe W. Bos, Thorsten Kleinjung, and Christophe Wachter. "Ron was wrong, Whit is right." IACR Cryptology ePrint Archive 2012 (2012): 64.
[LS84] Liddell and Scott's Greek-English Lexicon. Oxford University Press, 1984.
[Man80] Manin, Yu. I. Vychislimoe i nevychislimoe (Computable and Noncomputable). Moscow: Sov. (1980).
[MD01] Dworkin, Morris. Recommendation for Block Cipher Modes of Operation: Methods and Techniques. NIST Special Publication 800-38A. National Institute of Standards and Technology, Gaithersburg, MD, 2001.
[Mer79] Merkle, Ralph Charles. Secrecy, Authentication, and Public Key Systems. (1979).
[Mil85] Miller, Victor S. "Use of elliptic curves in cryptography." In Advances in Cryptology - CRYPTO '85 Proceedings, pp. 417-426. Springer Berlin Heidelberg, 1985.
[Mil86] Miller, Victor S. "Use of elliptic curves in cryptography." In Advances in Cryptology - CRYPTO '85 Proceedings, pp. 417-426. Springer Berlin Heidelberg, 1986.
[Mir05] Mironov, Ilya. "Hash functions: Theory, attacks, and applications." Microsoft Research, Silicon Valley Campus, November 2005.
[Mit03] Mitchell, Chris J. Public Key Encryption Using Block Ciphers. Technical Report RHUL-MA-2003-6. Royal Holloway, University of London, 9 September 2003.
[Mon85] Montgomery, Peter L. "Modular multiplication without trial division." Mathematics of Computation 44, no. 170 (1985): 519-521.
[MOV10] Menezes, Alfred J., Paul C. Van Oorschot, and Scott A. Vanstone. Handbook of Applied Cryptography. CRC Press, 2010.
[MOV93] Menezes, Alfred J., Tatsuaki Okamoto, and Scott A. Vanstone. "Reducing elliptic curve logarithms to logarithms in a finite field." IEEE Transactions on Information Theory 39, no. 5 (1993): 1639-1646.
[MRV08] Moore, Cristopher, Alexander Russell, and Umesh Vazirani. "A classical one-way function to confound quantum adversaries." arXiv preprint quant-ph/0701115 (2008).
[Nan09] Nandi, Mridul. "Characterizing padding rules of MD hash functions preserving collision security." In Information Security and Privacy, pp. 171-184. Springer Berlin Heidelberg, 2009.
[Ngu98] Nguyen, Phong. "A Montgomery-like square root for the number field sieve." In Algorithmic Number Theory, pp. 151-168. Springer Berlin Heidelberg, 1998.
[NSA] "The Case for Elliptic Curve Cryptography." NSA/CSS. Web. 30 Sept. 2013. <http://www.nsa.gov/business/programs/elliptic_curve.shtml>
[PH78] Pohlig, Stephen, and Martin Hellman. "An improved algorithm for computing logarithms over GF(p)." IEEE Transactions on Information Theory 24, no. 1 (1978): 106-110.
[Pin08] Pinch, Richard G. E. "The Carmichael numbers up to 10^15." Mathematics of Computation 61, no. 203 (1993): 381-391.
[PO96] Peralta, René, and Eiji Okamoto. "Some combinatorial problems of importance to cryptography." In Proceedings of the IEICE 1996 Symposium on Cryptography and Information Security, Japan, 1996.
[Pre03] Preneel, Bart. Analysis and Design of Cryptographic Hash Functions. PhD diss., Katholieke Universiteit te Leuven, 2003.
[PZ03] Proos, John, and Christof Zalka. "Shor's discrete logarithm quantum algorithm for elliptic curves." arXiv preprint quant-ph/0301141 (2003).
[Rab79] Rabin, Michael O. Digitalized Signatures and Public-Key Functions as Intractable as Factorization. (1979).
[RC09] Perlner, Ray A., and David A. Cooper. "Quantum resistant public key cryptography: a survey." In Proceedings of the 8th Symposium on Identity and Trust on the Internet, pp. 85-93. ACM, 2009.
[Riv92] Rivest, Ronald. "The MD4 message-digest algorithm." (1992).
[Riv92] Rivest, Ronald L. "RFC 1321: The MD5 message-digest algorithm." (1992): 20-21.
[Riv97] Rivest, Ronald L. "All-or-nothing encryption and the package transform." In Fast Software Encryption, pp. 210-218. Springer Berlin Heidelberg, 1997.
[RS04] Rogaway, Phillip, and Thomas Shrimpton. "Cryptographic hash-function basics: Definitions, implications, and separations for preimage resistance, second-preimage resistance, and collision resistance." In Fast Software Encryption, pp. 371-388. Springer Berlin Heidelberg, 2004.
[RSA78] Rivest, Ronald L., Adi Shamir, and Len Adleman. "A method for obtaining digital signatures and public-key cryptosystems." Communications of the ACM 21, no. 2 (1978): 120-126.
[RSS04] Ryabko, B. Ya., V. S. Stognienko, and Yu. I. Shokin. "A new test for randomness and its application to some cryptographic problems." Journal of Statistical Planning and Inference 123, no. 2 (2004): 365-376.
[Sar09] Sarkar, Palash. "Domain extender for collision resistant hash functions: Improving upon Merkle-Damgård iteration." Discrete Applied Mathematics 157, no. 5 (2009): 1086-1097.
[SBA99] Cavallar, Stefania, Bruce Dodson, Arjen Lenstra, Paul Leyland, Walter Lioen, Peter L. Montgomery, Brian Murphy, Herman te Riele, and Paul Zimmermann. "Factorization of RSA-140 using the number field sieve." Springer Berlin Heidelberg, 1999.
[Sch00] Schneier, Bruce. "A self-study course in block cipher cryptanalysis." Cryptologia 24, no. 1 (2000): 18-33.
[SHA08] National Institute of Standards and Technology. Secure Hash Standard (SHS). Federal Information Processing Standards Publication 180-3, October 2008.
[Sho97] Shor, Peter W. "Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer." SIAM Journal on Computing 26, no. 5 (1997): 1484-1509.
[Sil06] Silverman, Joseph H. "An introduction to the theory of elliptic curves." http://www.math.brown.edu/jhs/Presentations/WyomingEllipticCurve.pdf (2006).
[Sil07] Silverman, J. "The four faces of lifting for the elliptic curve discrete logarithm problem." In 11th Workshop on Elliptic Curve Cryptography, University College Dublin, 2007.
[Sil99] Silverman, Joseph H. "The xedni calculus and the elliptic curve discrete logarithm problem." Designs, Codes and Cryptography 20, no. 1 (1999): 5-40.
[Sin00] Singh, Simon. The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography. Random House Digital, Inc., 2000.
[Sma97] Smart, N. "Announcement of an attack on the ECDLP for anomalous elliptic curves." Hewlett-Packard Laboratories, UK (1997).
[SS98] Silverman, Joseph H., and Joe Suzuki. "Elliptic curve discrete logarithms and the index calculus." In Advances in Cryptology - ASIACRYPT '98, pp. 110-125. Springer Berlin Heidelberg, 1998.
[Swe12] Swenson, Christopher. Modern Cryptanalysis: Techniques for Advanced Code Breaking. John Wiley & Sons, 2012.
[SZ01] Steinfeld, Ron, and Yuliang Zheng. "On the security of RSA with primes sharing least-significant bits." Applicable Algebra in Engineering, Communication and Computing 15, no. 3-4 (2004): 179-200.
[USAI] US Army Intelligence Center, Fort Huachuca. US Army Military Intelligence History: A Sourcebook.
[Weg02] De Weger, Benne. "Cryptanalysis of RSA with small prime difference." Applicable Algebra in Engineering, Communication and Computing 13, no. 1 (2002): 17-28.
[Wil11] Williams, V. Vassilevska. "Breaking the Coppersmith-Winograd barrier." 2011.
[WM68] Western, A. E., and J. C. P. Miller. Tables of Indices and Primitive Roots. Cambridge University Press, Cambridge, 1968.
[Yan12] Yan, Song Y. Computational Number Theory and Modern Cryptography. John Wiley & Sons, 2012.
[Yas08] Yasuda, Kan. "How to fill up Merkle-Damgård hash functions." In Advances in Cryptology - ASIACRYPT 2008, pp. 272-289. Springer Berlin Heidelberg, 2008.
[YJ06] Yan, Song Y., and Glyn James. "Can Integer Factorization be in P?" In International Conference on Computational Intelligence for Modelling, Control and Automation, and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, pp. 266-266. IEEE, 2006.
[YM08] Yanofsky, Noson S., and Mirco A. Mannucci. Quantum Computing for Computer Scientists. Vol. 20. Cambridge: Cambridge University Press, 2008.
[ZZW06] Zhou, Chuanhua, Gemei Zhu, Baohua Zhao, and Wei Wei. "Study of one-way hash function to digital signature technology." In 2006 International Conference on Computational Intelligence and Security, vol. 2, pp. 1503-1506. IEEE, 2006.
APPENDIX

A. Frequency Analysis

Frequency analysis was run on English letters three times; each run used meaningful random articles totaling exactly 40,000 characters. Note how close the values of each letter are across all three trials, in contrast to the large gap between the frequencies of some letters (e.g., 'e' vs. 'z').

[Figure: bar chart of English-letter frequencies for Trial 1, Trial 2, and Trial 3.]
B. Attack Graph