Cryptography

{{TOC-right}}
'''Cryptography''' is a branch of mathematics concerned with obscuring information and then controlling who can retrieve it. While this article focuses on modern applications and techniques, cryptography has been in use for three millennia; when literacy was rare, writing itself could be considered a means of obscurement. See [[History of cryptography]].

Cryptography has found many practical applications related to security and reliability in computers, especially when connected to networks. It is central to [[Digital Rights Management|digital rights management]] (DRM) in computers and is important in the fields of computer science and engineering, as well as mathematics. By encrypting a message, we can be fairly certain who sent it, fairly certain that others cannot read it, and able to determine that the message has likely not been altered. If someone scrambles a message to make it illegible, the original message is called '''plaintext''' and the scrambled message is called '''ciphertext'''. '''Encryption''' is the act of turning plaintext into ciphertext; '''decryption''' is the act of turning ciphertext back into plaintext.

'''Cryptology''' ("the study of secrets", from the Greek) describes the field as a whole, while cryptography proper is the subfield of encryption and decryption. [[Cryptanalysis]] or "codebreaking" refers to the study of how to break into an encrypted message without possession of the key.

As well as being aware of cryptographic history, cryptographic algorithm and system designers must carefully consider probable future developments. For instance, the continued improvement in computer processing power, which widens the scope of brute-force attacks, must be taken into account when specifying key lengths, and the potential effects of [[quantum computing]] are already being considered by good cryptographic system designers.

Prior to the early 20th century, cryptography was chiefly concerned with linguistic patterns. Since then the emphasis has shifted, and cryptography now makes extensive use of mathematics, including aspects of information theory, computational complexity, statistics, combinatorics, abstract algebra, and number theory. Cryptography is also a branch of engineering, but an unusual one, since it deals with active, intelligent, and malevolent opposition (see [[cryptographic engineering]] and [[security engineering]]). There is also active research examining the relationship between cryptographic problems and quantum physics (see [[quantum cryptography]] and [[quantum computing]]).

== Codes versus ciphers ==

In common usage, the term "code" is often used to mean any method of encryption or meaning-concealment. In cryptography, however, '''code''' is more specific, meaning a linguistic procedure which replaces a unit of plain text with a code word (for example, ''apple pie'' replaces ''attack at dawn''). Ciphers operate at a lower level, ignoring meaning and just manipulating the letters, bytes or bits.

Codes are not generally practical for lengthy or complex communications, and are difficult to do in software, as they are as much linguistic as mathematical problems. If the only times the messages need to name are dawn, noon, dusk and midnight, then a code is fine; usable code words might be John, George, Paul and Ringo. However, if messages must be able to specify things like 11:37 AM, a code is inconvenient. Also, if a code is used many times, an enemy is quite likely to work out that "John" means "dawn" or whatever; there is no long-term security. Finally, changing a code can be difficult; it requires retraining users or creating and (securely!) delivering new code books. For these reasons, ciphers are generally preferred in practice.

Nevertheless, there are niches where codes are quite useful. A small number of codes can represent a set of operations known to sender and receiver, and if they are not re-used there is no information to help a cryptanalyst. "Climb Mount Niitaka" was the final order for the Japanese mobile striking fleet to attack Pearl Harbor, while "visit Aunt Shirley" could order a terrorist to trigger a chemical weapon at a particular place.

Codes may also be combined with ciphers. Then if an enemy breaks a cipher, much of what he gets will be code words. Unless he either already knows the code words or has enough broken messages to search for codeword re-use, the code may defeat him even if the cipher did not. For example, if the Americans had intercepted and decrypted a message saying "Climb Mount Niitaka" just before Pearl Harbor, they would likely not have known its meaning.

A '''cipher''' (or '''cypher''') is a system of algorithms for encryption and decryption. The exact operation of a cipher is controlled by a '''key''', which is a secret parameter for the cipher algorithm.

'''Ciphers''' use a mathematical operation to convert understandable '''plaintext''' into unintelligible '''ciphertext'''. In general, for two-way ciphers, the number of plaintext symbols is equal to, or slightly less than (if error-checking is in use), the number of ciphertext symbols.

== Principles and terms ==

In encryption and decryption, a '''key''' is one or more unique values used by an '''encryption or decryption algorithm'''.  Encryption algorithms take as input a key and '''plaintext''', producing '''ciphertext''' output.  For decryption, the process is reversed to turn ciphertext into plaintext.

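As a notational sketch (the symbols here are illustrative, not taken from any particular source): writing <math>E</math> for the encryption algorithm, <math>D</math> for decryption, <math>K</math> and <math>K'</math> for the encryption and decryption keys, <math>P</math> for plaintext and <math>C</math> for ciphertext,

:<math>C = E_K(P), \qquad P = D_{K'}(C)</math>

For a symmetric cipher <math>K' = K</math>; for a public key system the two keys differ, though they are mathematically related.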

The system should be secure against an attacker who knows all its details except the key; this is known as [[Kerckhoffs' Principle]].

Methods of defeating cryptosystems have a long history and an extensive literature; see [[cryptanalysis]]. Anyone designing or deploying a cryptosystem must take cryptanalytic results into account.

The ciphertext produced by an encryption algorithm should bear no resemblance to the original message. Ideally, it should be indistinguishable from a random string of symbols. Any non-random properties may provide an opening for a skilled cryptanalyst.

=== Keying ===
 
Even an excellent safe cannot protect against a thief who knows the combination. Even an excellent cipher cannot protect against an enemy who knows the key.
 
Many cryptographic techniques — [[block cipher]]s, [[stream cipher]]s, [[public key]] encryption, [[digital signature]]s, and [[hashed message authentication code]]s — depend on [[cryptographic key]]s. '''None of these can be secure if the key is not.''' Enemies can sometimes read encrypted messages without breaking the cipher; they use [[Cryptanalysis#Practical_cryptanalysis | practical cryptanalysis]] techniques such as breaking into an office to steal keys.
 
The ''quality'' of the keys is almost as important as their secrecy. '''Keys need to be highly random''', effectively impossible to guess. See [[random number]] for details. A key that an enemy can easily guess, or that he can find with a low-cost search, does not provide much protection. Using strong cryptography with a poor key is like buying good locks then leaving the key under the doormat.
 
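As a minimal sketch of what "highly random" means in practice (Python standard library only; the 128-bit length is an arbitrary choice for the example), keys should come from a cryptographic random source, never from an ordinary pseudo-random generator or a human-chosen value:

 import secrets
 key = secrets.token_bytes(16)   # 128-bit key from a cryptographically strong source
 print(key.hex())
 # The 'random' module is NOT suitable for keys: it is designed for
 # simulation and its output can be predicted from a small sample.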
In applications which encrypt a large volume of data, '''any cipher must be re-keyed from time to time''' to prevent an enemy from accumulating large amounts of data encrypted with a single key. Such a collection facilitates some attacks — see [[code book attack]], [[linear cryptanalysis]] and [[differential cryptanalysis]] in particular, and [[cryptanalysis]] in general. ''It also makes the payoff for breaking that key very large''. Re-keying also limits the damage if a key is compromised in some other way. Neither block ciphers nor stream ciphers typically include a re-keying mechanism; some higher-level protocol manages that and re-keys the cipher using the normal keying mechanism.
 
In some applications, there are natural breaks where a new key should be used. For example it is natural to use a different key for each new message in a message-oriented protocol such as email, or for each new connection in a connection-oriented protocol such as [[SSH]]. This may be all the re-keying required. Or it may not; what if some users send multi-gigabyte emails or stay logged in for months?
 
In other applications, a mechanism for periodic re-keying is required. For a [[VPN]] connection between two offices, this would normally be the [[Internet Key Exchange]] protocol. For an embassy, it might be a clerk who changes the key daily and an officer who delivers more keys once a month, flying in with a briefcase handcuffed to his wrist.
 
There are many ways to manage keys, ranging from physical devices and [[smartcard]]s to cryptographic techniques such as [[Diffie-Hellman]]. In some cases, an entire [[public key infrastructure]] may be involved. See [[key management]] for details.
 
=== External attacks ===
 
Any of the techniques of [[espionage]] — bribery, coercion, blackmail, deception ... — may be used to obtain keys; such methods are called [[Cryptanalysis#Practical_cryptanalysis | practical cryptanalysis]]. In general, these methods work against the people and organisations involved, looking for human weaknesses or poor security procedures. They are beyond our scope here; see [[information security]].
 
For computer-based security systems, host security is a critical prerequisite. '''No system can be secure if the underlying computer is not.''' Even systems generally thought to be secure, such as [[IPsec]] or [[PGP]], are ''trivially'' easy to subvert for an enemy who has already subverted the machine they run on. See [[computer security]].
 
For some systems, host security may be an impossible goal. Consider a [[Digital Rights Management]] system whose design goal is to protect content against the owner of the computer or DVD player it runs on. If that owner has full control over his device then the goal is not achievable.
 
Encrypting messages does not prevent [[traffic analysis]]; an enemy may be able to gain useful information from the timing, size, source and destination of traffic, even if he cannot read the contents.
 
=== Side channel attacks ===
 
There are also [[Cryptanalysis#Side_channel_attacks | side channel attacks]].
 
For example, any electrical device handling fast-changing signals will produce '''electromagnetic radiation'''. An enemy might listen to the radiation from a computer or from crypto hardware. For the defenders, there are standards for limiting such radiation; see [[TEMPEST]] and [[protected distribution system]].
 
'''Timing attacks''' make inferences from the length of time cryptographic operations take. These may be used against devices such as [[smartcard]]s or against systems implemented on computers. Any cryptographic primitive — block cipher, [[stream cipher]], [[public key]] or [[cryptographic hash]] — can be attacked this way. '''Power analysis''' has also been used, in much the same way as timing. The two may be combined.
 
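One common software-level defence against a simple class of timing leaks is to compare secret values in constant time. A minimal Python sketch (the scenario is illustrative; hmac.compare_digest is a standard library function):

 import hmac
 def check_tag(received: bytes, expected: bytes) -> bool:
     # A naive 'received == expected' can return as soon as a byte differs,
     # so the elapsed time hints at how long the matching prefix is.
     # hmac.compare_digest takes time independent of where the bytes differ.
     return hmac.compare_digest(received, expected)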
'''Differential fault analysis''' attacks a cipher embedded in a [[smartcard]] or other device. Apply stress (heat, mechanical stress, radiation, ...) to the device until it begins to make errors; with the right stress level, most will be single-bit errors. Comparing correct and erroneous output gives the [[cryptanalysis |cryptanalyst]] a window into cipher internals. This attack is extremely powerful; "we can extract the full DES key from a sealed tamper-resistant DES encryptor by analyzing between 50 and 200 ciphertexts generated from unknown but related plaintexts" [http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-info.cgi?1997/CS/CS0910].
 
See [[cryptanalysis]] for details and [[information security]] for defenses.
 
== Secret key systems ==
{{main|Symmetric key cryptography}}
Until the 1970s, all (publicly known) cryptosystems used '''secret key''' or [[symmetric key cryptography]] methods. In such a system, there is only one secret key for a message; that key can be used either to encrypt or decrypt the message. Both the sender and receiver must have the key, and third parties (potential intruders) must be prevented from obtaining the key.  Symmetric key encryption may also be called ''traditional'', ''shared-secret'', ''secret-key'', or ''conventional'' encryption.
 
Historically, [[cipher]]s worked at the level of letters; see [[history of cryptography]] for details. Attacks on them used techniques based largely on linguistic analysis, such as frequency counting; see [[cryptanalysis]]. <!-- This should become a more specific link when there is something better to link to -->
 
On computers, there are two main types of symmetric encryption algorithm:
 
A [[block cipher]] breaks the data up into fixed-size blocks and encrypts each block under control of the key. Since the message length will rarely be an integer number of blocks, there will usually need to be some form of "padding" to make the final block long enough. The block cipher itself defines how a single block is encrypted; [[Block cipher modes of operation | modes of operation]] specify how these operations are combined to achieve some larger goal.
 
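As a sketch of the mechanics (this assumes a recent version of the third-party Python cryptography package; the key, IV and message are made up for the example), AES in CBC mode with PKCS7 padding shows the pieces: a fixed 16-byte block, padding of the final block, and a mode that chains the blocks together:

 import os
 from cryptography.hazmat.primitives import padding
 from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
 key = os.urandom(32)                  # 256-bit AES key
 iv = os.urandom(16)                   # per-message initialisation vector
 message = b"Attack at dawn"           # 14 bytes, shorter than one 16-byte block
 padder = padding.PKCS7(128).padder()  # pad up to the 128-bit block size
 padded = padder.update(message) + padder.finalize()
 encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
 ciphertext = encryptor.update(padded) + encryptor.finalize()
 decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
 unpadder = padding.PKCS7(128).unpadder()
 plain = unpadder.update(decryptor.update(ciphertext) + decryptor.finalize()) + unpadder.finalize()
 assert plain == message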
A [[stream cipher]] encrypts a stream of input data by combining it with a [[random number | pseudo-random]] stream of data; the pseudo-random stream is generated under control of the encryption key.
 
Another method, usable manually or on a computer, is a [[one-time pad]]. This works much like a stream cipher, but it does not need to generate a pseudo-random stream because its key is a ''truly random stream as long as the message''. This is the only known cipher which is provably secure (provided the key is truly random and no part of it is ever re-used), but it is impractical for most applications because managing such keys is too difficult.
 
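The XOR combining step underlying both stream ciphers and the one-time pad can be sketched in a few lines of Python (illustration only: here the pad is as long as the message and must never be re-used; a real stream cipher would instead expand a short key into the keystream):

 import os
 message = b"ATTACK AT DAWN"
 pad = os.urandom(len(message))        # truly random, same length as the message
 ciphertext = bytes(m ^ p for m, p in zip(message, pad))
 recovered  = bytes(c ^ p for c, p in zip(ciphertext, pad))
 assert recovered == message           # XOR with the same pad undoes the XOR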
=== Key management ===
 
More generally, [[key management]] is a problem for any secret key system.
* It is ''critically'' important to '''protect keys''' from unauthorised access; if an enemy obtains the key, then he or she can read all messages ever sent with that key.
* It is necessary to '''change keys''' periodically, both to limit the damage if an attacker does get a key and to prevent various [[cryptanalysis|attacks]] which become possible if the enemy can collect a large sample of data encrypted with a single key.
* It is necessary to '''communicate keys'''; without a copy of the identical key, the intended receiver cannot decrypt the message.
Managing all of these simultaneously is an inherently difficult problem.

One problem is where, and how, to safely store the key. In a manual system, you need a key that is long and hard to guess because keys that are short or guessable provide little security. However, such keys are hard to remember and if the user writes them down, then you have to worry about someone looking over his shoulder, or breaking in and copying the key, or the writing making an impression on the next page of a pad, and so on.

On a computer, keys must be protected so that enemies cannot obtain them. Simply storing the key unencrypted in a file or database is a poor strategy. A better method is to encrypt the key and store it in a file that is protected by the file system; this way, only authorized users of the system should be able to read the file. But then, where should one store the key used to encrypt the secret key?  It becomes a recursive problem. Also, what about an attacker that can defeat the file system protection? If the key is stored encrypted but you have a program that decrypts and uses it, can an attacker obtain the key via a memory dump or a debugging tool? If a network is involved, can an attacker get keys by intercepting network packets? Can an attacker put a keystroke logger on the machine; if so, he can get everything you type, possibly including keys or passwords.

Communicating keys is an even harder problem. With secret key encryption alone, it would not be possible to open up a new secure connection on the internet, because there would be no safe way initially to transmit the shared key to the other end of the connection without intruders being able to intercept it. A government or major corporation might send someone with a briefcase handcuffed to his wrist, but for many applications this is impractical.

Moreover, the problem grows quadratically if there are many users. If <math>N</math> users must all be able to communicate with each other securely, then there are <math>N(N-1)/2</math> possible connections, each of which needs its own key. For large <math>N</math> this becomes quite unmanageable.

Various techniques can be used to address the difficulty. A centralised server, such as the [[Kerberos]] system developed at MIT [http://web.mit.edu/Kerberos/] and used (not without controversy [http://slashdot.org/article.pl?sid=00/05/02/158204]) by all versions of [[Microsoft Windows]] since [[Windows 2000]] [http://technet.microsoft.com/en-us/library/bb742431.aspx], is one method. Other techniques use ''two factor authentication'', combining "something you have" (e.g. your ATM card) with "something you know" (e.g. the PIN).

The development of [[#public key | public key]] techniques, described in the next section, allows simpler solutions.

== Public key systems ==
{{main|asymmetric key cryptography}}
'''Public key''' or [[asymmetric key cryptography]] was first proposed, in the open literature, in 1976 by  Whitfield Diffie and Martin Hellman.<ref>{{citation
  | first1 = Whitfield | last1 = Diffie | first2 = Martin | last2 = Hellman
  | title = Multi-user cryptographic techniques
  | volume = 5
  | pages = 109-112
  | date = June 8, 1976}}</ref>. The historian David Kahn described it as "the most revolutionary new concept in the field since polyalphabetic substitution emerged in the Renaissance" <ref>David Kahn, "Cryptology Goes Public", 58 ''Foreign Affairs'' 141, 151 (Fall 1979), p. 153</ref>. There are two reasons public key cryptography is so important. One is that it solves the key management problem described in the preceding section; the other is that public key techniques are the basis for [[#digital signature | digital signatures]].
 
In a public key system, keys are created in matched pairs, such that when one of a pair is used to encrypt, the other must be used to decrypt. The system is designed so that calculation of one key from knowledge of the other is computationally infeasible, even though they are necessarily related. Keys are generated secretly, in interrelated pairs. One key from a pair becomes the '''public key''' and can be published. The other is the '''private key''' and is kept secret, never leaving the user's computer.
 
In many applications, public keys are widely published &mdash; on the net, in the phonebook, on business cards, on key server computers which provide an index of public keys. However, it is also possible to use public key technology while restricting access to public keys; some military systems do this, for example. The point of public keys is not that they must be made public, but that they ''could'' be; the security of the system does not depend on keeping them secret.
 
One big payoff is that two users (traditionally, A and B or [[Alice and Bob]]) need not share a secret key in order to communicate securely. When used for [[communications security#content confidentiality|content confidentiality]], the ''public key'' is typically used for encryption, while the ''private key'' is used for decryption. If Alice has (a trustworthy, verified copy of) Bob's public key, then she can encrypt with that and know that only Bob can read the message since only he has the matching private key. He can reply securely using her public key. '''This solves the key management problem'''. The difficult question of how to communicate secret keys securely does not need to even be asked; the private keys are never communicated and there is no requirement that communication of public keys be done securely.

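A minimal sketch of this exchange, assuming a recent version of the third-party Python cryptography package and RSA with OAEP padding (the message and key size are arbitrary choices): Bob generates a key pair, the public half can be handed to anyone, and only the private half can decrypt.

 from cryptography.hazmat.primitives import hashes
 from cryptography.hazmat.primitives.asymmetric import padding, rsa
 bob_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
 bob_public = bob_private.public_key()          # this half can be published
 oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None)
 ciphertext = bob_public.encrypt(b"Meet at noon", oaep)   # Alice, or anyone, can do this
 plaintext = bob_private.decrypt(ciphertext, oaep)        # only Bob can do this
 assert plaintext == b"Meet at noon"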

Moreover, key management on a single system becomes much easier. In a system based on secret keys, if Alice communicates with <math>N</math> people, her system must manage <math>N</math> secret keys all of which change periodically, all of which must sometimes be communicated, and each of which must be kept secret from everyone except the one person it is used with. For a public key system, the main concern is managing her own private key; that generally need not change and it is never communicated to anyone.

Of course, she must also manage the public keys for her correspondents. In some ways, this is easier; they are already public and need not be kept secret. However, it is absolutely necessary to authenticate each public key. Consider a philandering husband sending passionate messages to his mistress. If the wife creates a public key in the mistress' name and he does not check the key's origins before using it to encrypt messages, he may get himself in deep trouble.
 
Public-key encryption is slower than conventional symmetric encryption, so it is common to use a public key algorithm for key management but a faster symmetric algorithm for the main data encryption. Such systems are described in more detail below; see [[#hybrid cryptosystems | hybrid cryptosystems]].
 
The other big payoff is that, given a public key cryptosystem, [[#digital signature | digital signatures]] are a straightforward application. The basic principle is that if Alice uses her private key to encrypt some known data then anyone can decrypt with her public key and, if they get the right data, they know (assuming the system is secure and her private key unknown to others) that it was her who did the encryption. In effect, she can use her private key to sign a document. The details are somewhat more complex and are dealt with in a [[#digital signature | later section]].
 
Many different asymmetric techniques have been proposed and some have been shown to be vulnerable to some forms of [[cryptanalysis]]; see the [[public key]] article for details. The most widely used public-key techniques today are the [[Diffie-Hellman]] key agreement protocol<ref name=dh2>{{citation
  | first1 = Whitfield | last1 = Diffie | first2 = Martin | last2 = Hellman
  | title = New Directions in Cryptography
  | journal = IEEE Transactions on Information Theory
  | volume = IT-22
  | pages = 644-654
  | date = Nov. 1976}}</ref> and the [[RSA]] ([[Rivest-Shamir-Adleman]]) public-key system<ref name=RSA>{{citation
  | first1 = Ronald L. | last1 = Rivest | first2 = Adi |last2= Shamir | first3 = Len | last3 = Adleman   
  | url = http://theory.lcs.mit.edu/~rivest/rsapaper.pdf  
  | title = A Method for Obtaining Digital Signatures and Public-Key Cryptosystems}}</ref>. Techniques based on [[elliptic curve]]s are also used.
 
In 1997, it finally became publicly known that asymmetric cryptography had been invented by James H. Ellis at [[GCHQ]], a [[United Kingdom|British]] intelligence organization, in the early 1970s, and that both the Diffie-Hellman and RSA algorithms had been previously developed (by Malcolm J. Williamson and Clifford Cocks, respectively)<ref>[http://www.cesg.gov.uk/publications/media/nsecret/notense.pdf Clifford Cocks. A Note on 'Non-Secret Encryption', CESG Research Report, 20 November 1973].</ref>.
 
== Cryptographic hash algorithms ==
{{main| Hash (cryptography)}}
 
'''Hashing''' or '''message digest''' algorithms take an input of arbitrary size and produce a fixed-size digest, a sort of fingerprint of the input document. Some of the techniques are the same as those used in other cryptography but the goal is quite different. Where ciphers (whether symmetric or asymmetric) provide secrecy, hashes provide authentication.
 
Using a hash for [[information security#integrity|data integrity protection]] is straightforward. If Alice hashes the text of a message and appends the hash to the message when she sends it to Bob, then Bob can verify that he got the correct message. He computes a hash from the received message text and compares that to the hash Alice sent. If they compare equal, then he knows (with overwhelming probability, though not with absolute certainty) that the message was received exactly as Alice sent it. Exactly the same method works to ensure that a document extracted from an archive, or a file downloaded from a software distribution site, is as it should be.
 
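A minimal sketch with Python's standard hashlib module (SHA-256 is an arbitrary choice here, and the file name is made up):

 import hashlib
 def file_digest(path: str) -> str:
     """Return a hex digest that serves as a fingerprint of the file's contents."""
     h = hashlib.sha256()
     with open(path, "rb") as f:
         for chunk in iter(lambda: f.read(65536), b""):
             h.update(chunk)
     return h.hexdigest()
 # If the digest published by the distributor matches the digest of the
 # downloaded copy, the copy is almost certainly intact.
 # print(file_digest("downloaded-installer.iso"))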
However, the simple technique above is useless against an adversary who intentionally changes the data. The enemy simply calculates a new hash for his changed version and stores or transmits that instead of the original hash. '''To resist an adversary takes a keyed hash''', a [[hashed message authentication code]] or HMAC. Sender and receiver share a secret key; the sender hashes using both the key and the document data, and the receiver verifies using both. Lacking the key, the enemy cannot alter the document undetected.
 
If Alice uses an HMAC and that verifies correctly, then Bob knows ''both'' that the received data is correct ''and'' that whoever sent it knew the secret key. If the rest of the system is secure, then only Alice knows that key, so he knows Alice was the sender. An HMAC provides [[information security#source authentication|source authentication]] as well as data authentication.
 
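A minimal sketch using Python's standard hmac module (the key and message are made up for the example):

 import hashlib, hmac
 key = b"shared secret key"            # known only to Alice and Bob
 message = b"pay 100 to account 42"
 tag = hmac.new(key, message, hashlib.sha256).hexdigest()   # Alice sends message plus tag
 # Bob recomputes the tag with his copy of the key and compares in constant time:
 ok = hmac.compare_digest(tag, hmac.new(key, message, hashlib.sha256).hexdigest())
 # An attacker who alters the message cannot produce a matching tag without the key.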
==Digital signatures==
{{main| Digital signature}}
 
Two cryptographic techniques are used together to produce a '''digital signature''', a [[#Cryptographic hash algorithms | hash]] and a [[#Public key systems | public key]] system.
 
Alice calculates a hash from the message, encrypts that hash with her private key, and appends the encrypted hash to the message as a signature. To verify the signature, Bob needs a trustworthy copy of Alice's public key. He uses that to decrypt the signature; this should give him the hash Alice calculated. He then hashes the received message body himself to get another hash value and compares the two hashes.
 
If the two hash values are identical, then Bob knows with overwhelming probability that the document Alice signed and the document he received are identical. He also knows that whoever generated the signature had Alice's private key. If both the hash and the public key system used are secure, and no-one except Alice knows her private key, then the signatures are trustworthy.
 
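A minimal sketch of sign-and-verify, again assuming a recent version of the third-party Python cryptography package, with RSA-PSS (the document text is made up; real systems wrap this in certificate handling):

 from cryptography.exceptions import InvalidSignature
 from cryptography.hazmat.primitives import hashes
 from cryptography.hazmat.primitives.asymmetric import padding, rsa
 alice_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
 alice_public = alice_private.public_key()
 document = b"I agree to the terms."
 pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH)
 signature = alice_private.sign(document, pss, hashes.SHA256())   # hash, then sign the hash
 try:
     alice_public.verify(signature, document, pss, hashes.SHA256())
     print("signature valid")
 except InvalidSignature:
     print("document altered or not signed with Alice's key")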
A digital signature has some of the desirable properties of an ordinary [[signature]]. It is easy for a user to produce, but difficult for anyone else to [[forgery|forge]]. The signature is  permanently tied to the content of the message being signed; it cannot be copied from one document to another, or used with an altered document, since the different document would give a different hash.
 
Any public key technique can provide digital signatures. [[RSA]] is widely used, as is the US government standard [[Digital Signature Algorithm]] (DSA).
 
Once you have digital signatures, a whole range of other applications can be built using them. Many software distributions are signed by the developers; users can check the signatures before installing. Some operating systems will not load a driver unless it has the right signature. On [[Usenet]], things like new group commands and [[NoCeM]]s [http://www.xs4all.nl/~rosalind/nocemreg/nocemreg.html] carry a signature. The digital equivalent of having a document notarised is to get a trusted party to sign a combination document &mdash; the original document, plus at least some identifying information for the notary and a time stamp.
 
[[Digital certificate]]s are the digital analog of an identification document such as a driver's license, passport, or business license. Like those documents, they usually have expiration dates, and a means of verifying both the validity of the certificate and of the certificate issuer. Like those documents, they can sometimes be revoked.
 
Practical use of asymmetric cryptography, on any sizable basis, requires a [[public key infrastructure]] (PKI). In a typical PKI, public keys are embedded in [[digital certificate]]s issued by a [[certification authority]]. In the event of compromise of the private key, the certification authority can revoke the key by adding it to a [[certificate revocation list]]. There is often a hierarchy of certificates: for example, a school's certificate might be issued by a local school board, which is certified by the state education department, which is certified by the national education office, and that in turn by the national government master key.
 
An alternative non-hierarchical [[web of trust]] model is used in [[PGP]]. Any key can sign any other; digital certificates are not required. Alice might accept the school's key as valid because her friend Bob is a parent there and has signed the school's key. Or because the principal gave her a business card with his key on it and he has signed the school key. Or both. Or some other combination; Charles has signed Diana's key and she signed the school's. It becomes fairly tricky to decide whether that last one justifies accepting the school key, however.
 
== Hybrid cryptosystems ==
{{main| Hybrid cryptosystem}}
 
Most real applications combine several of the above techniques into a [[hybrid cryptosystem]]. Public-key encryption is slower than conventional symmetric encryption, so use a symmetric algorithm for the bulk data encryption. On the other hand, public key techniques handle the key management problem well, and that is difficult with symmetric encryption alone, so use public key methods for that. Neither symmetric nor public key methods are ideal for data authentication; use a hash for that. Many of the protocols also need cryptographic quality [[random number]]s.
 
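A minimal sketch of the pattern, assuming a recent version of the third-party Python cryptography package, with RSA-OAEP carrying the session key and AES-GCM doing the bulk encryption (all names and sizes are illustrative):

 import os
 from cryptography.hazmat.primitives import hashes
 from cryptography.hazmat.primitives.asymmetric import padding, rsa
 from cryptography.hazmat.primitives.ciphers.aead import AESGCM
 receiver_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
 receiver_public = receiver_private.public_key()
 oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None)
 session_key = AESGCM.generate_key(bit_length=256)        # fresh random symmetric key
 nonce = os.urandom(12)
 bulk = AESGCM(session_key).encrypt(nonce, b"a large message ...", None)
 wrapped_key = receiver_public.encrypt(session_key, oaep)  # public key protects only the key
 # Receiver side: unwrap the session key, then decrypt the bulk data.
 recovered_key = receiver_private.decrypt(wrapped_key, oaep)
 plaintext = AESGCM(recovered_key).decrypt(nonce, bulk, None)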
Examples abound, each using a somewhat different combination of methods to meet its particular application requirements.
 
In [[Pretty Good Privacy]] ([[PGP]]) email encryption the sender generates a random key for the symmetric bulk encryption and uses public key techniques to securely deliver it to the receiver. Hashes are used in generating digital signatures.
 
In [[IPsec]] (Internet Protocol Security) public key techniques provide [[information security#source authentication|source authentication]] for the gateway computers which manage the tunnel. Keys are set up using the [[Diffie-Hellman]] key agreement protocol and the actual data packets are (generally) encrypted with a [[block cipher]] and authenticated with an [[HMAC]].
 
In [[Secure Sockets Layer]] (SSL) or the later version [[Transport Layer Security]] (TLS) which provides secure web browsing (http'''s'''), digital certificates are used for [[information security#source authentication|source authentication]] and connections are generally encrypted with a [[stream cipher]].
 
== One-way encryption ==
{{main|One-way encryption}}
There are applications where it is not necessary to be able to reconstruct the plaintext from the ciphertext, but merely to be able to prove that some piece of information could be generated only from the original plaintext. In some cases, it is undesirable for anyone to be able to reverse the process.
 
A typical example is storing passwords on a computer; they must be kept secret, ideally they would remain secret even if the system administrator was dishonest or if an intruder gained administrator privileges. Thus it is standard practice to encrypt the passwords before writing them to disk, and furthermore to choose an encryption method that does not have a matching decryption so that an intruder or a rogue administrator cannot decrypt the stored forms and obtain passwords. This accomplishes the goal of passwords, providing [[authentication]] of users. When a user enters their password, it can be encrypted, and then compared to the stored encrypted password; if they match, the user got the password right.
 
Early [[Unix]] systems used [[DES]] but used the password as key rather than as plaintext so the algorithm was not reversible. In principle, any [[block cipher]] could be used in a similar way. Modern systems generally use a [[#Cryptographic hash algorithms| hash]] algorithm, which gives a fixed-size digest. Using [[SHA-1]], for example, gives a 160-bit digest (20 bytes of storage). It is usually stored in a human-readable form, a 28-character, base-64 encoded string.  Here are some examples:
 
 Hello World  z7R8yBtZz0+eqead7UEYzPvVFjw=
 VB            L1SHP0uzuGbMUpT4z0zlAdEzfPE=
 vb            eOcnhoZRmuoC/Ed5iRrW7IxlCDw=
 Vb            e3PaiF6tMmhPGUfGg1nrfdV3I+I=
 vB            gzt6my3YIrzJiTiucvqBTgM6LtM=
 
Password systems generally also include some ''salt'', extra data added to the password before encryption or hashing. This helps prevent [[dictionary attack]]s. An enemy cannot simply encrypt every word in the dictionary and then search for matches in the password file. If there are 12 bits of salt, then each dictionary word will have 4096 possible matches for different salt values; this makes the attack harder. A side effect is that if a user uses the same password on multiple systems, they will encrypt differently because each system uses different salt. Of course users should still not use dictionary words as passwords or re-use the same password on different systems.

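A minimal sketch of salted one-way password storage using Python's standard library (PBKDF2 stands in for whichever slow hash a real system chooses; the iteration count is an arbitrary example):

 import hashlib, hmac, os
 def store(password: str):
     salt = os.urandom(16)                      # fresh random salt for each user
     digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
     return salt, digest                        # only salt and digest are written to disk
 def check(password: str, salt: bytes, digest: bytes) -> bool:
     candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
     return hmac.compare_digest(candidate, digest)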

== Steganography ==
{{main|Steganography}}

[[Steganography]] is the study of techniques for hiding a secret message within an apparently innocent message. For example, given an image with a million pixels and 3 bytes for the colour of each pixel, one could hide 3 megabits of message in the least significant bit of each byte, with reasonable hope that the change to the image would be unnoticeable.
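A toy sketch of the least-significant-bit idea in Python, operating on a plain byte string rather than a real image format (which would need an image library); it is purely illustrative:

 def hide(cover: bytes, secret_bits) -> bytes:
     out = bytearray(cover)
     for i, bit in enumerate(secret_bits):
         out[i] = (out[i] & 0xFE) | bit   # overwrite the lowest bit of each cover byte
     return bytes(out)
 def reveal(stego: bytes, n: int):
     return [b & 1 for b in stego[:n]]
 cover = bytes(range(64, 80))              # stands in for 16 bytes of pixel data
 bits = [1, 0, 1, 1, 0, 0, 1, 0]           # one secret byte, written as bits
 assert reveal(hide(cover, bits), len(bits)) == bits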


== Cryptography is difficult ==
== Cryptography is difficult ==

Revision as of 05:24, 16 December 2008

This article is developing and not approved.
Main Article
Discussion
Related Articles  [?]
Bibliography  [?]
External Links  [?]
Citable Version  [?]
 
This editable Main Article is under development and subject to a disclaimer.

Template:TOC-right Cryptography is a branch of mathematics concerned with obscuring information and then controlling who can retrieve the information. While this article will focus on modern applications and technique, cryptography has been in use for three millenia; when literacy was rare, writing could be considered a means of obscurement. See History of cryptography.

Cryptography has found many practical applications related to security and reliability in computers, especially when connected to networks. Cryptography is central to digital rights management (DRM) in computers and is important in the fields of computer science and engineering, as well as mathematics. By encrypting a message, we can be fairly certain who sent it, fairly certain that others can’t read it, and we can determine that the message likely has not been altered. If someone scrambles a message to make it illegible, the original message is called plaintext, and the scrambled message is called ciphertext. Encryption is the act of turning plain text into cipher text. Decryption is the act of turning cipher text back to plain text

Cryptology ("the study of secrets", from the Greek) describes the field of cryptography as a whole, while cryptography proper is the subfield of encryption and decryption. Cryptanalysis or "codebreaking" refers to the study of how to break into an encrypted message without possession of the key.

As well as being aware of cryptographic history, cryptographic algorithm and system designers must also carefully consider probable future developments in their designs. For instance, the continued improvements in computer processing power in increasing the scope of brute-force attacks must be taken into account when specifying key lengths, and the potential effects of quantum computing are already being considered by good cryptographic system designers.[1]

Essentially, prior to the early 20th century, cryptography was chiefly concerned with linguistic patterns. Since then the emphasis has shifted, and cryptography now makes extensive use of mathematics, including aspects of information theory, computational complexity, statistics, combinatorics, abstract algebra, and number theory. Cryptography is also a branch of engineering, but an unusual one as it deals with active, intelligent, and malevolent opposition (see cryptographic engineering and security engineering). There is also active research examining the relationship between cryptographic problems and quantum physics (see quantum cryptography and quantum computing).

Codes versus ciphers

In common usage, the term "code" is often used to mean any method of encryption or meaning-concealment. In cryptography, however, code is more specific, meaning a linguistic procedure which replaces a unit of plain text with a code word (for example, apple pie replaces attack at dawn). Ciphers operate at a lower level, ignoring meaning and just manipulating the letters, bytes or bits.

Codes are not generally practical for lengthy or complex communications, and are difficult to do in software, as they are as much linguistic as mathematical problems. If the only times the messages need to name are dawn, noon, dusk and midnight, then a code is fine; usable code words might be John, George, Paul and Ringo. However, if messages must be able to specify things like 11:37 AM, a code is inconvenient. Also if a code is used many times, an enemy is quite likely to work out that "John" means "dawn" or whatever; there is no long-term security. Finally, changing a code can be difficult; it requires retraining users or creating and (securely!) delivering new code books. For these reasons, ciphers are generally preferred in practice.

Nevertheless, there are niches where codes are quite useful. A small number of codes can represent a set of operations known to sender and receiver, and if they are not re-used there is no information to help a cryptanalyst. "Climb Mount Niikata" was a final order for the Japanese mobile striking fleet to attack Pearl Harbor, while "visit Aunt Shirley" could order a terrorist to trigger a chemical weapon at a particular place.

Codes may also be combined with ciphers. Then if an enemy breaks a cipher, much of what he gets will be code words. Unless he either already knows the code words or has enough broken messages to search for codeword re-use, the code may defeat him even if the cipher did not. For example, if the Americans had intercepted and decrypted a message saying "Climb Mount Niikata" just before Pearl Harbor, they would likely not have known its meaning.

A cipher (or cypher) is a system of algorithms for encryption and decryption. The exact operation of a cipher is controlled by a key, which is a secret parameter for the cipher algorithm.

Ciphers use a mathematical operation to convert understandable plaintext into unintelligible ciphertext. In general, for two-way ciphers, the number of plaintext symbols is equal to, or slightly less if error-checking is in use, than the number of ciphertext symbols.

Principles and terms

In encryption and decryption, a key is one or more unique values used by an encryption or decryption algorithm. Encryption algorithms take as input a key and plaintext, producing ciphertext output. For decryption, the process is reversed to turn ciphertext into plaintext.

The system should be secure against an attacker who knows all its details except the key; this is known as Kerckhoffs' Principle.

Methods of defeating cryptosystems have a long history and an extensive literature; see cryptanalysis. Anyone designing or deploying a cyptosystem must take cryptanalytic results into account.

The ciphertext produced by an encryption algorithm should bear no resemblance to the original message. Ideally, it should be indistinguishable from a random string of symbols. Any non-random properties may provide an opening for a skilled cryptanalyst.

Keying

Even an excellent safe cannot protect against a thief who knows the combination. Even an excellent cipher cannot protect against an enemy who knows the key.

Many cryptographic techniques — block ciphers, stream ciphers, public key encryption, digital signatures, and hashed message authentication codes — depend on cryptographic keys. None of these can be secure if the key is not. Enemies can sometimes read encrypted messages without breaking the cipher; they use practical cryptanalysis techniques such as breaking into an office to steal keys.

The quality of the keys is almost as important as their secrecy. Keys need to be highly random, effectively impossible to guess. See random number for details. A key that an enemy can easily guess, or that he can find with a low-cost search, does not provide much protection. Using strong cryptography with a poor key is like buying good locks then leaving the key under the doormat.

In applications which encrypt a large volume of data, any cipher must be re-keyed from time to time to prevent an enemy from accumulating large amounts of data encrypted with a single key. Such a collection facilitates some attacks — see code book attack, linear cryptanalysis and differential cryptanalysis in particular, and cryptanalysis in general. It also makes the payoff for breaking that key very large. Re-keying also limits the damage if a key is compromised in some other way. Neither block ciphers nor stream ciphers typically include a re-keying mechanism; some higher-level protocol manages that and re-keys the cipher using the normal keying mechanism.

In some applications, there are natural breaks where a new key should be used. For example it is natural to use a different key for each new message in a message-oriented protocol such as email, or for each new connection in a connection-oriented protocol such as SSH. This may be all the re-keying required. Or it may not; what if some users send multi-gigabyte emails or stay logged in for months?

In other applications, a mechanism for periodic re-keying is required. For a VPN connection between two offices, this would normally be the Internet Key Exchange protocol. For an embassy, it might be a clerk who changes the key daily and an officer who delivers more keys once a month, flying in with a briefcase handcuffed to his wrist.

There are many ways to manage keys, ranging from physical devices and smartcards to cryptographic techniques such as Diffie-Hellman. In some cases, an entire public key infrastructure may be involved. See key management for details.

External attacks

Any of the techniques of espionage — bribery, coercion, blackmail, deception ... — may be used to obtain keys; such methods are called practical cryptanalysis. In general, these methods work against the people and organisations involved, looking for human weaknesses or poor security procedures. They are beyond our scope here; see information security.

For computer-based security systems, host security is a critical prerequisite. No system can be secure if the underlying computer is not. Even systems generally thought to be secure, such as IPsec or PGP are trivially easy to subvert for an enemy who has already subverted the machine they run on. See computer security.

For some systems, host security may be an impossible goal. Consider a Digital Rights Management system whose design goal is to protect content against the owner of the computer or DVD player it runs on. If that owner has full control over his device then the goal is not achievable.

Encrypting messages does not prevent traffic analysis; an enemy may be able to gain useful information from the timing, size, source and destination of traffic, even if he cannot read the contents.

Side channel attacks

There are also side channel attacks.

For example, any electrical device handling fast-changing signals will produce electromagnetic radiation. An enemy might listen to the radiation from a computer or from crypto hardware. For the defenders, there are standards for limiting such radiation; see TEMPEST and protected distribution system.

Timing attacks make inferences from the length of time cryptographic operations take. These may be used against devices such as smartcards or against systems implemented on computers. Any cryptographic primitive — block cipher, stream cipher, public key or cryptographic hash — can be attacked this way. Power analysis has also been used, in much the same way as timing. The two may be combined.

Differential fault analysis attacks a cipher embedded in a smartcard or other device. Apply stress (heat, mechanical stress, radiation, ...) to the device until it begins to make errors; with the right stress level, most will be single-bit errors. Comparing correct and erroneous output gives the cryptanalyst a window into cipher internals. This attack is extremely powerful; "we can extract the full DES key from a sealed tamper-resistant DES encryptor by analyzing between 50 and 200 ciphertexts generated from unknown but related plaintexts" [1].

See cryptanalysis for details and information security for defenses.

Secret key systems

For more information, see: Symmetric key cryptography.

Until the 1970s, all (publicly known) cryptosystems used secret key or symmetric key cryptography methods. In such a system, there is only one secret key for a message; that key can be used either to encrypt or decrypt the message. Both the sender and receiver must have the key, and third parties (potential intruders) must be prevented from obtaining the key. Symmetric key encryption may also be called traditional, shared-secret, secret-key, or conventional encryption.

Historically, ciphers worked at the level of letters; see history of cryptography for details. Attacks on them used techniques based largely on linguistic analysis, such as frequency counting; see cryptanalysis.

On computers, there are two main types of symmetric encryption algorithm:

A block cipher breaks the data up into fixed-size blocks and encrypt each block under control of the key. Since the message length will rarely be an integer number of blocks, there will usually need to be some form of "padding" to make the final block long enough. The block cipher itself defines how a single block is encrypted; modes of operation specify how these operations are combined to achieve some larger goal.

A stream cipher encrypts a stream of input data by combining it with a pseudo-random stream of data; the pseudo-random stream is generated under control of the encryption key.

Another method, usable manually or on a computer, is a one-time pad. This works much like a stream cipher, but it does not need to generate a pseudo-random stream because its key is a truly random stream as long as the message. This is the only known cipher which is provably secure (provided the key is truly random and no part of it is ever re-used), but it is impractical for most applications because managing such keys is too difficult.

Key management

More generally, key management is a problem for any secret key system.

  • It is critically important to protect keys from unauthorised access; if an enemy obtains the key, then he or she can read all messages ever sent with that key.
  • It is necessary to change keys periodically, both to limit the damage if an attacker does get a key and to prevent various attacks which become possible if the enemy can collect a large sample of data encrypted with a single key.
  • It is necessary to communicate keys; without a copy of the identical key, the intended receiver cannot decrypt the message.

Managing all of these simultaneously is an inherently difficult problem.

One problem is where, and how, to safely store the key. In a manual system, you need a key that is long and hard to guess because keys that are short or guessable provide little security. However, such keys are hard to remember and if the user writes them down, then you have to worry about someone looking over his shoulder, or breaking in and copying the key, or the writing making an impression on the next page of a pad, and so on.

On a computer, keys must be protected so that enemies cannot obtain them. Simply storing the key unencrypted in a file or database is a poor strategy. A better method is to encrypt the key and store it in a file that is protected by the file system; this way, only authorized users of the system should be able to read the file. But then, where should one store the key used to encrypt the secret key? It becomes a recursive problem. Also, what about an attacker that can defeat the file system protection? If the key is stored encrypted but you have a program that decrypts and uses it, can an attacker obtain the key via a memory dump or a debugging tool? If a network is involved, can an attacker get keys by intercepting network packets? Can an attacker put a keystroke logger on the machine; if so, he can get everything you type, possibly including keys or passwords.

Communicating keys is an even harder problem. With secret key encryption alone, it would not be possible to open up a new secure connection on the internet, because there would be no safe way initially to transmit the shared key to the other end of the connection without intruders being able to intercept it. A government or major corporation might send someone with a briefcase handcuffed to his wrist, but for many applications this is impractical.

Moreover, the problem grows quadratically if there are many users. If N users must all be able to communicate with each other securely, then there are N(N-1)/2 possible connections, each of which needs its own key; for example, 1000 users would need nearly 500,000 keys. For large N this becomes quite unmanageable.

Various techniques can be used to address the difficulty. A centralised server, such as the Kerberos system developed at MIT [2] and used (not without controversy [3]) by all versions of Microsoft Windows since Windows 2000 [4], is one method. Other techniques use two-factor authentication, combining "something you have" (e.g. your ATM card) with "something you know" (e.g. the PIN).

The development of public key techniques, described in the next section, allows simpler solutions.

Public key systems

For more information, see: asymmetric key cryptography.

Public key or asymmetric key cryptography was first proposed, in the open literature, in 1976 by Whitfield Diffie and Martin Hellman[2]. The historian David Kahn described it as "the most revolutionary new concept in the field since polyalphabetic substitution emerged in the Renaissance" [3]. There are two reasons public key cryptography is so important. One is that it solves the key management problem described in the preceding section; the other is that public key techniques are the basis for digital signatures.

In a public key system, keys are created in matched pairs, such that when one of a pair is used to encrypt, the other must be used to decrypt. The system is designed so that calculation of one key from knowledge of the other is computationally infeasible, even though they are necessarily related. Keys are generated secretly, in interrelated pairs. One key from a pair becomes the public key and can be published. The other is the private key and is kept secret, never leaving the user's computer.

In many applications, public keys are widely published — on the net, in the phonebook, on business cards, on key server computers which provide an index of public keys. However, it is also possible to use public key technology while restricting access to public keys; some military systems do this, for example. The point of public keys is not that they must be made public, but that they could be; the security of the system does not depend on keeping them secret.

One big payoff is that two users (traditionally, A and B or Alice and Bob) need not share a secret key in order to communicate securely. When used for content confidentiality, the public key is typically used for encryption, while the private key is used for decryption. If Alice has (a trustworthy, verified copy of) Bob's public key, then she can encrypt with that and know that only Bob can read the message, since only he has the matching private key. He can reply securely using her public key. This solves the key management problem. The difficult question of how to communicate secret keys securely does not even need to be asked; the private keys are never communicated and there is no requirement that communication of public keys be done securely.
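
A minimal sketch of this exchange, assuming the Python cryptography package and the RSA algorithm with OAEP padding (one common choice among several):

  from cryptography.hazmat.primitives.asymmetric import rsa, padding
  from cryptography.hazmat.primitives import hashes

  # Bob generates a matched pair; the private half never leaves his machine.
  bob_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
  bob_public = bob_private.public_key()   # this half can be published

  oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                      algorithm=hashes.SHA256(), label=None)

  # Alice encrypts with Bob's public key; only Bob's private key can decrypt.
  ciphertext = bob_public.encrypt(b"Meet me at noon", oaep)
  plaintext = bob_private.decrypt(ciphertext, oaep)
  assert plaintext == b"Meet me at noon"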

Moreover, key management on a single system becomes much easier. In a system based on secret keys, if Alice communicates with N people, her system must manage N secret keys, all of which change periodically, all of which must sometimes be communicated, and each of which must be kept secret from everyone except the one person it is used with. For a public key system, the main concern is managing her own private key; that generally need not change and it is never communicated to anyone.

Of course, she must also manage the public keys for her correspondents. In some ways, this is easier; they are already public and need not be kept secret. However, it is absolutely necessary to authenticate each public key. Consider a philandering husband sending passionate messages to his mistress. If the wife creates a public key in the mistress' name and he does not check the key's origins before using it to encrypt messages, he may get himself in deep trouble.

Public-key encryption is slower than conventional symmetric encryption, so it is common to use a public key algorithm for key management but a faster symmetric algorithm for the main data encryption. Such systems are described in more detail below; see hybrid cryptosystems.

The other big payoff is that, given a public key cryptosystem, digital signatures are a straightforward application. The basic principle is that if Alice uses her private key to encrypt some known data, then anyone can decrypt with her public key and, if they get the right data, they know (assuming the system is secure and her private key unknown to others) that she was the one who did the encryption. In effect, she can use her private key to sign a document. The details are somewhat more complex and are dealt with in a later section.

Many different asymmetric techniques have been proposed and some have been shown to be vulnerable to some forms of cryptanalysis; see the public key article for details. The most widely used public-key techniques today are the Diffie-Hellman key agreement protocol[4] and the RSA (Rivest-Shamir-Adleman) public-key system[5]. Techniques based on elliptic curves are also used.
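
As an illustration of key agreement, the sketch below uses X25519, an elliptic-curve variant of Diffie-Hellman, via the Python cryptography package (an assumed choice of library): each side combines its own private key with the other's public key, and both arrive at the same shared secret.

  from cryptography.hazmat.primitives.asymmetric import x25519
  from cryptography.hazmat.primitives.kdf.hkdf import HKDF
  from cryptography.hazmat.primitives import hashes

  alice_private = x25519.X25519PrivateKey.generate()
  bob_private = x25519.X25519PrivateKey.generate()

  # Only the public halves need to be exchanged.
  alice_secret = alice_private.exchange(bob_private.public_key())
  bob_secret = bob_private.exchange(alice_private.public_key())
  assert alice_secret == bob_secret

  # The raw shared secret is normally fed through a key derivation function
  # before being used as a symmetric key.
  session_key = HKDF(algorithm=hashes.SHA256(), length=32,
                     salt=None, info=b"illustration").derive(alice_secret)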

In 1997, it finally became publicly known that asymmetric cryptography had been invented by James H. Ellis at GCHQ, a British intelligence organization, in the early 1970s, and that both the Diffie-Hellman and RSA algorithms had been previously developed (by Malcolm J. Williamson and Clifford Cocks, respectively)[6].

Cryptographic hash algorithms

For more information, see: Hash (cryptography).


Hashing or message digest algorithms take an input of arbitrary size and produce a fixed-size digest, a sort of fingerprint of the input document. Some of the techniques are the same as those used in other cryptography but the goal is quite different. Where ciphers (whether symmetric or asymmetric) provide secrecy, hashes provide authentication.

Using a hash for data integrity protection is straightforward. If Alice hashes the text of a message and appends the hash to the message when she sends it to Bob, then Bob can verify that he got the correct message. He computes a hash from the received message text and compares that to the hash Alice sent. If they compare equal, then he knows (with overwhelming probability, though not with absolute certainty) that the message was received exactly as Alice sent it. Exactly the same method works to ensure that a document extracted from an archive, or a file downloaded from a software distribution site, is as it should be.
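
In Python's standard library (one possible choice), the check looks like this:

  import hashlib

  message = b"Pay Bob 100 dollars"
  sent_hash = hashlib.sha256(message).hexdigest()   # Alice appends this to the message

  # Bob recomputes the hash over what he actually received and compares.
  received = b"Pay Bob 100 dollars"
  assert hashlib.sha256(received).hexdigest() == sent_hash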

However, the simple technique above is useless against an adversary who intentionally changes the data. The enemy simply calculates a new hash for his changed version and stores or transmits that instead of the original hash. Resisting such an adversary requires a keyed hash, a hashed message authentication code or HMAC. Sender and receiver share a secret key; the sender hashes using both the key and the document data, and the receiver verifies using both. Lacking the key, the enemy cannot alter the document undetected.

If Alice uses an HMAC and it verifies correctly, then Bob knows both that the received data is correct and that whoever sent it knew the secret key. If the rest of the system is secure, then only Alice knows that key, so he knows Alice was the sender. An HMAC provides source authentication as well as data authentication.
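
A sketch using Python's standard hmac module (our choice of library, not one the text mandates):

  import hashlib, hmac, os

  key = os.urandom(32)                  # secret shared by Alice and Bob
  message = b"Pay Bob 100 dollars"

  # Sender: the tag depends on both the key and the message.
  tag = hmac.new(key, message, hashlib.sha256).digest()

  # Receiver: recompute the tag and compare in constant time.
  valid = hmac.compare_digest(tag, hmac.new(key, message, hashlib.sha256).digest())
  assert valid   # without the key, an attacker cannot forge a matching tag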

Digital signatures

For more information, see: Digital signature.


Two cryptographic techniques are used together to produce a digital signature, a hash and a public key system.

Alice calculates a hash from the message, encrypts that hash with her private key and appends the encrypted hash to the message as a signature. To verify the signature, Bob needs a trustworthy copy of Alice's public key. He uses that to decrypt the signature; this should give him the hash Alice calculated. He then hashes the received message body himself to get another hash value and compares the two hashes.

If the two hash values are identical, then Bob knows with overwhelming probability that the document Alice signed and the document he received are identical. He also knows that whoever generated the signature had Alice's private key. If both the hash and the public key system used are secure, and no one except Alice knows her private key, then the signatures are trustworthy.
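
A sketch of signing and verification with RSA-PSS via the Python cryptography package (an assumed choice; note that modern signature APIs perform the hashing step internally):

  from cryptography.hazmat.primitives.asymmetric import rsa, padding
  from cryptography.hazmat.primitives import hashes
  from cryptography.exceptions import InvalidSignature

  alice_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
  alice_public = alice_private.public_key()   # Bob needs a trustworthy copy of this

  pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                    salt_length=padding.PSS.MAX_LENGTH)
  message = b"I, Alice, agree to these terms."
  signature = alice_private.sign(message, pss, hashes.SHA256())

  try:
      alice_public.verify(signature, message, pss, hashes.SHA256())
      print("signature valid")
  except InvalidSignature:
      print("signature or message has been altered")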

A digital signature has some of the desirable properties of an ordinary signature. It is easy for a user to produce, but difficult for anyone else to forge. The signature is permanently tied to the content of the message being signed; it cannot be copied from one document to another, or used with an altered document, since the different document would give a different hash.

Any public key technique can provide digital signatures. RSA is widely used, as is the US government standard Digital Signature Algorithm (DSA).

Once you have digital signatures, a whole range of other applications can be built using them. Many software distributions are signed by the developers; users can check the signatures before installing. Some operating systems will not load a driver unless it has the right signature. On Usenet, things like newgroup commands and NoCeMs [5] carry a signature. The digital equivalent of having a document notarised is to get a trusted party to sign a combination document — the original document, plus at least some identifying information for the notary and a time stamp.

Digital certificates are the digital analog of an identification document such as a driver's license, passport, or business license. Like those documents, they usually have expiration dates, and a means of verifying both the validity of the certificate and of the certificate issuer. Like those documents, they can sometimes be revoked.

Practical use of asymmetric cryptography, on any sizable basis, requires a public key infrastructure (PKI). In a typical PKI, public keys are embedded in digital certificates issued by a certification authority. In the event of compromise of the private key, the certification authority can revoke the key by adding it to a certificate revocation list. There is often a hierarchy of certificates: for example, a school's certificate might be issued by a local school board, which is certified by the state education department, that in turn by the national education office, and that by the national government master key.

An alternative non-hierarchical web of trust model is used in PGP. Any key can sign any other; digital certificates are not required. Alice might accept the school's key as valid because her friend Bob is a parent there and has signed the school's key. Or because the principal gave her a business card with his key on it and he has signed the school key. Or both. Or some other combination; Charles has signed Diana's key and she signed the school's. It becomes fairly tricky to decide whether that last one justifies accepting the school key, however.

Hybrid cryptosystems

For more information, see: Hybrid cryptosystem.


Most real applications combine several of the above techniques into a hybrid cryptosystem. Public-key encryption is slower than conventional symmetric encryption, so a symmetric algorithm is used for the bulk data encryption. On the other hand, public key techniques handle the key management problem well, which is difficult with symmetric encryption alone, so public key methods are used for that. Neither symmetric nor public key methods are ideal for data authentication; a hash is used for that. Many of the protocols also need cryptographic quality random numbers.
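
The following sketch shows the general pattern, assuming the Python cryptography package: a random symmetric session key encrypts (and authenticates) the bulk data, and only that small key is handled with the slower public-key algorithm.

  import os
  from cryptography.hazmat.primitives.asymmetric import rsa, padding
  from cryptography.hazmat.primitives.ciphers.aead import AESGCM
  from cryptography.hazmat.primitives import hashes

  oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                      algorithm=hashes.SHA256(), label=None)

  # Receiver's long-term asymmetric key pair.
  recipient_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
  recipient_public = recipient_private.public_key()

  # Sender: bulk data under a fresh random symmetric key ...
  session_key = AESGCM.generate_key(bit_length=256)
  nonce = os.urandom(12)
  ciphertext = AESGCM(session_key).encrypt(nonce, b"a long message ...", None)

  # ... and only the short session key is wrapped with the public key.
  wrapped_key = recipient_public.encrypt(session_key, oaep)

  # Receiver: unwrap the session key, then decrypt and authenticate the bulk data.
  recovered_key = recipient_private.decrypt(wrapped_key, oaep)
  plaintext = AESGCM(recovered_key).decrypt(nonce, ciphertext, None)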

Examples abound, each using a somewhat different combination of methods to meet its particular application requirements.

In Pretty Good Privacy (PGP) email encryption the sender generates a random key for the symmetric bulk encryption and uses public key techniques to securely deliver it to the receiver. Hashes are used in generating digital signatures.

In IPsec (Internet Protocol Security) public key techniques provide source authentication for the gateway computers which manage the tunnel. Keys are set up using the Diffie-Hellman key agreement protocol and the actual data packets are (generally) encrypted with a block cipher and authenticated with an HMAC.

In Secure Sockets Layer (SSL) or the later version Transport Layer Security (TLS) which provides secure web browsing (https), digital certificates are used for source authentication and connections are generally encrypted with a stream cipher.

One-way encryption

For more information, see: One-way encryption.

There are applications where it is not necessary to be able to reconstruct the plaintext from the ciphertext, but merely to be able to prove that some piece of information could be generated only from the original plaintext. In some cases, it is undesirable for anyone to be able to reverse the process.

A typical example is storing passwords on a computer; they must be kept secret, and ideally they would remain secret even if the system administrator were dishonest or an intruder gained administrator privileges. Thus it is standard practice to encrypt the passwords before writing them to disk, and moreover to choose an encryption method that does not have a matching decryption, so that an intruder or a rogue administrator cannot decrypt the stored forms and obtain passwords. This still accomplishes the goal of passwords, authenticating users: when a user enters a password, it is encrypted and then compared to the stored encrypted password; if they match, the user got the password right.

Early Unix systems used DES but used the password as key rather than as plaintext so the algorithm was not reversible. In principle, any block cipher could be used in a similar way. Modern systems generally use a hash algorithm, which gives a fixed-size digest. Using SHA-1, for example, gives a 160-bit digest (20 bytes of storage). It is usually stored in a human-readable form, a 28-character, base-64 encoded string. Here are some examples:

Hello World   z7R8yBtZz0+eqead7UEYzPvVFjw=
VB            L1SHP0uzuGbMUpT4z0zlAdEzfPE=
vb            eOcnhoZRmuoC/Ed5iRrW7IxlCDw=
Vb            e3PaiF6tMmhPGUfGg1nrfdV3I+I=
vB            gzt6my3YIrzJiTiucvqBTgM6LtM=
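
Such a digest string can be produced along the following lines in Python; the exact output depends on details such as character encoding and whether a trailing newline is included, so it may not reproduce the values above byte for byte.

  import base64, hashlib

  def digest_string(s):
      # SHA-1 gives a 20-byte digest, which base-64 encodes to 28 characters.
      return base64.b64encode(hashlib.sha1(s.encode("utf-8")).digest()).decode("ascii")

  print(digest_string("Hello World"))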

Password systems generally also include some salt, extra data added to the password before encryption or hashing. This helps prevent dictionary attacks: an enemy cannot simply encrypt every word in the dictionary and then search for matches in the password file. If there are 12 bits of salt, then each dictionary word has 4096 possible encrypted forms, one for each salt value, which makes the attack harder. A side effect is that if a user uses the same password on multiple systems, the stored forms will differ because each system uses a different salt. Of course users should still not use dictionary words as passwords or re-use the same password on different systems.
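
A sketch of salted password storage and checking, using PBKDF2 from Python's standard library; the deliberately slow, iterated hash and the iteration count are our illustrative choices, not something the text above specifies.

  import hashlib, hmac, os

  def hash_password(password, salt=None):
      # Return (salt, digest); the password itself is never stored.
      if salt is None:
          salt = os.urandom(16)            # fresh random salt per password
      digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
      return salt, digest

  def check_password(password, salt, stored):
      candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
      return hmac.compare_digest(candidate, stored)   # constant-time comparison

  salt, digest = hash_password("correct horse battery staple")
  assert check_password("correct horse battery staple", salt, digest)
  assert not check_password("wrong guess", salt, digest)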

Steganography

For more information, see: Steganography.


Steganography is the study of techniques for hiding a secret message within an apparently innocent message. For example, given an image with one million pixels and 3 bytes per pixel (one for each colour channel), one could hide 3 megabits of message in the least significant bit of each byte, with reasonable hope that the change to the image would be unnoticeable.
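
A toy Python sketch of the least-significant-bit idea, hiding one message byte in eight colour bytes; real tools work on actual image formats and usually encrypt the message as well.

  def embed_bit(colour_byte, bit):
      # Replace the least significant bit of a colour byte with a message bit.
      return (colour_byte & 0xFE) | (bit & 1)

  pixels = [200, 13, 255, 96, 42, 180, 7, 64]     # hypothetical colour bytes
  secret = ord("A")                               # one message byte, 0b01000001
  bits = [(secret >> i) & 1 for i in range(8)]
  stego = [embed_bit(p, b) for p, b in zip(pixels, bits)]

  # Extraction just reads the low bit of each byte back out.
  recovered = sum((p & 1) << i for i, p in enumerate(stego))
  assert recovered == secret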

Cryptography is difficult

Cryptography, and more generally information security, is difficult to do well. For one thing, it is inherently hard to design a system that resists efforts by an adversary to compromise it, considering that the opponent may be intelligent and motivated, and may have large resources. To be secure, the system must resist all attacks; to break it, the attacker need only find one effective attack.

Also, neither the user nor the system designer gets feedback on problems. If your word processor fails or your bank's web site goes down, you see the results and are quite likely to complain to the supplier. If your cryptosystem fails, you may not know. If your bank's cryptosystem fails, they may not know, and may not tell you if they do.

If a serious attacker — a criminal breaking into a bank, a government running a monitoring program, an enemy in war, or any other — breaks a cryptosystem, he will certainly not tell the victims. If the victims become aware of the break, then they will change their system. They might change to something more secure, so it is very much in the attacker's interest to keep the break secret. In a famous example, the British ULTRA project read many German codes through most of World War II, and the Germans never realised it.

This is one reason cryptographers often publish details of their designs and invite attacks. In accordance with Kerckhoffs' Principle, a cryptosystem cannot be considered secure unless it remains safe even when the attacker knows all details except the key in use. A published design that withstands analysis is a candidate for trust; an unpublished design simply is not trustworthy, because without publication and analysis there is no basis for trust. Of course "published" has a special meaning in some situations. Someone in a major government cryptographic agency need not make a design public to have it analysed; he need only ask the cryptanalysts down the hall to have a look.

Having a design publicly broken might be a bit embarrassing for the designer, but he can console himself that he is in good company; breaks routinely happen. Even the NSA can get it wrong, as Matt Blaze demonstrated in "Protocol failure in the escrowed encryption standard" [29]. Other large organisations can too: Deutsche Telekom's Magenta cipher was broken by Schneier and others [7] within hours of being first made public at an AES candidate conference. Nor are the experts immune; Blaze and Schneier designed a cipher called MacGuffin [8] that was broken [9] before the end of the conference at which they presented it.

In any case, having a design publicly broken — even broken by (horrors!) some unknown graduate student rather than a famous expert — is far less embarrassing than having a deployed system fall to a malicious attacker. At least when both design and attacks are in public research literature, the designer can either fix any problems that are found or discard one approach and try something different.

The cryptography itself is usually the easy part. Designing a good cryptographic primitive — a block cipher, stream cipher or cryptographic hash — is indeed a tricky business, but for most applications designing new primitives is unnecessary. Good primitives are readily available; see the linked articles. The hard parts are fitting them together into systems and managing those systems to actually achieve security goals. Schneier's preface [7] to "Secrets and Lies" [8] discusses this in some detail. His summary:

If you think technology can solve your security problems, then you don't understand the problems and you don't understand the technology.

Well-known papers on the difficulties of cryptography include Anderson "Why Cryptosystems Fail" [9], Schneier "Why Cryptography Is Harder Than It Looks" [10], and Gutmann "Lessons Learned in Implementing and Deploying Crypto Software" [11]. Anderson's book "Security Engineering" [12] provides more detailed coverage. "Why Johnny can't encrypt" [13] looks at user interface issues.

Then there is the optimism of programmers. As with databases and real-time programming, cryptography looks deceptively simple. Almost any programmer can handle the basics — implementing something that handles straightforward cases — fairly easily. However, as in those other fields, anyone who tackles the hard cases without both some study of the relevant theory and considerable practical experience is almost certain to get it horribly wrong. This is demonstrated far too often.

For example, many companies that implement their own crypto as part of a product end up with something that is easily broken. The programmers on these product teams are competent, but they routinely get the crypto wrong. Examples include the addition of encryption to products like Microsoft Office [14], Netscape and many others. Generally, such problems are fixed in later releases.

There are also failures in products where encryption is central to the design. Almost every company or standards body that designs a cryptosystem in secret, ignoring Kerckhoffs' Principle, produces something that is easily broken. Examples include the CSS encryption on DVDs [15], the WEP [16] encryption in wireless networking, and the A5 encryption in GSM cell phones [17]. Such problems are much harder to fix if the flawed designs are included in standards and/or have widely deployed hardware implementations; updating those is much more difficult than releasing a new software version.

Beyond the genuine difficulties of implementing real products, there are systems that both get the cryptography horribly wrong and make extravagant marketing claims. These are often referred to as snake oil [18].

Legal issues involving cryptography

Prohibitions

Because of its potential to assist the malicious in their schemes, cryptography has long been of interest to intelligence gathering agencies and law enforcement agencies. Because of its facilitation of privacy, and the diminution of privacy attendant on its prohibition, cryptography is also of considerable interest to civil rights supporters. Accordingly, there has been a history of controversial legal issues surrounding cryptography, especially since the advent of inexpensive computers has made possible widespread access to high quality cryptography.

In some countries, even the domestic use of cryptography is, or has been, restricted. Until 1999, France significantly restricted the use of cryptography domestically. In China, a license is still required to use cryptography. Among the countries with the more restrictive laws are Belarus, China, Kazakhstan, Mongolia, Pakistan, Russia, Singapore, Tunisia, Venezuela, and Vietnam[19].

In the United States and most other Western countries, cryptography is legal for domestic use, but there has been much conflict over legal issues related to cryptography. One particularly important issue has been the export of cryptography and cryptographic software and hardware. See the next section.

There is an online survey of crypto law around the world.

Export Controls

Because of the importance of cryptanalysis in World War II and an expectation that cryptography would continue to be important for national security, many Western governments have, at some point, strictly regulated export of cryptography. After World War II, it was illegal in the US to sell or distribute encryption technology overseas; in fact, encryption was classified as a munition, like tanks and nuclear weapons[20]. Until the advent of the personal computer and the Internet, this was not especially problematic, as good cryptography was indistinguishable from bad cryptography for nearly all users, and because most of the cryptographic techniques generally available were slow and error-prone whether good or bad. However, as the Internet grew and computers became more widely available, high quality encryption techniques became well known around the globe. As a result, export controls came to be seen as an impediment to commerce and to research.

In the 1990s, several challenges were launched against US regulations for export of cryptography. Philip Zimmermann's Pretty Good Privacy (PGP) encryption program, as well as its source code, was released in the US and found its way onto the Internet in June 1991. After a complaint by RSA Security (then called RSA Data Security, Inc., or RSADSI), Zimmermann was criminally investigated by the Customs Service and the FBI for several years, but no charges were filed[21][22]. Also, Daniel Bernstein, then a graduate student at the University of California at Berkeley, brought a lawsuit against the US government challenging aspects of those restrictions on free speech grounds in the 1995 case Bernstein v. United States, which ultimately resulted in a 1999 decision that printed source code for cryptographic algorithms and systems was protected as free speech by the United States Constitution[23].

In 1996, thirty-nine countries signed the Wassenaar Arrangement on Export Controls for Conventional Arms and Dual-Use Goods and Technologies, an arms control treaty that deals with the export of arms and "dual-use" technologies such as cryptography. The treaty stipulated that the use of cryptography with short key-lengths (56-bit for symmetric encryption, 512-bit for RSA) would no longer be export-controlled[24]. Cryptography exports from the US are now much less strictly regulated than in the past as a consequence of a major relaxation in 2000[19]; there are no longer many restrictions on key sizes in US-exported mass-market software.

In practice today, since the relaxation in US export restrictions, and because almost every personal computer connected to the Internet, everywhere in the world, includes a US-sourced web browser such as Mozilla Firefox or Microsoft Internet Explorer, almost every Internet user worldwide has strong cryptography (i.e., using long keys) in their browser's Transport Layer Security or Secure Sockets Layer stack. The Mozilla Thunderbird and Microsoft Outlook e-mail client programs can similarly connect to Internet Message Access Protocol (IMAP) or Post Office Protocol (POP) servers via TLS, and can send and receive email encrypted with S/MIME.

Many Internet users do not realize that their basic application software contains such extensive cryptography systems. These browsers and email programs are so ubiquitous that even governments whose intent is to regulate civilian use of cryptography generally do not find it practical to do much to control distribution or use of cryptography of this quality, so even when such laws are in force, actual enforcement is often lax.

NSA involvement

Another contentious issue connected to cryptography in the United States is the influence of the National Security Agency on high quality cipher development and policy. NSA was involved with the design of DES during its development at IBM and its consideration by the National Bureau of Standards as a possible Federal Standard for cryptography[25]. DES was designed to be secure against differential cryptanalysis[26], a powerful and general cryptanalytic technique known to NSA and IBM that became publicly known only when it was rediscovered in the late 1980s[27]. According to Steven Levy, IBM discovered differential cryptanalysis[28] and kept the technique secret at NSA's request.

Another instance of NSA's involvement was the 1993 Clipper chip affair, an encryption microchip intended to be part of the Capstone cryptography-control initiative. Clipper was widely criticized for two cryptographic reasons. First, the cipher algorithm was classified (the cipher, called Skipjack, was declassified only in 1998 after the Clipper initiative lapsed), which led to concerns that NSA had deliberately made the cipher weak in order to assist its intelligence efforts. Second, the whole initiative violated Kerckhoffs' Principle, as the scheme included a special escrow key held by the government for use by law enforcement, for example in wiretaps[22]. Also, Matt Blaze showed in 1994 [29] that the protocol was flawed and easily subverted.

Digital rights management

Cryptography is central to Digital Rights Management (DRM), a group of techniques for technologically controlling the use of copyrighted material, which have been widely implemented and deployed at the behest of some copyright holders. In 1998, Bill Clinton signed the Digital Millennium Copyright Act (DMCA), which criminalized the production, dissemination, and use of certain cryptanalytic techniques and technology, specifically those that could be used to circumvent DRM technological schemes[30]. This had a very serious potential impact on the cryptography research community, since an argument can be made that virtually any cryptanalytic research violates, or might violate, the DMCA. The FBI has not enforced the DMCA as rigorously as some had feared, but the law nonetheless remains controversial. One well-respected cryptography researcher, Niels Ferguson, has publicly stated that he will not release some of his research into an Intel security design for fear of prosecution under the DMCA, and both Alan Cox (longtime number two in Linux kernel development) and Professor Edward Felten (and some of his students at Princeton) have encountered problems related to the Act. Dmitry Sklyarov was arrested and jailed for some months for alleged violations of the DMCA arising from work he had done in Russia, where that work was legal.

References

  1. Menezes, AJ; PC van Oorschot & SA Vanstone (Fifth Edition, 2001), Handbook of Applied Cryptography, ISBN 0-8493-8523-7
  2. Diffie, Whitfield (June 8, 1976), "Multi-user cryptographic techniques", AFIPS Proceedings 45: 109-112
  3. David Kahn, "Cryptology Goes Public", 58 Foreign Affairs 141, 151 (fall 1979), p. 153
  4. Diffie, Whitfield & Martin Hellman (November 1976), "New Directions in Cryptography", IEEE Transactions on Information Theory IT-22 (6): 644-654
  5. Rivest, Ronald L.; Adi Shamir & Len Adleman, A Method for Obtaining Digital Signatures and Public-Key Cryptosystems
  6. Clifford Cocks. A Note on 'Non-Secret Encryption', CESG Research Report, 20 November 1973.
  7. Bruce Schneier (2000). Preface to "Secrets and Lies".
  8. Bruce Schneier (2000). Secrets and Lies. John Wiley & Sons. ISBN 0-471-25311-1. 
  9. Ross Anderson. Why Cryptosystems Fail.
  10. Bruce Schneier. Why Cryptography Is Harder Than It Looks.
  11. Peter Gutmann (2002). Lessons Learned in Implementing and Deploying Crypto Software.
  12. Ross Anderson. Security Engineering. 
  13. Alma Whitten & J.D. Tygar (1999). "Why Johnny can't encrypt: a usability evaluation of PGP 5.0". USENIX Association.
  14. Hongjun Wu. The Misuse of RC4 in Microsoft Word and Excel.
  15. David Touretsky. Gallery of CSS Descramblers.
  16. Nikita Borisov, Ian Goldberg, and David Wagner. Security of the WEP algorithm.
  17. Greg Rose. A precis of the new attacks on GSM encyption.
  18. Bruce Schneier (February 1999). Snake Oil. Counterpane Inc.
  19. RSA Laboratories' Frequently Asked Questions About Today's Cryptography
  20. Cryptography & Speech from Cyberlaw
  21. "Case Closed on Zimmermann PGP Investigation", press note from the IEEE.
  22. Levy, Steven (2001). Crypto: How the Code Rebels Beat the Government — Saving Privacy in the Digital Age. Penguin Books, p. 56. ISBN 0-14-024432-8.
  23. Bernstein v USDOJ, 9th Circuit court of appeals decision.
  24. The Wassenaar Arrangement on Export Controls for Conventional Arms and Dual-Use Goods and Technologies
  25. "The Data Encryption Standard (DES)" from Bruce Schneier's CryptoGram newsletter, June 15, 2000
  26. Coppersmith, D. (May 1994). "The Data Encryption Standard (DES) and its strength against attacks" (PDF). IBM Journal of Research and Development 38 (3): 243.
  27. E. Biham and A. Shamir, "Differential cryptanalysis of DES-like cryptosystems", Journal of Cryptology, vol. 4 num. 1, pp. 3-72, Springer-Verlag, 1991.
  28. Levy, pg. 56
  29. Blaze, Matt (November 1994). Protocol Failure in the Escrowed Encryption Standard. 
  30. Digital Millennium Copyright Act