Stream cipher

From Citizendium
Revision as of 04:46, 13 October 2008 by Sandy Harris

A stream cipher is a cipher which encrypts data by combining the plaintext with the output of a keyed pseudorandom number generator to generate the ciphertext. To decrypt, run the generator with the same key to generate the same pseudorandom data, then reverse the combining operation to convert ciphertext back to plaintext.

Many stream ciphers use bitwise XOR as the combining operation. This is convenient since XOR is its own inverse, so encryption and decryption use exactly the same operations. However, any operation that has an inverse may be used. Some stream ciphers use bytewise (mod 256) addition to encrypt, subtraction to decrypt. Solitaire is intended to be used manually rather than on a computer; it uses addition and subtraction mod 26 to encrypt at letter level.
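As a minimal sketch of the two combining operations just described, the following Python fragment applies XOR and bytewise addition mod 256 to a short message. The keystream is a fixed byte string chosen purely for illustration (an assumption, standing in for real generator output), and the function names are hypothetical.

```python
def xor_combine(data: bytes, keystream: bytes) -> bytes:
    """XOR each data byte with the matching keystream byte.

    XOR is its own inverse, so this one function both encrypts and decrypts.
    """
    return bytes(d ^ k for d, k in zip(data, keystream))


def add_encrypt(plaintext: bytes, keystream: bytes) -> bytes:
    """Encrypt by bytewise addition mod 256."""
    return bytes((p + k) % 256 for p, k in zip(plaintext, keystream))


def sub_decrypt(ciphertext: bytes, keystream: bytes) -> bytes:
    """Decrypt by bytewise subtraction mod 256, the inverse of add_encrypt."""
    return bytes((c - k) % 256 for c, k in zip(ciphertext, keystream))


# Stand-in keystream for illustration only; a real cipher would take these
# bytes from a keyed pseudorandom generator.
keystream = bytes([0x3A, 0xC5, 0x11, 0x9E, 0x07, 0x64])
message = b"secret"

xor_ct = xor_combine(message, keystream)
add_ct = add_encrypt(message, keystream)
```

Applying `xor_combine` a second time with the same keystream returns the message, while the additive variant needs the separate `sub_decrypt` step.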

The hard part of stream cipher design is the pseudorandom generator. An enemy who knows or guesses some plaintext and has intercepted the matching ciphertext can easily get some of the pseudorandom data. If that allows him to infer the internal state of the generator, then he can break the cipher.

A one-time pad is in effect the ultimate stream cipher. It requires a truly random key as long as the data to be encrypted, which makes it impractical for many applications. Given such a key, however, there is no need to generate pseudorandom material and therefore no need to worry about the quality of the generator.

Stream ciphers from other primitives

Any pseudorandom number generator which accepts an initialisation value or key can be used to make a stream cipher. However, the stream cipher will be insecure if the key is too small (see cryptographic key) or if the generator design is not adequate in cryptographic terms. In particular, given some of the generator's output, it must be very difficult for the enemy to make inferences about previous output or later output, or to determine the internal state of the generator which would give him all future output.

Any stream cipher can also be used as a pseudorandom number generator. For example, RC4 is used in the OpenBSD random device. Going in this direction, there are fewer design issues. Any generator that is adequate for stream cipher use will be fine for any other application, provided it is initialised properly and re-initialised often enough.

Any block cipher can also be used to construct a stream cipher, either by running it in output feedback mode or by encrypting successive values of a counter.

Any hash algorithm can also be used to construct a stream cipher. You repeatedly hash some sort of buffer which initially includes the key and may include a counter. Typically, only part of the hash output is used as pseudorandom output, so that an enemy will not have enough data to determine the hash state. The excess hash output is often fed back into the buffer.
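The hash-based construction can be sketched roughly as follows. This is an illustration under stated assumptions, not a vetted design: SHA-256, the 8-byte counter, the choice to release only half of each digest, and the names `hash_keystream` and `xor_bytes` are all choices made for this example, and the feedback of excess output into the buffer is omitted for brevity.

```python
import hashlib


def hash_keystream(key: bytes, nbytes: int) -> bytes:
    """Generate nbytes of keystream by hashing key || counter repeatedly."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        # The buffer being hashed holds the key followed by a counter.
        digest = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        # Release only half of each 32-byte digest as keystream.
        out += digest[:16]
        counter += 1
    return bytes(out[:nbytes])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Combine two equal-length byte strings with XOR."""
    return bytes(x ^ y for x, y in zip(a, b))


key = b"example key (hypothetical)"
plaintext = b"stream ciphers from hash functions"
ciphertext = xor_bytes(plaintext, hash_keystream(key, len(plaintext)))
recovered = xor_bytes(ciphertext, hash_keystream(key, len(ciphertext)))
```

Because the generator is deterministic in the key, running it again on the receiving side reproduces the same keystream and `recovered` equals `plaintext`.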

Shift register stream ciphers

Many stream ciphers are based on shift registers, usually linear feedback shift registers or LFSRs, though other types have been used. These are easily implemented in hardware and have been a common device in secure communications systems, especially military systems, for decades. They have been extensively analysed and the theory is well understood. Schneier gives much of the theory and a catalog of implementations [1].

In the simplest version of such a system, the shift register is initialised by loading the key into it. You then loop forever:

 output the lowest bit of the register
 if that was a one, XOR a constant into the register
 shift the register right one bit

With a register of length L and a well-chosen constant, this outputs 2^L - 1 bits (two to the power L, minus one) before it repeats. Longer periods are easily obtained; just use two or more registers whose lengths are mutually prime and XOR their outputs together. The period is then the product of the individual periods.
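The loop above can be sketched in Python as follows. The constants are illustrative choices, not taken from this article: 0b11001 corresponds to the polynomial x^4 + x^3 + 1 for a 4-bit register and 0x16801 to x^16 + x^14 + x^13 + x^11 + 1 for a 16-bit register. The function names are hypothetical.

```python
def lfsr_bits(state: int, constant: int, count: int):
    """Yield `count` output bits, following the three steps of the loop."""
    for _ in range(count):
        bit = state & 1          # output the lowest bit of the register
        yield bit
        if bit:
            state ^= constant    # if that was a one, XOR in the constant
        state >>= 1              # shift the register right one bit


def lfsr_period(seed: int, constant: int) -> int:
    """Count the steps until the register returns to its starting value.

    Assumes a nonzero seed and a constant whose top set bit is one
    position above the top of the register.
    """
    state = seed
    steps = 0
    while True:
        if state & 1:
            state ^= constant
        state >>= 1
        steps += 1
        if state == seed:
            return steps


small = lfsr_period(1, 0b11001)         # 15, which is 2**4 - 1
large = lfsr_period(0xACE1, 0x16801)    # 65535, which is 2**16 - 1
```

An all-zero register never changes, which is why the period is one less than the number of possible register values.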

LFSRs alone, however, are not secure. For an LFSR of length L, the Berlekamp-Massey algorithm can determine the register's internal state from 2L output bits (twice the register length). XORing the output of several LFSRs does not help much; that construction is provably equivalent to a single longer shift register, and Berlekamp-Massey breaks that. Various other simple tricks fail in a similar way.
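To make the attack concrete, here is a compact sketch of Berlekamp-Massey over single bits. It finds the shortest linear recurrence that reproduces the observed bits, which is enough to predict all later output. The LFSR helper repeats the loop described earlier in this section; the 16-bit constant 0x16801, the seed 0xACE1 and all function names are assumptions made for this example.

```python
def lfsr_bits(state: int, constant: int, count: int) -> list:
    """Return `count` output bits of a simple LFSR (output, XOR, shift)."""
    bits = []
    for _ in range(count):
        bit = state & 1
        bits.append(bit)
        if bit:
            state ^= constant
        state >>= 1
    return bits


def berlekamp_massey(bits: list):
    """Return (L, c): the length and coefficients of the shortest recurrence
    s[i] = c[1]&s[i-1] ^ ... ^ c[L]&s[i-L] that generates `bits`."""
    n = len(bits)
    c = [1] + [0] * n            # current connection polynomial
    b = [1] + [0] * n            # polynomial saved at the last length change
    L, m = 0, -1
    for i in range(n):
        # Discrepancy between the observed bit and the current prediction.
        d = bits[i]
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L, c[: L + 1]


def predict(known: list, c: list, count: int) -> list:
    """Extend `known` by `count` bits using the recovered recurrence."""
    L = len(c) - 1
    s = list(known)
    for _ in range(count):
        nxt = 0
        for j in range(1, L + 1):
            nxt ^= c[j] & s[-j]
        s.append(nxt)
    return s[len(known):]


REG_LEN = 16
stream = lfsr_bits(0xACE1, 0x16801, 2 * REG_LEN + 64)
observed = stream[: 2 * REG_LEN]            # the attacker sees only 2L bits
length, poly = berlekamp_massey(observed)
guessed = predict(observed, poly, 64)       # matches the next 64 real bits
```

The attacker never sees the key, yet `guessed` agrees with the bits the register actually produces next.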

However, techniques are available to avoid this difficulty. The most common is to use some non-linear function to combine LFSR-produced bits and generate the output.

Well-known stream ciphers

RC4

RC4, Rivest Cipher number four, was designed by Ron Rivest. It has a size parameter; the 8-bit version is in widespread use. This generates pseudo-random data one byte at a time and maintains a 256-byte internal state. The combining operation is XOR. The key can be any size up to the state size, 256 bytes or 2048 bits for the 8-bit version.
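For illustration, the 8-bit version can be sketched in a few lines of Python. This follows the widely published description of the algorithm, which this article does not itself spell out; the function names `rc4_keystream` and `rc4` are hypothetical, and the sketch is not meant for production use.

```python
def rc4_keystream(key: bytes):
    """Yield RC4 keystream bytes one at a time for the given key."""
    # Key setup: start from the identity permutation of 0..255 and
    # shuffle it under control of the key. This is the 256-byte state.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Output generation: each step swaps two state entries and emits one byte.
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        yield state[(state[i] + state[j]) % 256]


def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt `data`; the combining operation is XOR."""
    stream = rc4_keystream(key)
    return bytes(byte ^ next(stream) for byte in data)


ciphertext = rc4(b"Key", b"Plaintext")
plaintext = rc4(b"Key", ciphertext)    # the same call decrypts
```

Since the combining step is XOR, calling `rc4` on the ciphertext with the same key restores the plaintext.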

RC4 is quite simple to implement in software and is very widely used. Among other applications, it is used (at least as one option) in Internet protocols such as TLS (RFC 2246) for secure web browsing and SSH (RFC 4251) for secure remote login. It is also used in WEP wireless networking, in Microsoft PPTP and in many other applications.

Solitaire

Solitaire [1] was designed by Bruce Schneier and is used by characters in Neal Stephenson's novel Cryptonomicon. It is a manually operated stream cipher whose key is a shuffled deck of cards, designed for use by people who may not have access to a computer and do not want to risk being caught with incriminating cipher equipment. It uses arithmetic mod 26, convenient for 26 letters in the English alphabet and 52 cards in a deck.

Restrictions and weaknesses

A design problem for stream cipher implementations is how to maintain synchronisation; if the encrypting and decrypting devices get out of sync, generating different pseudorandom output, then communication is lost. In many applications, this is an easy problem but in some cases it can be quite difficult. Consider battlefield communications with noise everywhere and an enemy who actively attempts to disrupt your communications.

Output feedback stream ciphers, ones such as Solitaire in which the internal state changes with every output operation, are fragile in this respect. Make a single error in counting the cards and all further encryption or decryption will be wrong. Moreover, there is no way to recover short of restarting the whole process from the beginning with another copy of the original key.
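A small sketch shows how badly a loss of synchronisation hurts. One ciphertext byte is dropped in transit, so from that point on the receiver combines each byte with the wrong keystream byte. The keystream generator here is a stand-in built from SHA-256 and a counter, an assumption made only so that the example is self-contained; the names and the sample message are hypothetical.

```python
import hashlib


def keystream(key: bytes, nbytes: int) -> bytes:
    """Stand-in generator: concatenated SHA-256 digests of key || counter."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:nbytes])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Combine two equal-length byte strings with XOR."""
    return bytes(x ^ y for x, y in zip(a, b))


key = b"shared key (hypothetical)"
plaintext = b"hold position until relieved at noon"
ciphertext = xor_bytes(plaintext, keystream(key, len(plaintext)))

# Simulate a noisy channel that loses the byte at index 10.
damaged = ciphertext[:10] + ciphertext[11:]
decrypted = xor_bytes(damaged, keystream(key, len(damaged)))

intact_part = decrypted[:10]    # still equals plaintext[:10]
garbled_part = decrypted[10:]   # no longer matches any part of the message
```

Everything before the lost byte decrypts correctly; everything after it is garbage until the two ends resynchronise.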

Reusing pseudorandom material

There are some usage restrictions. In particular, never encrypt two different messages with the same pseudorandom data (or even the same truly random data; see VENONA).

Using P for plaintext, C for ciphertext, R for (pseudo)random data, and ^ for XOR, this makes the encryptions:

C1 = P1^R
C2 = P2^R

The enemy can intercept both ciphertexts, so he does:

X = C1^C2 = P1^R^P2^R

and the Rs conveniently cancel out, so he has

X = P1^P2

This is very weak indeed. If the attacker knows or can guess one plaintext, he gets the other free. A zero byte in X means the corresponding bytes in P1 and P2 are equal. Other simple relations exist and can readily be exploited.
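The cancellation is easy to check in code. In this sketch the two messages and the helper name `xor_bytes` are invented for illustration, and `os.urandom` stands in for whatever produced the reused (pseudo)random data.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Combine two equal-length byte strings with XOR."""
    return bytes(x ^ y for x, y in zip(a, b))


p1 = b"attack at dawn"
p2 = b"attack at noon"
r = os.urandom(len(p1))      # the same R is wrongly used for both messages

c1 = xor_bytes(p1, r)
c2 = xor_bytes(p2, r)

# The attacker sees only c1 and c2, never r.
x = xor_bytes(c1, c2)        # equals P1 ^ P2; R has cancelled out

# Knowing (or guessing) one plaintext gives the other for free.
recovered_p2 = xor_bytes(x, p1)

# The first ten bytes of the two messages agree, so x begins with ten zeros.
leading_zeros = x[:10]
```

Note that nothing in the attacker's computation depends on the value of `r`, so the strength of the generator is irrelevant once its output is reused.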

Given moderately strong knowledge of plaintext properties (for example, that it is English text), P1^P2 can be broken using techniques such as frequency counting that have been well known for centuries and can be done with pencil and paper; you don't even need a computer. See History of cryptography for details. Even with weaker knowledge of the input format, it is still easily breakable.

Of course, in a real application such as VENONA, the attack may be far more difficult. Suppose you have a substantial archive of intercepted messages and you know or suspect that the enemy sometimes reuses the random material. There is still a huge amount of work to do to discover what was reused where.

On the other hand, the attack might be easy. In a well-publicised [2][3] case, Singaporean researcher Hongjun Wu found that Microsoft had misused RC4 in this way; when multiple versions of an encrypted Word or Excel file were saved, the same pseudorandom stream was used each time. The original paper is on the web [4]. In this case the attacker has more to go on since he knows the texts are multiple versions of the same document. Also, there may be more than two versions.

Rewrite attacks

Stream ciphers (including one-time pads) are inherently vulnerable to a rewrite attack. If an attacker knows some plaintext and has intercepted the matching ciphertext, then he can discover that portion of the pseudorandom data. This does not matter if the attacker is just a passive eavesdropper. It gives him no plaintext he didn't already know and we don't care that he learns some pseudorandom data. We will never re-use that data, for reasons given in the previous section, and if the pseudorandom generator is well designed, having some of its output does not give him an attack on it.

However, an active attacker who knows the plaintext can recover the pseudorandom data, then use it to encode whatever he chooses. If he can get his version delivered instead of yours, this may be a disaster. If you send "attack at dawn", the delivered message can be anything of the same length, perhaps "retreat at ten" or "shoot generals". An active attacker with only a reasonable guess at the plaintext can try the same attack. If the guess is correct, this works and the attacker's bogus message is delivered. If the guess is wrong, a garbled message is delivered.
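The rewrite attack can be sketched in a few lines, assuming XOR as the combining operation. Here `os.urandom` stands in for the legitimate keystream, which the attacker never sees directly; the helper name `xor_bytes` and the wrong guess "attack at dusk" are made up for the example.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Combine two equal-length byte strings with XOR."""
    return bytes(x ^ y for x, y in zip(a, b))


original = b"attack at dawn"
forgery = b"shoot generals"              # same length as the original

keystream = os.urandom(len(original))    # known only to sender and receiver
ciphertext = xor_bytes(original, keystream)

# Attacker: knows the plaintext, intercepts the ciphertext.
recovered_stream = xor_bytes(ciphertext, original)
forged_ciphertext = xor_bytes(forgery, recovered_stream)

# Receiver: decrypts the substituted message in the usual way.
delivered = xor_bytes(forged_ciphertext, keystream)    # b"shoot generals"

# With a wrong guess at the plaintext, the receiver gets garbage instead.
wrong_stream = xor_bytes(ciphertext, b"attack at dusk")
garbled = xor_bytes(xor_bytes(forgery, wrong_stream), keystream)
```

The receiver has no way to tell from the decryption alone that `delivered` did not come from the real sender.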

  1. Schneier, Bruce (1996), Applied Cryptography: Protocols, Algorithms, and Source Code in C, Second Edition, John Wiley & Sons, p. 369