February 01, 2004

Software security for developers: One-time pads

Software security for developers: One-time pads

1 October 2000
In this installment, Gary and John examine the one-time pad, which, if used properly, is an unbreakable encryption algorithm. The one-time pad has proven popular with spies, and its history predates World War II -- but does it actually make sense for real software applications?
Is there really such a thing as a perfect encryption algorithm? In the theoretical sense, yes. One-time pads are unbreakable if used properly. However, such algorithms aren't used much in practice. In this article, we'll talk about the one-time pad, its strengths and weaknesses, and why other encryption algorithms exist.

Cryptography is sometimes more of an art than a science. For example, we can't really answer a seemingly simple question like "How secure is the Rijndael algorithm?" Because Rijndael uses a 128-bit (or more) key, we can throw around the number 2128 . But in reality, no one really knows how secure Rijndael is, because it is incredibly difficult to prove security guarantees about algorithms. Proving no attacks could be successfully launched on any given algorithm is nearly impossible. One difficulty is that new cryptographic attacks are periodically discovered. But things become even more difficult when we consider that what we really want to prove is that a cipher can withstand any future attacks, including ones that won't be discovered for centuries.

In practice, commonly used ciphers that are believed to be secure may or may not actually be secure. We gain confidence in a cipher because no one has broken it yet, not because it is known to be secure. That's why cryptographers are more likely to recommend ciphers that have been extensively scrutinized as opposed to newer or proprietary cryptographic algorithms. The working theory is that if a whole lot of smart minds couldn't break an algorithm, then it's more likely to be secure than one that nobody has looked at carefully. The more minds on a problem, the more likely we are to trust the algorithm. That means we end up talking about the "best case" security properties of an algorithm. With symmetric key ciphers, the "best case" is more often than not related to the size of the key space -- unless there's known to be a better attack than trying every possible key until the data is unveiled (the brute force attack). The upshot of all this may be a bit unsettling: All symmetric key algorithms can be broken, given enough time. Ultimately, we pin our hopes on the idea that no one will ever have enough time to crack our code! For example, if we area using an algorithm with a 256-bit encryption for which there is no more effective attack than mere brute force, then we can expect that the odds are incredibly slim that an attacker can break the key within the lifetime of the universe.

It's easier to prove things mathematically about public key algorithms than symmetric algorithms. That's because public key algorithms tend to be based on well-defined mathematical problems (such as factoring composites made of primes). By contrast, symmetric algorithms tend to be ad hoc collections of substitutions and permutations that are difficult to analyze in context. So, assuming a perfect implementation, the difficulty of breaking the RSA algorithm is known to be directly tied to the difficulty of being able to factor very large numbers. Mathematicians believe that factoring large numbers cannot be accomplished in reasonable time. Unfortunately, so far, no one has been able to prove that factorization is really that difficult.

Even if factorization were known to be hard enough (and assuming there are no other implementation or design problems), there is still some theoretical limit at which RSA is breakable, as is the case with symmetric algorithms. If you could wait around while you were trying out keys for the lifetime of the universe and then some, you could definitely crack RSA keys!

One-time pads
Perfect data secrecy may not seem possible, particularly if potential attackers are given absurd amounts of time (and can run brute force searches in parallel). But, oddly enough, there is an encryption algorithm that can't be broken if used properly: the one-time pad. What is even stranger is that the algorithm is incredibly simple.

The basic idea behind a one-time pad is that there's as much key material as there is text. The encryption operation can be simple modular addition. In computer-based uses, it is often XOR.

Here's the simple algorithm for one-time pads: For each plain text message, generate a random secret key. The key should be exactly the same length as the plain text message. The cipher text is created by simply XOR-ing the plain text with the key.

As an example, let's say that we want to encrypt the message:

"One-time pads are cool."

The plain text message can be encoded in 8-bit ASCII, yielding the following hexadecimal representation:

4f 6e 65 20 74 69 6d 65 20 70 61 64 73 20 61 72 65 20 63 6f 6f 6c 2e

Let's say our random key is represented in hex as follows:

9d 90 93 e7 4f f7 31 d8 2d 0d 22 71 b6 78 12 9d 60 74 68 46 6c c0 07

After XOR-ing, we end up with the following cipher text:

d2 fe f6 c7 3b 9e 5c bd 0d 7d 43 15 c5 58 73 ef 05 54 0b 29 03 ac 29

Because every single bit in our plain text is encoded by a single unique bit in the key, there is no duplicate information that a cryptanalyst can use in an attack. Because the key is randomly chosen, we are just as likely to end up having encoded the plain text stream shown above as we are to have encoded:

86 96 93 e7 49 f7 33 c9 2d 1f 26 72 ac 36 00 cf 64 20 2b 46 6d c9 07

which decrypts to, "the riot begins at one." Because of the one-to-one correspondence of bits, a cryptanalyst can't gain any information that makes any one decryption more likely than any other, assuming that encryption keys are never reused.

The C code below can be used to encrypt and decrypt using a one-time pad. The plain text is modified in place. Thus, it's a bad idea to pass in constant strings.


void otp(char *text, char *key, int len) {
int i;
/* XOR sizeof(int) bytes at a time (4 on most machines). */
for(i=0;i /* If the number of bytes isn't divisible by sizeof(int), XOR the rest.*/
i = len%sizeof(int);
while(i){
text[len-i] ^= key[len-i];
--i;
}
}

What's the catch?
One-time pads sound so good that you might be surprised to find out that they're incredibly uncommon to find in the field. There are actually several reasons why this is the case.

First, distributing keys securely is extremely difficult, especially when they have to be as long as a message will be. If you want to send 12 GB of data to a friend over an untrusted medium using a one-time pad, you need a key that is 12 GB long! That's a huge key.

Second, and more importantly, the cipher requires numbers that are absolutely unpredictable. Of course, rand() and its ilk are completely out of the question, because they're utterly predictable.

We've gone on at length about how difficult it is to get high-quality random numbers, especially at a high bandwidth. Imagine having to procure 12 GB of random data! If you try to get that much data from /dev/random, you're likely to have an incredibly long wait in store. A hardware source of randomness is far more practical.

You might ask, why not use /dev/urandom, because it spits out numbers as fast as it can? The problem is that it generates numbers quickly using a cryptographic mechanism. In the end, you're effectively reducing the strength of the encryption from unbreakable to something that is probably no more difficult to break than the best symmetric key algorithms available. So why not use a symmetric algorithm in the first place? If you do, you'll end up with a shorter key.

Some history
One-time pads are older than the computer, dating back to 1917. They were occasionally used for highly secure communications during World War II. The "pad" in the name "one-time pad" comes from the original form factor of the cipher: Key material was printed on thick pads, one letter per page.

Number stations


There are a lot of radio "stations " like this that send encrypted messages. They're called "number stations" by amateur enthusiasts who are fascinated by them.

One-time pads are still in limited use today. Many governments use short-wave radio to communicate encrypted messages to spies in the field. What happens is that agents have short-wave receivers, and know what station to tune to, and when. The radio station broadcasts little more than an automated voice reading out a list of numbers. The numbers are broken up in such a way that it's clear there are multiple messages, each with header information about the message. Such information generally includes the length of the message, and the intended recipient. The spy listens to the first group of numbers, to figure out if there is a message waiting for him. If so, he copies the cipher text of his message, and figures out where in his copy of the one-time pad he should start based on the header information. He then decodes the message, and finally destroys the message, along with any used pages in the one-time pad.

The encryption algorithms used in military and government applications tend to be slightly different from what we'd do on a computer, but equally effective in practice. We'll give an example algorithm, assuming that messages only consist of the 26 possible letters from A to Z. The pad would be filled with numbers from 1 to 26, and might have as few as one number per page, but more often would have four or five numbers per page. To encrypt the first letter of a message, the numerical equivalent (for example, "A" = 1, "Z" = 26) was added to the first value shown on the pad. If the number turned out to be greater than 26, then 26 was subtracted from the value. The result was converted back into a letter, then written down. This was done for each number on the page, and then the top page of the pad was then torn off and destroyed. The process was then repeated for the second group of letters, and so on. Depending on the organization, pads could contain as few as one number per page. Usually they contained sets of five numbers per page. All in all, encoding messages this way is tedious work.

Decrypting the message required an exact duplicate of the pad used to encrypt the message. Starting on the first page of the pad, the first letter of the cipher text was converted to a number, and subtracted from the number on the pad. If the result was zero or less, 26 was added to the result. The number was then converted back to a letter. (This simple algorithm always yields the same letter as the first character of the original plain text.) The top page of the pad was used up, then torn off and destroyed. Decryption then proceeded to the second page of the pad. Again, tedious work.

As an example, consider our message, "One-time pads are cool." We would have to write it as:

O N E T I M E P A D S A R E C O O L

Converted to numbers, our message is:

15 14 05 20 09 13 05 16 01 04 19 01 18 05 03 15 15 12

Let's say that the top 18 numbers on our pad are:

02 15 18 24 02 14 24 09 20 14 09 10 01 17 19 02 19 13

For the first letter, we take 15 and add 2 to get 17. "Q" is the 17th letter of the alphabet, and is thus our first character of cipher text. For the second letter, we add 14 and 15 to get 29. Because this number is greater than 26, we subtract 26, giving us 3, which converts to the letter "C". Ultimately, we end up with the following cipher text:

Q C W R K A C Y U R B K S V V Q H Y

To decrypt this message, we first convert the cipher text back to numbers:

17 03 23 18 11 01 03 25 21 18 02 11 19 22 22 17 08 25

We start with a pad that's identical to the one used to encrypt the message. Therefore, the first number on the pad should be 2. We subtract 2 from 17, getting 15. We convert 15 back into the letter "O." Next, we take 3, the second number in the cipher text, and subtract 15 from it, because 15 is the second number on the pad. The result is 12. Because the result is less than 1, we add 26 to it, giving 14. The 14th letter is "N." Proceeding that way, we eventually recover the entire message.

Pad generation
Generating the pads is another challenge. Using the text of War and Peace as the basis for a pad is a bad idea. Because the key isn't random, the cipher text won't be random. One technique that has been used is the same one favored by lotteries to this day. Numbered balls are placed in a closed chamber, and air blows the balls around. An operator uses this chamber to create two identical pads. For the first character on the pad, the operator reaches into the chamber without looking, and extracts one ball. The number on that ball is recorded on both pads, then the ball is returned to the chamber. Then, the process is repeated. Tedious work.

Even techniques like ball picking haven't proven very practical in the real world. In fact, the difficulty of generating and distributing pads was a major reason why hundreds of less secure cryptographic codes were used during World War II.

One-time pads evinced a plethora of problems when they were used in the field. First, the pads had to be synchronized between two communicating parties. Consider what happens when Alice sends a message to Bob at about the same time as Bob sends a message to Alice. They might both use the same key for encryption. If each party follows the algorithm precisely, then neither would have the right pad available to decrypt either one of the messages!

Worse, encrypting two messages with the same key compromises the security of the algorithm. If someone were to guess that two messages were encrypted using the same pad, that person might be able to recover both of the original messages. This risk was often realized. Lazy soldiers occasionally did reuse pads, and motivated cryptanalysts did notice, breaking messages that the communicating parties probably believed were perfectly secret.

To solve the problem, it makes sense to give Alice and Bob each a set of two one-time pads. One set of pads is earmarked for messages from Alice to Bob, and the other for messages from Bob to Alice. Now, as long as pads are not reused, and as long as pads are destroyed after use so that no one can compromise used pads, everything is secure.

Unfortunately, there is still another problem -- data integrity. Let's say that Alice sends a message to Bob at 10:00 a.m., then another message at 1:00 p.m., and finally a third message at 5:00 p.m.. If the 10:00 a.m. message never arrives because the messenger was shot, it would be very difficult for Bob to decrypt the two later messages. It wouldn't be impossible, but it would be a major inconvenience, because Bob would have to guess where in the pad to start when he is trying to decode the 1:00 p.m. message. One way to solve the problem would be to number the messages, and use only one pad per message. Another solution would be to have one pad per day; then only one day of communications could be disturbed at a time.

Conclusion
The lessons of World War II cryptographers are just as valuable today as they were back then. The most important lesson is that one-time pads aren't very convenient. In practice, it is usually worth the risk to trade perfect secrecy for usability.

One-time pads are also easy to get wrong. Remember that the pads must be used once and only once, thus the “one-time” in the name. And the key data must be absolutely unpredictable.

In practice, most people aren't willing to exchange huge keys in a secure manner. It would be particularly horrible to have missing data somewhere in the middle of a large data stream. If the stream could be missing anywhere from 1 byte to 1 GB, resynchronization won't be all that feasible. However, if after all the caveats, one-time pads still sound like a good idea to you, we recommend the following strategy:

Have a high-quality hardware random number generator produce random bits.
Hash those bits with a cryptographic hash function, to get rid of any latent patterns that might be in the data source. For example, you might hash with SHA-1. Take 160 bits from the generator at a time and hash it to get 160 bits of output from SHA-1.
Store those bits on two CDs, until you fill the CDs. Both CDs should be identical.
Distribute and use the CDs.
Make sure new CDs are available when more data are needed. Reusing the same CD can lead to compromise.
Destroy the CDs when done.

Resources

See Beating the bias, Gary and John's article (here on developerWorks) that explains how to devise a reasonably secure random number generator through hardware.


Find out more about reliable encryption technologies in Tried and true encryption here on developerWorks.


Several sites on the Internet provide more information on the use of encryption in espionage, such as http://www.spynumbers.com and http://www.geocities.com/Area51/Dimension/1087.


The developerWorks columns by Gary and John have been updated and expanded in the book Building Secure Software, which also includes plenty of new material. Pick up a copy today -- your users will thank you.

About the authors
Gary McGraw is the vice president of corporate technology at Cigital (formerly Reliable Software Technologies) in Dulles, VA. Working with Consulting Services and Research, he helps set technology research and development direction. McGraw began his career at Cigital as a research scientist, and he continues to pursue research in software engineering and computer security. He holds a dual Ph.D. in Cognitive Science and Computer Science from Indiana University and a B.A. in Philosophy from the University of Virginia. He has written more than 40 peer-reviewed articles for technical publications, consults with major e-commerce vendors including Visa and the Federal Reserve, and has served as principal investigator on grants from Air Force Research Labs, DARPA, National Science Foundation, and NIST's Advanced Technology Program. He can be reached at gem@cigital.com.

John Viega is a senior research associate, Software Security Group co-founder, and senior consultant at Cigital. He is the principal investigator on a DARPA-sponsored grant for developing security extensions for standard programming languages. John has authored over 30 technical publications in the areas of software security and testing. He is responsible for finding several well-publicized security vulnerabilities in major network and e-commerce products, including a recent break in Netscape's security. He is also a prominent member of the open source software community, having written Mailman, the GNU Mailing List Manager, and, more recently, ITS4, a tool for finding security vulnerabilities in C and C++ code. Viega holds an M.S. in Computer Science from the University of Virginia. You can contact him at viega@cigital.com.

February 1, 2004 at 11:16 AM in Security US | Permalink | Top of page | Blog Home