## Saturday, May 30, 2009

### How do you test a random number generator?

Human beings are notoriously bad at generating random numbers and sequences. If you ask someone to randomly choose a number between 1 and 20, the numbers they're most likely to choose are 7 and 17. Similarly, if you ask someone to simulate a series of coin flips by writing down a random string of the letters H and T, you'll see very few long sequences of either heads or tails, which would occasionally appear by chance in sequences of real coin flips.

People are just as bad a determining whether an existing sequence is random. Look closely at the following two images.

Which one do you think looks more randomly distributed? If you read the article that the images link to, you already know that most people say the first image looks more random because the points are more smoothly distributed. This is exactly the property that makes it less random than the second image, which has more instances of points in tight clusters. A real set of random data points wouldn't be as evenly spaced as the first image.

Computers are much better than people, but by no means perfect, at generating sequences of random numbers. A number or sequence is said to be random if it is generated by a non-deterministic and unpredictable process. Since computers are inherently deterministic machines, this means that no computer can ever algorithmically generate a sequence that is truly random. They can come very close, though, with the help of a class of algorithms known as pseudo-random number generators (PRNG).

For a programmer writing a PRNG, or software that relies on one, testing its randomness reveals a unique problem. Unit testing software compares the output of a function to an expected value. If the output is expected to be unpredictable, you have a testing dilemma.

To make matters worse, there is no way even theoretically to prove that a sequence was generated randomly. Luckily there are statistical methods that are useful for revealing when a sequence is not random. Let's take a look at a simple PRNG, then use a common statistical method, the Monte Carlo method, to compare its effectiveness to a standard software library implementation.

A Naive PRNG

One of the oldest algorithmic methods for generating pseudorandom numbers is called the linear congruential method. Here's a simple example implementation:

public class NaiveRandom {    private int x; // seed    private int m = 134456; // modulus    private int a = 8121;   // multiplier    private int c = 28411;  // increment    public NaiveRandom()    {        x = 0;    }    public NaiveRandom(int seed)    {        this.x = seed;    }    public double next()    {        x = (x * a + c) % m;        return x / (double)m;    }}

The original pseudocode and constants in the code above come from an example given in Statistical Mechanics by Werner Krauth. As Krauth explains in his book, these values are good for studying the algorithm, but not great if you need really random values. In the following section we'll see how it compares to Java's built-in PRNG.

The Monte Carlo Pi Program

Take a unit circle inscribed in a square.

The area of a circle is given by the formula

A = πr2

Since the radius is 1, the area is equal to π. Since the diameter of the circle and the length of a side of the square are both 2, the area of the square is 4. The ratio (which we'll call ρ) of the area of the circle to the area of the square is

ρ = π / 4 = 0.785398164

If we select a random sample of points from the square, the proportion of those points lying inside the circle should be equal to ρ. We can multiply this proportion by 4 for comparison to a library value of π. If the random number generator is close to truly random, then the closer our approximation should be to the actual value of π.

We can simplify the calculations necessary for the test if we only concern ourselves with one quadrant of the square. This is possible because it's the proportion of points inside the circle to those inside the square that matters. The proportion of such points is the same in each quadrant as it is in the whole. If we say that the center of the circle is at point (0, 0), then we need only generate random points (x, y) where x and y are both between 0.0 and 1.0. This is the standard output range for most PRNGs, like Java's Random class. Points that lie within the circle will be those that obey the inequality

x2 + y2 ≤ 1

Here is the Java code for a Monte Carlo Pi simulation.

import java.util.Date;import java.util.Random;public class MonteCarloPi {    public static void main(String[] args) {        // seed for NaiveRandom        Date now = new Date();        int seconds = (int)now.getTime();        // create random number generators        NaiveRandom nrand = new NaiveRandom(seconds);        Random rand = new Random();        // total number of sample points to take        int numPoints = 10000;        int inNaiveCircle = 0;        double xn, yn, zn;        // xn and yn will be the random point        // zn will be the calculated distance to the center        int inRandCircle = 0;        double xr, yr, zr;        // xr and yr will be the random point        // zr will be the calculated distance to the center        for(int i=0; i < numPoints; ++i)        {            xn = nrand.next();            yn = nrand.next();            xr = rand.nextDouble();            yr = rand.nextDouble();            zn = (xn * xn) + (yn * yn);            if(zn <= 1.0)                inNaiveCircle++;            zr = (xr * xr) + (yr * yr);            if(zr <= 1.0)                inRandCircle++;        }        // calculate the Pi approximations        double naivePi = approxPi(inNaiveCircle, numPoints);        double randomPi = approxPi(inRandCircle, numPoints);        // calculate the % error        double naiveError = calcError(naivePi);        double randomError = calcError(randomPi);        System.out.println("Naive Pi Approximation: " +               naivePi + ", Error: " + naiveError);        System.out.println("Random Pi Approximation: " +               randomPi + ", Error: " + randomError);    }    static double approxPi(int inCircle, int totalPoints)    {        return (double)inCircle / totalPoints * 4.0;    }    static double calcError(double pi)    {        return (pi - Math.PI)/Math.PI * 100;    }}

Results

Run the simulator several times and you will see that Java's built-in PRNG seems to outperform the naive implementation, but not by much. Neither performs particularly well, but Monte Carlo simulations are only expected to be close, not exact. Here are my results after ten runs of the simulation.

 Naive Random Java Random Pi Error(%) Pi Error(%) 3.122 -0.6237 3.1308 -0.3435 3.1516 0.3185 3.1332 -0.2671 3.1272 -0.4581 3.1352 -0.2035 3.1692 0.8788 3.1352 -0.2035 3.1524 0.3440 3.1416 0.0002 3.1168 -0.7892 3.1340 -0.2417 3.1724 0.9806 3.1040 -1.1966 3.1612 0.6241 3.1244 -0.5473 3.1448 0.1021 3.1428 0.0384 3.1052 -1.158 3.1628 0.6751

At least a small part of the imprecision of these results can be attributed to the fact that I only took 10,000 random points in each sample. A rule of thumb in Monte Carlo simulations is that for every 100X increase in data points, you'll get a 10X increase in the precision of your results. Since I took only 10,000 random points (100 * 100), I only got 2 digits of precision, with the third digit fluctuating.

Try increasing the number of random points to 1,000,000 and you should see that the third digit remains fixed over several runs of the program. The really interesting thing revealed is that our naive implementation continues to perform nearly as well as Java's built-in PRNG. This leads us to the conclusion (which can be easily verified) that Java's PRNG uses the same linear congruential method for generating random numbers as our naive implementation. The only significant difference between the two is that Java's implementation will be able to generate much longer sequences of random numbers before it starts to repeat itself.

Further Investigation

As I mentioned before, there are many statistical tests that can be used to measure the relative randomness of a sequence. I've only covered one of those tests in detail, but there are libraries available like the Diehard Battery of Tests of Randomness and ENT that include a wide variety of tests, and guidelines for interpreting them. If your application depends on randomness, I recommend evaluating your random number generator with one of these test suites.

## Thursday, May 21, 2009

### Rolling Sevens

What's the probability of rolling a 7 on one roll of two standard six-sided dice? To make it easier to visualize, here's a listing of all of the different outcomes of rolling two six-sided dice.

 (1,1) (1,2) (1,3) (1,4) (1,5) (1,6) (2,1) (2,2) (2,3) (2,4) (2,5) (2,6) (3,1) (3,2) (3,3) (3,4) (3,5) (3,6) (4,1) (4,2) (4,3) (4,4) (4,5) (4,6) (5,1) (5,2) (5,3) (5,4) (5,5) (5,6) (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

If you look at the long diagonal starting in the bottom left, you can see that there are six ways out of a possible thirty-six to reach a total of exactly 7.

So for two six-sided dice the probability of rolling a seven is

P(7) = 6/36 = 0.1667

Or about one in every six tries.

That question was too easy, so let's make it a little bit tougher. What's the probability of rolling seven 7s in a row? Okay, it's not that hard if you know how to calculate the probability of independent events. You just need to multiply the probabilities of each event. In this case we want to multiply 0.1667 by itself seven times,

0.1667 * 0.1667 * 0.1667 * 0.1667 * 0.1667 * 0.1667 * 0.1667

or we can just raise it to the 7th power.

(0.1667)7 = 0.0000036

You can see that rolling seven 7s in a row has some pretty long odds. That will only happen about 36 times in 10 million tries!

But what if you've already rolled six 7s in a row? What's the probability of rolling a seventh 7? Before we do any calculations, would you say that the odds are pretty good or pretty bad of getting a seventh 7?

If you think the probability is extremely bad, you're falling for a very common cognitive bias called the Gambler's fallacy. This fallacy accounts for the belief that a streak of good or bad luck is "due" to change. A gambler who has had a long streak of losing hands in blackjack might decide to increase his bet, banking on the feeling that his luck is about to change.

The Gambler's fallacy arises from out intuitive sense that things have a way of "evening out" in the long run. The truth is that over the very long run, things do even out. This is called the Law of large numbers. If I flip a coin a million times, I can expect to get reasonably close to half-a-million heads. But if I flip the same coin only ten times and get ten heads in a row, the probability of seeing a head on the next flip is still exactly one-half. The Law of large numbers only applies to the average of a large sample, not to individual coin flips, hands of blackjack, or rolls of the dice. The coin (or the cards, or the dice, or the universe for that matter) doesn't remember the result of the previous trials, and therefore cannot be influenced by them.

Oh, I almost forgot, the probability of getting that seventh 7 after rolling six in a row is 0.1667, exactly the same odds as getting a 7 on any other roll. The dice have no memory.

## Wednesday, May 6, 2009

### Programming and Experimentation

Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime.
-- Chinese Proverb
In a recent post, Great Developers Are Not Afraid to Experiment, Scott Selikoff talks about the fear and insecurity felt by many inexperienced software developers. Scott talks about his experience as a moderator over at JavaRanch, citing the frequency of questions asking "What would happen if I executed the following code?"
Many times the author of such posts can answer the question by copying and pasting his/her code into a Java main() method and running it. Some might chalk these posts up to being lazy, but, clearly, taking the time to write a post on a message board - often signing up as a member for the first time - takes some amount of effort as well. With that, I’m going to be go with the assumption developers avoid experimenting with code because they are scared or unsure of their own knowledge.
Back in college I took a job tutoring students in the introductory computer science courses and I came to the same conclusion that Scott does in his article. I would frequently have students bring code to me and ask me what I thought of it. Some would even go so far as to ask me if I thought their code would compile. I would never answer this question directly (despite Head First Java repeatedly urging me to "Be the Compiler"), but would instead patiently show them how to answer it for themselves. These students weren't lazy, they were scared. In many cases it seemed like they were more scared of being wrong than they were of not knowing the answer. They hadn't learned to experiment yet.

It's important to understand this fear if you're in any kind of position to help people get past it, whether you're a teacher, a writer, a mentor, or just a guy who answers a lot of questions on a programming Q&A web site. If you're a really good programmer, one of your key personality traits is probably an innate sense of curiosity (I'm generalizing, but take a look around at the good programmers you know and see if you disagree). You probably couldn't stop experimenting if you tried. For those of us who have the curiosity trait, it can be puzzling when we encounter someone who lacks it.

Luckily, curiosity is contagious. If you show someone how experimentation works in programming, and you're enthusiastic about learning with them, they might catch the bug from you. I had very few students who were disappointed that I wouldn't just tell them the answers to their programming assignments. Most of them wanted to learn how to do it for themselves once they were convinced that they could.

If you are a teacher or a mentor, the best thing you can do for your student is show them how to find answers on their own. Teach them how to find out what a compiler warning means (don't tell them what it means, even if you know). Teach them how you'd write a test to exercise a tricky piece of code. Teach them how the debugger works.

As the proverb says, "Teach a man to fish..."

## Sunday, May 3, 2009

### A Brief History of Cryptography

Cryptography, from the Greek root words kryptos and gráphō, together meaning "hidden writing", has existed almost as long as the written word. Down through the ages, the advantage in the battle between cryptologists (code makers) and cryptanalysts (code breakers) has changed hands many times. The following events are some of the most important developments in the history of cryptography.

5th century BC - Spartan generals exchanged secret messages by wrapping narrow ribbons of parchment around a cylindrical staff known as a scytale, then transcribing their messages on the papyrus. The messages could only be read when the papyrus was re-wound around a scytale of the same thickness. This was the earliest recorded use of what is today known as a transposition cipher.

2nd century BC
- Greek historian Polybius developed one of the earliest recorded substitution ciphers by replacing the letters of the alphabet, arranged in a Polybius square, with numbers.

1st century BC - Roman generals used a simple shift cipher, where each letter of a plaintext message would be shifted a fixed number of letters down the alphabet to produce the ciphertext. The cipher came to be known as Caesar's cipher after Julius Caesar, who is said to have preferred a shift of three letters.

9th century - Islamic mathematician Abū Yūsuf Yaʻqūb ibn Isḥāq al-Kindī published the first text book on code breaking, A Manuscript on Deciphering Cryptographic Messages. Al-Kindī's book introduced the classification of ciphers, polyalphabetic ciphers, and frequency analysis, an important technique used in breaking substitution ciphers. Frequency analysis uses the relative frequency of symbols in a coded message to reveal what letters of the alphabet they stand for.

1553 - Italian Renaissance man Leon Battista Alberti published the first clear description of what would later come to be known as the Vigenère cipher* in his book La cifra del. Sig. Giovan Battista Bellaso. The Vigenère cipher is a strong polyalphabetic substitution cipher consisting of several Caesar cipher's in sequence, each with a different shift value.

Enciphering was done with the aid of a table of alphabets, known as a tabula recta, Vigenère square, or Vigenère table (see the image at left), or with a cipher disk. To encode a message, a keyword was chosen and the letters in the keyword determined which rows of the Vigenère table were used to encode each letter of the message. For example, if the keyword ACE were chosen, the A row would be used to encode the first letter of the message, the C row to encode the second letter, and the E row to encode the third. The letters of the keyword would then be repeated to encode the fourth, fifth, sixth, and subsequent letters of the original message. To decode a message, the same process was done in reverse. This required sender and receiver to agree on a secret key before communications could be sent.

1863 - The resistance of the Vigenère cipher to frequency analysis earned it a reputation as an "unbreakable" cipher, but this would prove not to be the case. The cipher was shown to be vulnerable to an attack that would come to be known as Kasiski examination**.

The weakness of the Vigenère cipher was that certain common words, such as "the" and "and", are used with such frequency that they will occasionally be encrypted as the same sequence of letters in different parts of the encrypted text. By looking for repeating sequences of letters, cryptanalysts could divine the length of the key used to encrypt a message. Once the length of the key was known, the ciphertext message could be partitioned into key-length chunks of text, which were shown to be individually susceptible to frequency analysis.

1917 - U.S. Army Major Joseph Mauborgne developed an encryption system known as a one-time pad. The idea is similar to encrypting a message using a secret key, but in the case of a one-time pad the secret key is a random sequence of numbers as long as the original message, and it is used only once. The one-time pad, if implemented correctly, is the only provably unbreakable encryption algorithm.

The chief weaknesses in implementing a one-time pad are generating a random sequence of digits of sufficient length and randomness, and distributing the keys. Both weaknesses were exploited by both sides during World War II, and during the subsequent cold war between the United States and Soviet Russia.

1926 - After World War I had ended, the German Navy adopted the use of a family of electro-mechanical encryption devices that became collectively known as the Enigma machines. The devices proved to be so effective that by World War II their use had spread to every branch of the German military.

The Enigma machines consisted of a mechanical keyboard, a set of disks attached to a rotating spindle, and a stepping mechanism to turn one or more of the disks each time a key was pressed. When a key was pressed, a set of electrical contacts in the disks would complete a circuit. The arrangement of the disks would provide the encryption for the letter selected. By rotating the disks with each key press, the Enigma generated a different encryption for each letter of a message. Rotors with different encodings could be added or exchanged on some models of the Enigma, and some featured plug-boards that allowed the operator to reconfigure the wiring of the rotors, providing even more combinations of possible ciphers.

The initial arrangement of the rotating disks and the patch cords on the plug board acted as the key to the Enigma machines. A message encoded starting from one unique initial state could only be decoded by an Enigma machine starting in the same configuration. The enormous number of possible initial states, as many as 10114 for some models, made the Enigma extremely difficult to crack. Although still not quite as secure as a one-time pad, the speed with which a skilled operator could encode and decode a message with the Enigma machine made it preferable for field use.

1943 - A group of Allied code breakers working in the town of Bletchley Park in the United Kingdom, built one of the first programmable digital electronic computers, code named Colossus, in an effort to break German codes. Intercepted encrypted messages could be read in by the Colossus at a high rate of speed on a paper tape. These data streams were then compared to various messages generated internally by the Colossus, which was programmed to simulate German code making machines. Any matches in these two data streams were used by Allied code breakers to decipher the intercepted messages.

The Colossus was classified as top secret by the British military, and details about its construction and operation weren't released until decades after its use. The Bletchley Park code breakers, with the aid of Colossus, are estimated to have shortened World War II by several months.

1972 - The U.S. National Bureau of Standards (NBS, known today as the National Institute of Standards and Technology, or NIST) identified the need for a government-wide data encryption standard. The NBS solicited proposals for a cipher that would meet their rigorous requirements, and after several candidate ciphers were refused, a group of cryptographers working out of IBM published what would become known simply as DES in 1975. After much public scrutiny, DES was accepted as a standard in November of 1976.

The chief criticism of DES was the allegation of tampering by the National Security Agency (NSA). DES is a block cipher, and block ciphers work by encrypting a set-length block of bits at a time according to the length of a key. The key length directly influences the level of security of the encryption. On recommendation from the NSA, the IBM researchers agreed to shorten the DES key length to 56 bits and slightly modify the encryption algorithm. The fear at the time was that IBM had allowed the NSA to weaken DES to the point that the NSA (and only the NSA) could break the cipher.

According to security expert Bruce Schneier, fears surrounding the NSA's tampering with DES were unfounded. In his 2004 article The Legacy of DES*** Schneier wrote:
It took the academic community two decades to figure out that the NSA "tweaks" actually improved the security of DES. This means that back in the '70s, the National Security Agency was two decades ahead of the state of the art.

1976 - Whitfield Diffie and Martin Hellman published a cryptographic protocol that allowed secret key exchange over an unsecured communications channel****. The protocol, dubbed Diffie-Hellman key exchange, relies on the mathematical properties of one-way functions, specifically in this case the discrete logarithm problem (RSA, a later key exchange algorithm, relies on integer factorization).

1989 - The first working experimental quantum cryptographic prototype was demonstrated by researchers from the U.S. and Canada. A fundamental property of quantum mechanics, known as the Heisenberg uncertainty principle, is that anything that attempts to measure a quantum system disturbs the system. This property can be used to detect eavesdroppers on a specially designed fiber-optic network. A stream of photons representing a cipher key can be sent across a fiber-optic link. If the stream is disturbed by the presence of an eavesdropper, the link can be immediately shut down before any sensitive information is exchanged. If the stream is not disturbed, then the key exchange is complete and the key can be used to encrypt data that can then be transmitted by normal means.

1998 - The Electronic Frontier Foundation demonstrated a computer custom-designed to perform a brute force attack on DES. When DES was accepted in 1976, there were no machines powerful enough to break the algorithm's 56-bit encryption key in a reasonable amount of time, at a reasonable cost to build. The EFF's machine, known as Deep Crack, demonstrated that DES encryption was no longer sufficient. Deep Crack, which cost the EFF \$250,000 to build, was able to search the entire 256-bit DES keyspace (about 72 quadrillion possible keys) and decrypt a DES encrypted message in just 56 hours.

2001 - The Advanced Encryption Standard (AES) was announced as a replacement for DES. AES is a collection of block ciphers, each with a 128-bit block size, and with key lengths of 128, 192, and 256 bits. The increased block and key sizes make AES much more secure than its predecessor. AES is the only publicly accessible and open encryption standard approved by the NSA for top secret government use.

2008 - The world's first commercial network using quantum cryptology was demonstrated by participants from twelve European countries. The designers of the 200 km network in and around Vienna, Austria plan for it to be used for secure financial transactions and information exchange.

The future - The advantage in cryptography has shifted in favor of the code makers, but who knows for how long? Advancements in quantum computing or a major breakthrough in mathematics (in the area of quickly factoring large numbers) could provide the means to crack the best codes we use today. Who knows how long our digital communications will remain secure?

In researching for this article it didn't take me long to happen upon the Wikipedia article History of cryptography. Anyone who enjoys this article will almost certainly enjoy that one as well.

Much of the material covered in this article is discussed in greater detail in Simon Singh's excellent The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography.

For information about the recent (1970's and later) history of cryptography, and the people who created it, read Steven Levy's Crypto:how the code rebels beat the government-saving privacy in the digital age.

More information regarding the European quantum cryptographic network can be found in the BBC News article 'Unbreakable' encryption unveiled.

Notes:
* The Vigenère cipher got its name from Blaise de Vigenère, who published a stronger version of the cipher in 1586.

** First published by Friedrich Kasiski in 1863, but independently discovered by Charles Babbage in 1846.

*** My thanks to Tim Kington for directing me to Schneier's article on DES.

**** Although Diffie and Hellman were the first to publish a protocol for public key exchange, it was later revealed that GCHQ, the British signals intelligence agency, had developed a similar protocol prior to 1976, but had kept it classified.