Tuesday, December 31, 2013

The Divine Random in Computing

We'll find all its secrets and make them our own;
A yarn to weave and be woven.
We'll choose among the least likely chosen
And build a castle stone by stone.

Alan Turing - Superhero
(Courtesy technewspedia.com)
Hollywood seems enamored these days with the superhero story. Movies about Thor, Spiderman, and Superman, all have basically the same plot. In his youth, our hero discovers a unique ability, a superpower, which he then uses to save the world from an evil arch-villain in violently dramatic fashion. Later, the very people whose lives he spared begin to fear his superpower and persecute him for his eccentricities. The superhero becomes despondent and, eventually, turns to the only being who has the means to end his suffering — himself.

Of course, there are no superheroes in real life, but there was a man whose biography pretty much mirrors the superhero story — the British mathematician Alan Turing. Turing discovered his superpower, a genius for mathematical logic and computing, at the age of twenty-four when, as a graduate student, he published a paper entitled, "On Computable Numbers, with an Application to the Entscheidungsproblem.” I'll have more to say about the ideas in this paper later, but suffice to say that Turing's result is one of the most profound triumphs of rationality. Turing took his superpower into World War II and used it to battle the Nazis. He designed a computer called the "Bombe" that he and his team used to crack the German Enigma Code, thus freeing Allied shipping and supply lines from the dreaded U-Boats and saving England from the arch-villain Adolph Hitler.

Replica of Turing's 'Bombe' Code Breaker
(Courtesy Wikimedia)
After the war, Turing returned to his studies, published his dissertation, and initially received well-deserved gratitude at home and abroad. Ultimately, as the superhero plot dictates, his fellow citizens became suspicious of Turing's research and perhaps a bit jealous of his intellect. More troubling for Turing, the Brits, whom he had helped rescue from Fascism, were creeped out by his sexual preferences — Turing was gay. The British courts convicted him of gross indecency and sentenced him to a regimen of hormone injections designed to completely suppress his sex drive, not knowing what side-effects that might cause. Turing, a fan of fairy tales like Sleeping Beauty, became depressed, and one morning in 1954 (coincidentally the year of my birth) his housekeeper found him dead with a partially eaten, cyanide-laced apple on his bedside table.

Turing's remarkable 1936 paper defined both the beginning and the end of a new branch of mathematics — computation theory. In the first third of the paper, he formally defined a simple but robust computer now called a Turing Machine (TM). In the second third of the paper he proved that his little machine is universal, that it is capable of performing any computation any other computer can perform. In fact, any machine that can reasonably be called a computer, past or future, laptop or supercomputer, is no more powerful than a simple TM. Modern computers may be and generally are faster in performing some computations, but however elaborate they may be, a TM can accomplish the same thing, eventually. Turing's goal in this was to find an intuitive but formal definition of the phrase "by mechanical means," a phrase that is used quite a lot in mathematics and science.

If Turing had stopped and published the first two thirds of his paper, it would still have been an incredible achievement. It provides us with a universally applicable, easy to understand definition of what we mean by the word "algorithm", an orderly, step-by-step, procedure for doing something like verifying the proof of a mathematical theorem. Turing conjectured that we mean a computer program, specifically a Turing Machine, and today most mathematicians agree with his conjecture. Any effective method for doing something, any method that does not require ingenuity or creativity, exists if and only if there is a TM whose steps encode that method. That's an amazingly general concept which is widely known as the Church-Turing Thesis (Church was Turing's teacher, among many other accomplishments).

But Turing didn't stop after just defining what computation means in a formal sense. The final section of his paper went on to prove there are problems, infinitely many of them, that are not soluble by any TM, and therefore no deterministic procedure exists for finding all the answers. In particular, he proved there are infinitely many real numbers whose digits cannot be computed by any TM, or by any mechanical means whatsoever. I wish I could give you a concrete example of such an uncomputable number, but I cannot, and no one can. Turing proved they are out there, but to provide a finite recipe for producing one of them is fundamentally impossible! It turns out to be quite difficult to identify individual objects floating in the sea of randomness, and when we do find them they're often fuzzy and difficult to crisply discern.

Though we are unable to provide a recipe to produce all the digits of any given uncomputable number, Turing did define a specific problem that he proved could not be solved by any algorithm — the Halting Problem. Is it possible to write a compiler that, given the source code and input data of any other program, will tell us whether the program will ever halt on that input data? The answer is, no such program can possibly exist because it would lead to a logical paradox akin to the "everything I say is a lie" conundrum. Much has already been written about the halting problem, so I won't dwell on it here. It will, however, come up again when we talk about randomness in mathematics.

Our working, informal definition of a random number is any number that is (i.e. whose digits are) universally unpredictable. We've seen that nature provides copious examples of quantities whose measures at the nano-scale are random in this sense. Turing's result shows there are numbers that cannot be computed by any finite algorithm and the Church-Turing Thesis allows us to extend that to mean such numbers cannot be produced by any effective procedure whatsoever. In other words, the shortest, most compact way to describe such a number is to simply list all its digits!

Now you may think these unnameable, unspeakable numbers are rare and difficult to find, and you'd be half right about that. But they are most assuredly not rare. In fact, if you threw a very, very sharp dart and hit the real number line, the probability of hitting such an unnameable number is 1. There is a countably infinite number of nameable numbers, but an uncountably infinite number of unnameable ones. Gregory Chaitin, an American mathematician and computing theorist whom we will talk more about in the chapter on randomness in mathematics, has a name for these real numbers whose shortest possible exact descriptions are the numbers themselves. He calls them algorithmically random, but admits the concept may actually be the best definition of randomness in all its generality.

In physics, we began with the ambition to discover a small set of general principles with which all the universe could be explained, only to find that most of the universe is in fact beyond human grasp. Now, in the more orderly and man-made world of computation, we find that almost everything is uncomputable. There are infinitely many numbers, infinitely many problems, for which even the most powerful and sophisticated computer will be all but useless. Random numbers, in the sense discussed here, cannot be computed and can only come from some source outside the computer.

Turing completed his PhD dissertation at Princeton after the war. While accepting his own conclusion that some problems, like the Halting Problem, were just not tractable, he still wondered what a computer would look like if it could solve such problems, if it could somehow perform magic. In his dissertation, he augmented his TMs to include an "oracle," a magic device that could answer certain unanswerable questions. For example, a Halting Oracle was assumed to be able to discern, somehow, whether any given TM will eventually halt or whether it will run forever. Surprisingly, he found that such super-Turing machines are still limited in their abilities. A super-TM augmented with a halting oracle, for example, still suffers an inability to solve the Halting Problem as it applies to super-TMs. Modern computer scientists still use the idea of oracles to study the limits of computing. A random oracle, a function that always generates the same truly random number when given the same input, is widely used as an ideal cryptographic hash function. In Teaching Robots to Sneeze, we examined how to build what we might call a truly random oracle, which has no input parameters.

Last week, nearly sixty years after his suicide, Queen Elizabeth II granted Alan Turing a posthumous pardon for his supposed crimes. I've noticed that much of the reporting and discussion of this important event has focused on Turing's accomplishments and discoveries, which I find slightly ironic given that he spent so much of his short life contemplating the undiscoverable, the unknowable. Legendary superheroes often have a quiet place, a lair, in which to meditate and seek inspiration. Superman had his Fortress of Solitude, Batman his Batcave, and Alan Turing had the Divine Random.

Sunday, November 17, 2013

Teaching Robots to Sneeze

Modern digital computers are not designed to do unexpected things. When they do occur, these are usually called "failures."

But, as I discussed in The Divine Random in Physics, sometimes it's highly desirable, even necessary, to do something that is universally unpredictable. Or, in other words, to compute a truly random number. Even the phrase is possibly an oxymoron, because if you can compute something, or program a computer to compute it, can that ever be truly random? (I will discuss the relationship between computability and randomness in more detail later on.)

Now, if you've written much software, you may be wondering what the big deal is. After all, every operating system and pretty much every programming language has built-in functions called random() or rand() that generate random numbers for you using deep, complex mathematics that, thankfully, we programmers don't need to understand in order to use the functions. Unfortunately, despite their names, these functions do not produce random numbers in the sense that we have used the term here. In fact, the numbers computed by these so-called pseudo-random number generators (PRNGs) are completely, 100% predictable. They do have a sometimes useful property called statistical uniformity or statistical randomness, but they are in no way truly random. Think of the successive digits of the number pi, which are statistically random, but easily predictable by anyone who can divide the circumference of a circle by its diameter.

The extensive use of such PRNGs in commercial and government cryptography products has caused immeasurable harm, the extent of which is only just beginning to emerge. A few weeks ago, it was revealed that the default and recommended PRNG algorithm, whose formidable name is Dual Elliptic Curve Deterministic Random Bit Generator, used by many commercial companies in the US and abroad contains a backdoor vulnerability that was deliberately inserted by its inventor. That inventor, the US National Security Agency or NSA, now has the capability to predict pseudo-random sequences for any software that uses the tainted PRNG. If this doesn't seem all that important to you, consider this. According to the New York Times article,

The agency has circumvented or cracked much of the encryption, or digital scrambling, that guards global commerce and banking systems, protects sensitive data like trade secrets and medical records, and automatically secures the e-mails, Web searches, Internet chats and phone calls of Americans and others around the world...
Incidentally, the tainted algorithm was adopted so widely, despite its known origins in the NSA, because another US government agency, the National Institute of Standards and Technology, NIST, blessed it as a standard and recommended that it be used wherever possible. Government contractors and vendors supplying cryptography products to the federal government were then required to be certified, which certification included the requirement to use a certain PRNG algorithm.

PRNGs aren't all bad. There are many applications, e.g. Monte Carlo simulation and automated testing, where a repeatable stream of statistically random numbers is just what the doctor ordered. But there are other uses, cryptography being one, for which only the best, genuinely random, numbers will do. Another area where universally unpredictable, truly random numbers are highly desirable is for UUIDs - Universally Unique IDentifiers - used, for example, in massively scalable digital storage clusters that have global namespaces. Here, each stored object, and there may be billions or trillions of them, is assigned a UUID that is different from every other object's UUID. Not only that, it is different from every other UUID ever used or that ever will be used. Thus the term "universal." Such UUIDs become the true names for these objects, in the sense discussed in Satnam and other Dangers.

There are several different ways to generate UUIDs that are guaranteed to be unique. Most of these involve some sort of centralized naming authority. For example, the namespace for Internet URLs is ultimately managed by the Internet Assigned Numbers Authority, IANA. But deterministically generating UUIDs in this way is slow and requires a great deal of cooperation. And did I mention there's still some pseudo-government entity controlling the namespace?

An alternative approach, using large, truly random numbers for UUIDs, can't guarantee uniqueness, but it can make it very, very unlikely that two different objects will receive the same UUID (a "collision"). And if the calculable probability of a collision of truly random UUIDs is smaller than the probability of an undetected hardware or software failure in the deterministic algorithms (see Provably Probable is Better than Probably Provable), then the truly random approach has an important practical advantage over the "deterministic" ones. Namely, each process in a distributed network can, in theory, generate UUIDs independently of every other process; no cooperation with other processes or with a central naming authority is required.

Which brings us, finally, to the main topic of this chapter. How can we program a modern digital computer to do something that is universally unpredictable — truly random? How can we teach a robot to sneeze?

Consider the following programming problem. We have four server-class computers, no keyboards or mice attached, all plugged into the same power strip. The servers are all identical - same hardware, same memory, same peripherals. All are networked together through a single network switch. The power strip is switched on so that all four computers start at the same moment and they all boot the exact same operating system, say Linux, and then synchronize their clocks. They then auto-execute an application program whose sole task is to generate a 128-bit truly random number and share it with the other servers. Each server stores the four numbers generated in this round, then compares its number against all previously recorded ones. If its generated number is different from every other number ever generated by any of the servers, it reboots itself and the whole process starts over again. If there is ever a duplicate number generated, an alarm goes off, lights flash, sirens wail, and the V.P. of Engineering gets a frantic phone call in the middle of an otherwise relaxing and much needed vacation in Taos, New Mexico.

There are several different ways to solve this problem, and some are better than others. A naive (meaning terrible) approach would be to simply look around the system to find "random" or unpredictable values, then hash them together to form a candidate number. Simply scrambling the value of the real-time clock would generate a number that is statistically unique across reboots, but every server would likely generate the same value, since their clocks are synchronized. You could mix in other more or less random values, like the serial numbers of peripherals, if available, or the MAC addresses of the network interface cards. But these numbers are static across reboots, so the same machine would find the same values each time and from them generate the same candidate number. Also, these numbers are unique only if you trust the central naming authorities, usually manufacturers associations, that assigned them. Specifically in the case of MAC addresses for NICs, manufacturers are assigned blocks of numbers they then use in serial fashion for their network cards. Since our servers come from the same manufacturer and perhaps the same batch, there is a good chance the MACs are very close to one another. Contrary to common belief, MAC addresses can also be reused over time.

A better all-around approach is to use measured timings for hardware-related events, which relies in one way or another on detecting and measuring small discrepancies (a.k.a failures) in the hardware. The Linux operating system (and most commercial Unix variants) includes a random number generator like this in its kernel. This mechanism measures tiny variations in interrupt timings in various hardware drivers, including keyboard, mouse, hard disk, and network drivers, to accumulate an "entropy pool" of randomness. When an application wishes to use a random number, it simply reads from a special device called /dev/random and asks for some number of bytes of randomness. If there is sufficient entropy in the pool at the time of the request, the kernel returns the requested number of bytes of randomness and reduces the pool by the same number of bytes. If there is not sufficient entropy yet, the read from /dev/random will block until enough randomness accumulates.

The word entropy here is an allusion to Claude Shannon's idea of entropy in communication theory. The actual calculation of this supposed measure of randomness in the Linux random number generator is somewhat mysterious and ill-understood. Even though the source code for it is publicly available, there seems to be no good explanation for why it might work, in theory. Nevertheless, it is clear that the best sources for randomness are the human interfaces; the keyboard and mouse. Ironically, the best way for a robot to do something unexpected is to ask a human! For our sample network of servers, which is based on a real world cluster of storage servers I once helped design, there are no human interfaces from which to draw randomness. Analysis has shown that other sources, notably hard disk drivers, are very poor sources of randomness, generating something like one bit of entropy per minute of operation. It seems the Linux random number pool is exceedingly shallow, which is once again ironic given that we know randomness is not a kiddie pool in the middle of our otherwise deterministic universe, but rather a vast ocean of unpredictability surrounding us all.

There is a way to draw randomness directly from that ocean by using specialized devices called Quantum Random Number Generators, QRNGs. As the name suggests, these devices tap into quantum-level phenomena, including the mother of all weirdness, quantum entanglement, to pull randomness from between the grid lines of the rational universe. Unfortunately, these devices are generally not a part of standard, off-the-shelf, computers like the ones we have conjectured for our example network. Given the importance of this kind of functionality for effective cryptography, that's a little odd, don't you think? If I were a conspiracy theorist, I would suspect the NSA has had a part in this omission.

But the sad fact is, to solve our proposed puzzle using only commodity servers, we cannot rely on fancy QRNGs, at least not today. The question now boils down to this: How can we tap into the realm of quantum-level events using ordinary hardware?

One answer my colleagues and I came up with is to measure the tiny irregularities in the heartbeat of every digital computer, the CPU clock. This clock is derived from a very mundane source: a physically vibrating quartz crystal — basically an atomic tuning fork. Most modern CPUs use a crystal oscillator that vibrates in the neighborhood of 30-80 million times per second and that signal is then multiplied electrically to produce a clock frequency of three or four billion cycles per second, which is the rate at which the computer can execute its simplest instructions. Although modern crystal oscillators are highly stable in their frequencies, things like temperature, humidity and external vibrations do cause small variations to occur, and when the frequency is multiplied by electrical circuits, so is the noise. That's usually okay, since CPUs are remarkably tolerant of lower or higher clock frequencies — most will run just fine at half the clock rate and can generally be overclocked a bit to run faster. But if we could measure these small variations in the vibrational frequency of chemical bonds between individual atoms of a crystal lattice, that would be a good source of quantum randomness. The challenge there is, how can a computer measure variations in its own heartbeat without some sort of external reference?

Luckily, there is a second clock within most every modern computer, one that uses an independent crystal oscillator as its frequency source — the real-time clock or RTC. It is quite important for different computers within a network to agree on what time it is, or at least on how long a second is. To do this, most computers use a much lower-frequency, and less noisy, crystal oscillator operating down near the 100-thousand cycles per second range to measure the passage of time. Of course, this second crystal is subject to the same kind of noise as the CPU oscillator, though less so because of its lower frequency. But that's good for our purposes! We can simply count how many CPU clock cycles occur within each RTC cycle, subtract successive samples and contribute the least-significant few digits to the entropy pool. Conceptually, this sampling can be done by executing a simple instruction loop a few million times and asking the RTC how long it took to complete. Given the realities of things like hardware interrupts and process scheduling, the details are of course more complicated, but the solution remains practical and it works for almost any commercially-available digital computer.

Computers do exactly what we tell them to do. When they don't, we typically think of it as a failure. But unexpected does not always mean faulty. Not only is it useful for software developers to plan to fail when designing complex systems, it is sometimes necessary to actually elicit and embrace failures, to swim briefly in the ocean of randomness. Randomness, one could argue, is at the very core of humanity and human creativity, and it isn't possible to program a computer to do something completely unexpected. To teach a robot to sneeze, we must allow her to surprise us.

Wednesday, August 28, 2013

The Divine Random in Physics

A place where gently blowing semi-trees
Stand constant in the changing sands,
And pretty birds flying flawless paths
Sing palindromic melodies
A while back, I sat in a conference room with a group of smart software engineers discussing randomness. Our new distributed software product relied very heavily on being able to generate truly random numbers on generic digital computers, a surprisingly hard problem as it turns out. The question had arisen, how sure were we that the numbers we generated were really, truly random? And by the way, what the heck does that actually mean?

Star System Arp 194 (Courtesy NASA)
The Question Mark Galaxy

Someone posited that a truly random number might be one that is universally unpredictable, a definition I quite liked. But instead of saying I liked it, I flippantly suggested there might be no such thing, that given enough information and a big enough computer, we might possibly predict the outcome of any conceivable physical process. Two members of my team (defensively, I will point out they both hold PhDs and very large brains) immediately spoke up to point out how wrong I was and exclaimed in unison, "Bell's Theorem!" I was simultaneously mortified that I had made such a mistake and proud that not one but two members of my team knew enough about quantum metaphysics to correct me. In hindsight, it isn't surprising that it was the concept of randomness that took us so quickly from a mundane discussion of Bug 1356 to the esoteric nuances of life, the universe, and everything. Here's how we got there.

Randomness, as a natural phenomenon, was discovered in 1927 by a young German physicist named Werner Heisenberg. Prior to that time, the universe was completely predictable, or at least that's how we viewed it. Physicists from Newton to Einstein all believed if we just looked hard enough we would find the rulebook of the universe, the answer to everything. And even if that answer were more complicated than, say, 42, it would still be possible to predict the outcome of any physical process, as I had suggested in the engineering meeting. The most brilliant minds the human race had produced all rejected the idea that the universe might be a giant game of chance. Albert Einstein famously rejected it explicitly. Then came the Heisenberg Uncertainty Principle.

The Uncertainty Principle basically says, certain pairs of properties of physical objects — simple things like where it is and how fast it's going — cannot be simultaneously measured with perfect precision. The more carefully you measure the position of, say, an electron, the less certain you can be about its velocity at that same moment. If you are very, very, very careful measuring the position, then whatever number you observe for the velocity is essentially meaningless; it is random beyond a certain number of decimal places. Now this limit on how accurate one can be with these combined measurements is quite negligible for larger objects like bowling balls or BBs, but for small things like electrons and photons it makes a difference. The combined limit on our accuracy of measurement is determined by the reduced Plank constant which is about 35 decimal places of accuracy. Beyond that, physical properties are universally un-measurable. This can be understood by thinking about how one's measurements affect the object being measured. Measuring the position of an electron involves shining a light on it, and a more accurate measurement requires shorter bandwidth, higher energy photons. When the electron is impacted by the high-energy photon, its velocity is affected, thus introducing randomness.

And that is the way it was presented and talked about at first, as a limit on experimental accuracy. The Quantum Physics textbook I used in college, the 1974 edition of Quantum Physics, by Eisberg and Resnick, explained the Uncertainty Principle by saying, "our precision of measurement is inherently limited by the measurement process itself [...]." Albert Einstein, and many other prominent contemporaries of Heisenberg, believed there must still be an underlying set of "hidden variables" that control the universe and provide precise, deterministic answers to any question, even if we were forever limited in our ability to experimentally verify those answers due to the Uncertainty Principle.

Einstein, together with his colleagues Boris Podolsky and Nathan Rosen, even wrote a famous paper in which they, almost mockingly, proved that Quantum Mechanics must be wrong, or else the world as we know it would be a truly strange place. To do this, they assumed only two seemingly obvious things about the world. First, that objects have intrinsic properties like position and velocity, even when no one is measuring them. This they called "reality." And second, that measurements of reality in one place and time cannot instantaneously affect other, far away realities, a property they called "locality." Einstein, Podolsky and Rosen basically said, who would want to live in a world where reality and locality did not hold. In other words, they believed our friendly, orderly universe could not possibly be intrinsically random.

But they were wrong.

In 1964, Professor John Stewart Bell proved a result that some have called, "the most profound discovery of science." The unassuming title of his brilliant paper, On the Einstein Podolsky Rosen Paradox, referred back to the "paradox" outlined by Einstein and his pals. Bell proved that the universe is in fact fundamentally, inherently, inescapably random. More precisely, he showed that no deterministic theory based on hidden variables could possibly explain all the observed results of Quantum Mechanics. And if that means there is no such thing as reality or locality, then so be it. Either the principle of reality or the principle of locality (or both) does not apply in our universe! A strange place indeed.

And so my brilliant colleagues were right. Heisenberg's Uncertainty Principle is not just a limit on how accurately we can measure things. It's a limit on what we are allowed to know about the universe in which we live. There are physical quantities that are universally unpredictable. At the very foundation of our familiar physical world, lies the Divine Random.