Saturday, May 3, 2014

The Divine Random in Mathematics

I am the rock, you the stone,
Together we ballast reality.
Steady toward uncertain destiny,
Whither the sheets above us are blown.
Until fairly recently, 1931 to be exact, scientists and mathematicians believed we would someday find a theory that explains everything. By "theory" I mean a Formal Axiomatic System (FAS). A FAS is simply a relatively small set of given facts, axioms, from which a much larger set of theorems can be proven "by mechanical means." The axioms in a FAS are so basic that we believe them to be true without the need for further justification. A famous and straightforward example of a FAS is the small set of axioms provided by Giuseppe Peano at the end of the 19th century to formally define the natural numbers. The Peano axioms, in English, are as follows:
  1. The natural numbers contain the number 1.
  2. Every natural number has a successor, the next natural number.
  3. The number 1 is not the successor of any natural number.
  4. Two different natural numbers cannot have the same successor.
  5. If a set contains the successors of each of its members and the number 1, then it contains all the natural numbers.

From this small set of seemingly obvious "facts" and using the mechanical rules of logic, all of number theory can be derived. Addition, multiplication, prime numbers, prime decomposition and the Fundamental Theorem of Arithmetic, are all consequences of the five Peano Axioms listed above. Or, to look at it the other way round, the Peano Axioms constitute a very efficient compression of the entire body of number theory.

And that is how mathematics works. Every branch of it — geometry, algebra, analysis, etc. — can be boiled down to a similar compression of a very large body of theorems into a relatively small set of axioms. Unlike other sciences such as physics, biology, chemistry, or computation, mathematics is unconstrained by the laws of the universe in which we find ourselves living. It is purely an invention of man, limited only by our evolving intellect. We invented the game and control the rules. Surely, there are no fundamental reasons why we couldn't develop a mathematical Theory of Everything (ToE), a FAS from whose axioms one can derive by mechanical means all that is true.


Albert Einstein and Kurt Gödel
That is what pretty much every mathematician believed until 1931 when a 25 year-old Austrian post-doc named Kurt Gödel shocked everyone by proving that mathematics, that wholly man-made "queen of sciences," suffers from the same kind of fundamental limitation Heisenberg had discovered in physics five years earlier and Turing would find for computation five years later. Gödel's meta-theorem proved there is a shoreline beyond which vast shoals of truth exist that the lighthouse of pure mathematics cannot illuminate.

More specifically (and much less metaphorically), Gödel showed that for any FAS that is at least as rich as the Peano Axioms described above, and for which the axioms are not self-contradictory, there will be theorems expressible in the language of the FAS that are true, but un-provable from the axioms. That is, any consistent FAS is necessarily incomplete. Pure mathematics is not sufficient to account for a ToE. Truth, with a capital T, is incompressible. There will always be a theorem, perhaps infinitely many, whose shortest expression is the theorem itself. The only way to prove such theorems is to add them to the set of axioms!

Like other impossibility meta-theorems, e.g., Heisenberg's Uncertainty Principle in physics and Turing's Undecidability result in computing, Gödel's Incompleteness Theorem identifies a fundamental limit on our ability to know things, in this case using the tools of mathematics. As we have seen for physics and computing, beyond these boundaries lies randomness, and the same appears to be true for math as well. Our simple working definition of randomness, that it be universally unpredictable, can be made more precise in the realm of mathematics. Namely, something is mathematically random if there exists no FAS (with a finite set of axioms) from which it can be derived. A random number, in particular, is a number whose digits cannot be predicted within the framework of mathematics. Such a number is irreducible — the shortest possible way to express one is to write down all its (infinitely many) digits. If we can agree that a name for something is just a finite sequence of symbols that represent it, these numbers are necessarily nameless. Even numbers like π and e, which are transcendental, their digits never repeat, have namesπ and e are the names we have given to two specific infinite real numbers that can be computed in two very specific, and finite, ways. Mathematically random (aka irreducible) numbers are even weirder than transcendentals. So weird that it becomes reasonable to ask, do such numbers actually exist, and if so, can you show me one?

It's actually pretty easy to see that irreducible numbers, numbers that cannot be derived within any FAS, do indeed exist. However complex pure mathematics may become, it must eventually have a finite set of axioms from which all theorems, including those showing the existence of certain numbers, can be mechanically proved. A finite set of axioms can be used to derive an infinite set of theorems, just like a finite alphabet can generate an infinite set of names. But in both cases, the infinity of derivations is a countable infinity — there are only as many of them as there are integers. On the other hand, there is an uncountably infinite number of real numbers, which is a bigger infinity. So if you match up numbers to name/theorems, you will always have more of the former than the latter. In other words, there must be an (uncountably) infinite number of irreducible numbers. This is a so-called diagonalization argument that can be made formally rigorous, but is never very satisfying. What we'd really like to do is hold one of these odd beasts in our hands and examine it up close.

One obvious way to generate such a number is to appeal back to physics. We could simply flip a fair coin and write 0 for heads or 1 for tails, and continue doing that to generate as many bits as desired. Assuming we trust the fairness of our fair coin, the bits (digits) of such a number would be universally unpredictable — every bit is a completely separate "fact" and there is no shorter description of how to obtain them other than the number itself. But is there a way to do this while staying within the confines of pure math?


Around the time The Beatles brought bangs to America, in 1966, Dr. Gregory Chaitin found an answer to this question when he discovered what he called the Omega Numbers. Each Ω (there are infinitely many) is definable but completely irreducible. Like π and e, Ω is a transcendental number, but unlike it's more well-behaved cousins, Ω cannot be computed by any algorithm. The smallest possible expression of its digits are the digits themselves. Furthermore, those digits are normally distributed — every digit appears approximately the same number of times — no matter what base (decimal, octal, binary, hexadecimal, etc.) one might choose to represent it. Ω is mathematically random. To me, the most fascinating thing about Ω is that it sits at the nexus of Gödel's incompleteness and Turing's undecidability, telling us that the oceans of randomness beyond mathematics and computing are, in fact, one and the same — the Divine Random.

Chaitin defines Ω using the terminology and formalism of Turing (which, if you'll recall, is still within the realm of pure mathematics) by first choosing a suitable formal programming language and initializing an estimate of Ω to zero. For Chaitin's purposes, this means the language must be prefix-free so that programs are self-delimited. We then enumerate every possible string in the language. For each string we ask, a) is the string a syntactically correct program with input data appended, and b) does that program halt when supplied with the input data? If the answers to both a) and b) are Yes, we add the following quantity to our estimate: 1/2|p|, where |p| is the number of bits required to represent the program in the chosen language. If not, we just continue combing through all possible strings looking for legal, halting programs.

Does such a procedure produce a well-defined number? Absolutely! It is surely possible to lexically enumerate all possible strings, every string is either a legal program+data or not, and every legal program+data either halts or it doesn't. In fact, Chaitin showed that the resulting number, Ω, is a real number between 0 and 1. The catch, of course, is that it is impossible to compute such a number because, as Turing proved, it isn't possible to compute the answer to the halting problem in full generality. In a certain sense, each bit of Ω is chosen by flipping a mathematical coin — heads the program halts, tails it doesn't — whose outcome is universally unpredictable and uncomputable.

So what does it look like, you must be asking? Well, I can't say. As with other objects we have encountered floating in the sea of randomness, Ω can be seen only by its faint, hazy outline. We can begin to estimate Ω but we can never know more than a finite number of its digits. Moreover, it is impossible to compute how accurate our estimate is, how close or far away it is to the actual number. Ω will always remain fuzzy, its properties uncertain beyond some degree of magnification, much like the properties of physical objects governed by Heisenberg's Uncertainty Principle. Chaitin put it best in his excellent and accessible little monograph Meta Math! The Quest for Omega:

So fixing randomness in your mind is like trying to stare at something without blinking and without moving your eyes. If you do that, the scene starts to disappear in pieces from your visual field.

The harder you stare at randomness, the less you see of it!
Even if we can't know exactly what it looks like, the very existence of Ω proves, or at least strongly suggests, Gödel's incompleteness meta-theorem. There are indeed theorems, and numbers, that cannot be compressed, that cannot be derived by any effective procedure from a smaller set of axioms. And we know that mathematics is limited in this way because Turing showed us there are limits to what any effective procedure, aka computer program, aka Finite Axiomatic System, can accomplish. No matter how clever we humans are or will become, there are barriers to our knowledge that cannot be scaled. Tantalizingly, the answers are out there, floating around as random numbers. Ω, for example, solves the halting problem for every possible program. And it exists; it's out there, we just aren't allowed to see it. Similarly, there exists a random number whose binary bits answer every conceivable yes/no question that can be posed in a given language, including questions like, "Is Goldbach's conjecture true?" Science can't give us all those answers. Physics, mathematics, and computation theory are well-honed tools, but they cannot carve all the universe.

We are now approaching the point at which you might begin to understand why I call it, The Divine Random.

Monday, February 17, 2014

Creation

This chapter concerns the second time I created something smarter than me. The first such occasion, the birth of my only son, was not entirely my doing as my wife had a rather significant contribution too. But the second creation was all mine. It was a computer program that played the strategy board game Othello, also known as Reversi. (Note, the link provided leads to a different implementation of the game.) I wrote it as a semester project while doing research at the Artificial Intelligence Laboratory at the University of Texas at Austin.


I did this because back then, and still today, I was interested in finding answers to some deep questions about the nature of creativity and the creative process. Where do new ideas come from? What makes an idea clever? Imaginative? Inventive? How do humans create new things and can we program computers to do it too? And most generally, how does the Universe create order out of chaos?

It is clear that new ideas are not just mechanically derived from old ones, at least not the really good new ideas. There is usually a spark, a creative aha moment during which something completely new is dredged up from a place where few of us ever go. Ideas can be encoded as (possibly infinite) numbers, and we know (see The Divine Random in Computing) that not all numbers can be computed by any effective procedure. That means there are infinitely many new ideas, most of them in fact, that cannot possibly be derived from what we already know. One must assume that at least some of these infinitely many ideas are good ones. A good truly random number generator will, eventually, produce all possible numbers, including quite a few that encode good ideas. So there we have an answer to our questions: just wait around for the next good idea to come around on the random number generator. As we'll see in a later chapter, it can be shown that a random number exists that encodes the correct answers to all possible yes/no questions expressible in a given language. A single number that contains the answers to all our questions certainly seems to qualify as a good idea.

Of course, that's not very practical. The probability of our random number generator producing the answer to everything, or even a single good idea, in our lifetimes is minuscule, to say the least. And besides, we probably wouldn't recognize a good idea if it presented itself at our front door and brought nachos. It would be like working on a jigsaw puzzle, picking up a random piece and declaring, "Eureka! I've discovered something wonderful." A jigsaw piece or an idea isn't a great find unless it fits within the existing framework we have already constructed and understand. The greatest ideas are not just the ones that are unexpected and fresh, but also the ones that are most tightly constrained, by the rules of mathematics, art, language, etc. But randomness must play a role in the creative process. The close proximity between madness and creativity, genius and weirdness, eccentricity and insanity, has long been known and commented upon by philosophers and humorists through the ages. Genius, it seems, is the exact right admixture of crazy and constraint.

With all those general ideas about creativity, I set out to build a computer program that could surprise me by having a good idea, one that I myself had not built into its code. I chose to write a program that played Othello for several reasons — the rules are simple, the game is easy to play but non-trivial, and perhaps most importantly, I wasn't very good at it. Most of the Othello program consisted of my own implementations of several well-known game playing algorithms that had been and continue to be used for computer strategy games. At its core was the so-called mini-max tree search algorithm with alpha-beta pruning. The linked article has a pretty good explanation of what that is. Basically, it's a formalized way to imagine what will happen in the future. If I do this, my opponent will probably do that, after which I can do this other thing, and so on. Because a computer can do those kinds of things really quickly, it can enumerate through all possible legal moves it has to choose from at any point in the game, and through all the legal moves its opponent would have, through many levels of what-ifs. But the decision tree for Othello, like the ones for Chess and Go, is too big for any computer to completely traverse all the way down to the bottom (at which point it can simply ask which of my current moves leads to a win many moves down the road). So the computer stops searching the decision tree at some arbitrarily-chosen depth and simply looks at the resulting board positions asking itself which ones seem to be the most advantageous for it. This last bit is heuristic, of course. It's a guess based on who has how many pieces remaining and how they are distributed about the board. Game programmers call this a static evaluation function.

There are many, many different types of static evaluation functions for the game of Othello, from simple linear combinations of cell contents (0=empty, 1=mine, -1=opponent's), to more sophisticated heuristics based on pattern-matching. I chose a paradigm that looked for a certain fixed set of patterns on the board and weighted those patterns, when they occurred, to produce a simple number that could be compared against other positions. The choice of static evaluation function thus boiled down to a vector of small integers. Choosing the proper weights for the evaluation vector to make the program play well required deep insights into the game's winning strategies; insights I did not have. And despite its look-ahead capability, the quality of the game's play against me was very sensitive to the choice of static evaluation vector — for some vectors I could beat the program nearly every time, while others were more difficult to win against. Using trial and error, I was not able to find a vector that would make the game play consistently better than me, a poor player.

And then the light bulb came on. Why not have the computer play against itself, using different static evaluation vectors (SEVs), and search for the best one? To do this, I coded the following search algorithm. First, I randomly generated several thousand SEVs to form an initial population of alternatives. I let the computer randomly choose two of those SEVs and play a game, one strategy against the other. Each SEV also had a merit score attached to it, and when an individual SEV won a game against an opponent, the program would increment the winner's merit score and decrement the loser's. In this way, the computer would eventually filter through the initial population and find the best strategy found there.

But what if there was a really great idea, a winning strategy, that did not happen to be one of the few thousand initial, random SEVs? I needed to somehow interject new blood, fresh DNA into the gene pool in order to do a more thorough search of the whole space of all SEVs. To do that, I wrote a second process, in addition to the competition process described above, to periodically generate new individuals. But I didn't just pick a random puzzle piece and cry Eureka! Instead I had the computer choose two different individuals, randomly but with a distribution that favored SEVs with higher merit scores, and combine the vectors in a way that borrows from biology. To combine two SEVs, the computer chose a random position along the vector and split both parent vectors at that point. A new child vector was created by choosing the first segment of one of the parents and the second segment of the other parent and then, with small probability, adding or subtracting one from a randomly selected value on the vector. In biology, this is called chromosomal cross-over with mutation. Once a new individual SEV was created in this way, the individual with the lowest merit score in the entire population was "killed off" and replaced with the new offspring, who was then available for further competition and breeding.


Once the competition and the breeding processes were chugging along, new strategies were being created and the computer played one game after another with itself. I allowed it to run all night, and all day the next day, and all that week, searching for that good idea, that winning strategy that I could not have taught it even if I'd wanted to. Every once in a while, I would interrupt the search to play a game against the current best strategy evolved so far. At first, the randomly-chosen strategies were pretty terrible. After only a few hours, I found I could still beat the computer regularly and with little effort. After a day or two, it became noticeably more difficult to win, and I found I had to concentrate more carefully to avoid the strategic tricks pulled by the computer. After four days, the computer began to win regularly, and after about a week, I found I could no longer win even a single game against the computer program that I myself had created. Now let me assure you, that is a spooky feeling, to be outsmarted by one's own creation. It was definitely a Frankenstein moment for me.

The type of search algorithm I used in the Othello game is known as a genetic algorithm (GA) because it mimics natural selection. GAs typically include most of the elements I've described above: a competition process, a breeding process that uses genetic operators like cross-over and mutation, and a fitness function or merit score. Randomness plays a vital role in the successful operation of any GA, but it isn't just a random search. Applying the fitness function to select competitors and breeding partners ensures the algorithm as a whole behaves as a kind of a algorithmic ratchet, always making progress toward a more fit population. Random variation together with deterministic filtering, survival of the fittest, is the hallmark of GA. Another common characteristic is the distinction between a genome and a phenome. In the Othello example, the genome was the Static Evaluation Vector, which resembles DNA in function. The phenome, the organism itself, is the rest of the game playing machinery — the mini-max tree search with optimizations and the codified rules that prevent the game making illegal moves.

And so you see, evolution exists. It is, as they say nowadays, a thing. I've touched it, implemented it, and watched it create things that surprised me. Evolution is not simply a random number generator, but rather an algorithm that systematically trolls the vast ocean of randomness to make steady progress in the direction of survivability. It is not just a way to create useful new ideas and individuals, it may be the only such algorithm we know of. And so it worries and frustrates me to see results like the recent Pew poll on the Public's View of Human Evolution that suggests more and more people do not believe that biological evolution is a thing.

To be sure, random mutation without the rational, mechanical ratcheting of natural selection would make no progress at all. The probability of producing something useful, or more useful, from such a coin flip process is unimaginably small. But randomness is the seed of creation, the origin of species. It is the essence of humanity and certainty its antithesis. When a man is entirely reduced to numbers, he ceases to be alive.

Tuesday, December 31, 2013

The Divine Random in Computing

We'll find all its secrets and make them our own;
A yarn to weave and be woven.
We'll choose among the least likely chosen
And build a castle stone by stone.

Alan Turing - Superhero
(Courtesy technewspedia.com)
Hollywood seems enamored these days with the superhero story. Movies about Thor, Spiderman, and Superman, all have basically the same plot. In his youth, our hero discovers a unique ability, a superpower, which he then uses to save the world from an evil arch-villain in violently dramatic fashion. Later, the very people whose lives he spared begin to fear his superpower and persecute him for his eccentricities. The superhero becomes despondent and, eventually, turns to the only being who has the means to end his suffering — himself.

Of course, there are no superheroes in real life, but there was a man whose biography pretty much mirrors the superhero story — the British mathematician Alan Turing. Turing discovered his superpower, a genius for mathematical logic and computing, at the age of twenty-four when, as a graduate student, he published a paper entitled, "On Computable Numbers, with an Application to the Entscheidungsproblem.” I'll have more to say about the ideas in this paper later, but suffice to say that Turing's result is one of the most profound triumphs of rationality. Turing took his superpower into World War II and used it to battle the Nazis. He designed a computer called the "Bombe" that he and his team used to crack the German Enigma Code, thus freeing Allied shipping and supply lines from the dreaded U-Boats and saving England from the arch-villain Adolph Hitler.


Replica of Turing's 'Bombe' Code Breaker
(Courtesy Wikimedia)
After the war, Turing returned to his studies, published his dissertation, and initially received well-deserved gratitude at home and abroad. Ultimately, as the superhero plot dictates, his fellow citizens became suspicious of Turing's research and perhaps a bit jealous of his intellect. More troubling for Turing, the Brits, whom he had helped rescue from Fascism, were creeped out by his sexual preferences — Turing was gay. The British courts convicted him of gross indecency and sentenced him to a regimen of hormone injections designed to completely suppress his sex drive, not knowing what side-effects that might cause. Turing, a fan of fairy tales like Sleeping Beauty, became depressed, and one morning in 1954 (coincidentally the year of my birth) his housekeeper found him dead with a partially eaten, cyanide-laced apple on his bedside table.

Turing's remarkable 1936 paper defined both the beginning and the end of a new branch of mathematics — computation theory. In the first third of the paper, he formally defined a simple but robust computer now called a Turing Machine (TM). In the second third of the paper he proved that his little machine is universal, that it is capable of performing any computation any other computer can perform. In fact, any machine that can reasonably be called a computer, past or future, laptop or supercomputer, is no more powerful than a simple TM. Modern computers may be and generally are faster in performing some computations, but however elaborate they may be, a TM can accomplish the same thing, eventually. Turing's goal in this was to find an intuitive but formal definition of the phrase "by mechanical means," a phrase that is used quite a lot in mathematics and science.

If Turing had stopped and published the first two thirds of his paper, it would still have been an incredible achievement. It provides us with a universally applicable, easy to understand definition of what we mean by the word "algorithm", an orderly, step-by-step, procedure for doing something like verifying the proof of a mathematical theorem. Turing conjectured that we mean a computer program, specifically a Turing Machine, and today most mathematicians agree with his conjecture. Any effective method for doing something, any method that does not require ingenuity or creativity, exists if and only if there is a TM whose steps encode that method. That's an amazingly general concept which is widely known as the Church-Turing Thesis (Church was Turing's teacher, among many other accomplishments).

But Turing didn't stop after just defining what computation means in a formal sense. The final section of his paper went on to prove there are problems, infinitely many of them, that are not soluble by any TM, and therefore no deterministic procedure exists for finding all the answers. In particular, he proved there are infinitely many real numbers whose digits cannot be computed by any TM, or by any mechanical means whatsoever. I wish I could give you a concrete example of such an uncomputable number, but I cannot, and no one can. Turing proved they are out there, but to provide a finite recipe for producing one of them is fundamentally impossible! It turns out to be quite difficult to identify individual objects floating in the sea of randomness, and when we do find them they're often fuzzy and difficult to crisply discern.

Though we are unable to provide a recipe to produce all the digits of any given uncomputable number, Turing did define a specific problem that he proved could not be solved by any algorithm — the Halting Problem. Is it possible to write a compiler that, given the source code and input data of any other program, will tell us whether the program will ever halt on that input data? The answer is, no such program can possibly exist because it would lead to a logical paradox akin to the "everything I say is a lie" conundrum. Much has already been written about the halting problem, so I won't dwell on it here. It will, however, come up again when we talk about randomness in mathematics.

Our working, informal definition of a random number is any number that is (i.e. whose digits are) universally unpredictable. We've seen that nature provides copious examples of quantities whose measures at the nano-scale are random in this sense. Turing's result shows there are numbers that cannot be computed by any finite algorithm and the Church-Turing Thesis allows us to extend that to mean such numbers cannot be produced by any effective procedure whatsoever. In other words, the shortest, most compact way to describe such a number is to simply list all its digits!

Now you may think these unnameable, unspeakable numbers are rare and difficult to find, and you'd be half right about that. But they are most assuredly not rare. In fact, if you threw a very, very sharp dart and hit the real number line, the probability of hitting such an unnameable number is 1. There is a countably infinite number of nameable numbers, but an uncountably infinite number of unnameable ones. Gregory Chaitin, an American mathematician and computing theorist whom we will talk more about in the chapter on randomness in mathematics, has a name for these real numbers whose shortest possible exact descriptions are the numbers themselves. He calls them algorithmically random, but admits the concept may actually be the best definition of randomness in all its generality.

In physics, we began with the ambition to discover a small set of general principles with which all the universe could be explained, only to find that most of the universe is in fact beyond human grasp. Now, in the more orderly and man-made world of computation, we find that almost everything is uncomputable. There are infinitely many numbers, infinitely many problems, for which even the most powerful and sophisticated computer will be all but useless. Random numbers, in the sense discussed here, cannot be computed and can only come from some source outside the computer.

Turing completed his PhD dissertation at Princeton after the war. While accepting his own conclusion that some problems, like the Halting Problem, were just not tractable, he still wondered what a computer would look like if it could solve such problems, if it could somehow perform magic. In his dissertation, he augmented his TMs to include an "oracle," a magic device that could answer certain unanswerable questions. For example, a Halting Oracle was assumed to be able to discern, somehow, whether any given TM will eventually halt or whether it will run forever. Surprisingly, he found that such super-Turing machines are still limited in their abilities. A super-TM augmented with a halting oracle, for example, still suffers an inability to solve the Halting Problem as it applies to super-TMs. Modern computer scientists still use the idea of oracles to study the limits of computing. A random oracle, a function that always generates the same truly random number when given the same input, is widely used as an ideal cryptographic hash function. In Teaching Robots to Sneeze, we examined how to build what we might call a truly random oracle, which has no input parameters.

Last week, nearly sixty years after his suicide, Queen Elizabeth II granted Alan Turing a posthumous pardon for his supposed crimes. I've noticed that much of the reporting and discussion of this important event has focused on Turing's accomplishments and discoveries, which I find slightly ironic given that he spent so much of his short life contemplating the undiscoverable, the unknowable. Legendary superheroes often have a quiet place, a lair, in which to meditate and seek inspiration. Superman had his Fortress of Solitude, Batman his Batcave, and Alan Turing had the Divine Random.

Sunday, November 17, 2013

Teaching Robots to Sneeze

Modern digital computers are not designed to do unexpected things. When they do occur, these are usually called "failures."

But, as I discussed in The Divine Random in Physics, sometimes it's highly desirable, even necessary, to do something that is universally unpredictable. Or, in other words, to compute a truly random number. Even the phrase is possibly an oxymoron, because if you can compute something, or program a computer to compute it, can that ever be truly random? (I will discuss the relationship between computability and randomness in more detail later on.)

Now, if you've written much software, you may be wondering what the big deal is. After all, every operating system and pretty much every programming language has built-in functions called random() or rand() that generate random numbers for you using deep, complex mathematics that, thankfully, we programmers don't need to understand in order to use the functions. Unfortunately, despite their names, these functions do not produce random numbers in the sense that we have used the term here. In fact, the numbers computed by these so-called pseudo-random number generators (PRNGs) are completely, 100% predictable. They do have a sometimes useful property called statistical uniformity or statistical randomness, but they are in no way truly random. Think of the successive digits of the number pi, which are statistically random, but easily predictable by anyone who can divide the circumference of a circle by its diameter.

The extensive use of such PRNGs in commercial and government cryptography products has caused immeasurable harm, the extent of which is only just beginning to emerge. A few weeks ago, it was revealed that the default and recommended PRNG algorithm, whose formidable name is Dual Elliptic Curve Deterministic Random Bit Generator, used by many commercial companies in the US and abroad contains a backdoor vulnerability that was deliberately inserted by its inventor. That inventor, the US National Security Agency or NSA, now has the capability to predict pseudo-random sequences for any software that uses the tainted PRNG. If this doesn't seem all that important to you, consider this. According to the New York Times article,

The agency has circumvented or cracked much of the encryption, or digital scrambling, that guards global commerce and banking systems, protects sensitive data like trade secrets and medical records, and automatically secures the e-mails, Web searches, Internet chats and phone calls of Americans and others around the world...
Incidentally, the tainted algorithm was adopted so widely, despite its known origins in the NSA, because another US government agency, the National Institute of Standards and Technology, NIST, blessed it as a standard and recommended that it be used wherever possible. Government contractors and vendors supplying cryptography products to the federal government were then required to be certified, which certification included the requirement to use a certain PRNG algorithm.

PRNGs aren't all bad. There are many applications, e.g. Monte Carlo simulation and automated testing, where a repeatable stream of statistically random numbers is just what the doctor ordered. But there are other uses, cryptography being one, for which only the best, genuinely random, numbers will do. Another area where universally unpredictable, truly random numbers are highly desirable is for UUIDs - Universally Unique IDentifiers - used, for example, in massively scalable digital storage clusters that have global namespaces. Here, each stored object, and there may be billions or trillions of them, is assigned a UUID that is different from every other object's UUID. Not only that, it is different from every other UUID ever used or that ever will be used. Thus the term "universal." Such UUIDs become the true names for these objects, in the sense discussed in Satnam and other Dangers.

There are several different ways to generate UUIDs that are guaranteed to be unique. Most of these involve some sort of centralized naming authority. For example, the namespace for Internet URLs is ultimately managed by the Internet Assigned Numbers Authority, IANA. But deterministically generating UUIDs in this way is slow and requires a great deal of cooperation. And did I mention there's still some pseudo-government entity controlling the namespace?

An alternative approach, using large, truly random numbers for UUIDs, can't guarantee uniqueness, but it can make it very, very unlikely that two different objects will receive the same UUID (a "collision"). And if the calculable probability of a collision of truly random UUIDs is smaller than the probability of an undetected hardware or software failure in the deterministic algorithms (see Provably Probable is Better than Probably Provable), then the truly random approach has an important practical advantage over the "deterministic" ones. Namely, each process in a distributed network can, in theory, generate UUIDs independently of every other process; no cooperation with other processes or with a central naming authority is required.

Which brings us, finally, to the main topic of this chapter. How can we program a modern digital computer to do something that is universally unpredictable — truly random? How can we teach a robot to sneeze?

Consider the following programming problem. We have four server-class computers, no keyboards or mice attached, all plugged into the same power strip. The servers are all identical - same hardware, same memory, same peripherals. All are networked together through a single network switch. The power strip is switched on so that all four computers start at the same moment and they all boot the exact same operating system, say Linux, and then synchronize their clocks. They then auto-execute an application program whose sole task is to generate a 128-bit truly random number and share it with the other servers. Each server stores the four numbers generated in this round, then compares its number against all previously recorded ones. If its generated number is different from every other number ever generated by any of the servers, it reboots itself and the whole process starts over again. If there is ever a duplicate number generated, an alarm goes off, lights flash, sirens wail, and the V.P. of Engineering gets a frantic phone call in the middle of an otherwise relaxing and much needed vacation in Taos, New Mexico.

There are several different ways to solve this problem, and some are better than others. A naive (meaning terrible) approach would be to simply look around the system to find "random" or unpredictable values, then hash them together to form a candidate number. Simply scrambling the value of the real-time clock would generate a number that is statistically unique across reboots, but every server would likely generate the same value, since their clocks are synchronized. You could mix in other more or less random values, like the serial numbers of peripherals, if available, or the MAC addresses of the network interface cards. But these numbers are static across reboots, so the same machine would find the same values each time and from them generate the same candidate number. Also, these numbers are unique only if you trust the central naming authorities, usually manufacturers associations, that assigned them. Specifically in the case of MAC addresses for NICs, manufacturers are assigned blocks of numbers they then use in serial fashion for their network cards. Since our servers come from the same manufacturer and perhaps the same batch, there is a good chance the MACs are very close to one another. Contrary to common belief, MAC addresses can also be reused over time.

A better all-around approach is to use measured timings for hardware-related events, which relies in one way or another on detecting and measuring small discrepancies (a.k.a failures) in the hardware. The Linux operating system (and most commercial Unix variants) includes a random number generator like this in its kernel. This mechanism measures tiny variations in interrupt timings in various hardware drivers, including keyboard, mouse, hard disk, and network drivers, to accumulate an "entropy pool" of randomness. When an application wishes to use a random number, it simply reads from a special device called /dev/random and asks for some number of bytes of randomness. If there is sufficient entropy in the pool at the time of the request, the kernel returns the requested number of bytes of randomness and reduces the pool by the same number of bytes. If there is not sufficient entropy yet, the read from /dev/random will block until enough randomness accumulates.

The word entropy here is an allusion to Claude Shannon's idea of entropy in communication theory. The actual calculation of this supposed measure of randomness in the Linux random number generator is somewhat mysterious and ill-understood. Even though the source code for it is publicly available, there seems to be no good explanation for why it might work, in theory. Nevertheless, it is clear that the best sources for randomness are the human interfaces; the keyboard and mouse. Ironically, the best way for a robot to do something unexpected is to ask a human! For our sample network of servers, which is based on a real world cluster of storage servers I once helped design, there are no human interfaces from which to draw randomness. Analysis has shown that other sources, notably hard disk drivers, are very poor sources of randomness, generating something like one bit of entropy per minute of operation. It seems the Linux random number pool is exceedingly shallow, which is once again ironic given that we know randomness is not a kiddie pool in the middle of our otherwise deterministic universe, but rather a vast ocean of unpredictability surrounding us all.

There is a way to draw randomness directly from that ocean by using specialized devices called Quantum Random Number Generators, QRNGs. As the name suggests, these devices tap into quantum-level phenomena, including the mother of all weirdness, quantum entanglement, to pull randomness from between the grid lines of the rational universe. Unfortunately, these devices are generally not a part of standard, off-the-shelf, computers like the ones we have conjectured for our example network. Given the importance of this kind of functionality for effective cryptography, that's a little odd, don't you think? If I were a conspiracy theorist, I would suspect the NSA has had a part in this omission.

But the sad fact is, to solve our proposed puzzle using only commodity servers, we cannot rely on fancy QRNGs, at least not today. The question now boils down to this: How can we tap into the realm of quantum-level events using ordinary hardware?


One answer my colleagues and I came up with is to measure the tiny irregularities in the heartbeat of every digital computer, the CPU clock. This clock is derived from a very mundane source: a physically vibrating quartz crystal — basically an atomic tuning fork. Most modern CPUs use a crystal oscillator that vibrates in the neighborhood of 30-80 million times per second and that signal is then multiplied electrically to produce a clock frequency of three or four billion cycles per second, which is the rate at which the computer can execute its simplest instructions. Although modern crystal oscillators are highly stable in their frequencies, things like temperature, humidity and external vibrations do cause small variations to occur, and when the frequency is multiplied by electrical circuits, so is the noise. That's usually okay, since CPUs are remarkably tolerant of lower or higher clock frequencies — most will run just fine at half the clock rate and can generally be overclocked a bit to run faster. But if we could measure these small variations in the vibrational frequency of chemical bonds between individual atoms of a crystal lattice, that would be a good source of quantum randomness. The challenge there is, how can a computer measure variations in its own heartbeat without some sort of external reference?

Luckily, there is a second clock within most every modern computer, one that uses an independent crystal oscillator as its frequency source — the real-time clock or RTC. It is quite important for different computers within a network to agree on what time it is, or at least on how long a second is. To do this, most computers use a much lower-frequency, and less noisy, crystal oscillator operating down near the 100-thousand cycles per second range to measure the passage of time. Of course, this second crystal is subject to the same kind of noise as the CPU oscillator, though less so because of its lower frequency. But that's good for our purposes! We can simply count how many CPU clock cycles occur within each RTC cycle, subtract successive samples and contribute the least-significant few digits to the entropy pool. Conceptually, this sampling can be done by executing a simple instruction loop a few million times and asking the RTC how long it took to complete. Given the realities of things like hardware interrupts and process scheduling, the details are of course more complicated, but the solution remains practical and it works for almost any commercially-available digital computer.

Computers do exactly what we tell them to do. When they don't, we typically think of it as a failure. But unexpected does not always mean faulty. Not only is it useful for software developers to plan to fail when designing complex systems, it is sometimes necessary to actually elicit and embrace failures, to swim briefly in the ocean of randomness. Randomness, one could argue, is at the very core of humanity and human creativity, and it isn't possible to program a computer to do something completely unexpected. To teach a robot to sneeze, we must allow her to surprise us.

Wednesday, August 28, 2013

The Divine Random in Physics

A place where gently blowing semi-trees
Stand constant in the changing sands,
And pretty birds flying flawless paths
Sing palindromic melodies
A while back, I sat in a conference room with a group of smart software engineers discussing randomness. Our new distributed software product relied very heavily on being able to generate truly random numbers on generic digital computers, a surprisingly hard problem as it turns out. The question had arisen, how sure were we that the numbers we generated were really, truly random? And by the way, what the heck does that actually mean?

Star System Arp 194 (Courtesy NASA)
The Question Mark Galaxy

Someone posited that a truly random number might be one that is universally unpredictable, a definition I quite liked. But instead of saying I liked it, I flippantly suggested there might be no such thing, that given enough information and a big enough computer, we might possibly predict the outcome of any conceivable physical process. Two members of my team (defensively, I will point out they both hold PhDs and very large brains) immediately spoke up to point out how wrong I was and exclaimed in unison, "Bell's Theorem!" I was simultaneously mortified that I had made such a mistake and proud that not one but two members of my team knew enough about quantum metaphysics to correct me. In hindsight, it isn't surprising that it was the concept of randomness that took us so quickly from a mundane discussion of Bug 1356 to the esoteric nuances of life, the universe, and everything. Here's how we got there.

Randomness, as a natural phenomenon, was discovered in 1927 by a young German physicist named Werner Heisenberg. Prior to that time, the universe was completely predictable, or at least that's how we viewed it. Physicists from Newton to Einstein all believed if we just looked hard enough we would find the rulebook of the universe, the answer to everything. And even if that answer were more complicated than, say, 42, it would still be possible to predict the outcome of any physical process, as I had suggested in the engineering meeting. The most brilliant minds the human race had produced all rejected the idea that the universe might be a giant game of chance. Albert Einstein famously rejected it explicitly. Then came the Heisenberg Uncertainty Principle.

The Uncertainty Principle basically says, certain pairs of properties of physical objects — simple things like where it is and how fast it's going — cannot be simultaneously measured with perfect precision. The more carefully you measure the position of, say, an electron, the less certain you can be about its velocity at that same moment. If you are very, very, very careful measuring the position, then whatever number you observe for the velocity is essentially meaningless; it is random beyond a certain number of decimal places. Now this limit on how accurate one can be with these combined measurements is quite negligible for larger objects like bowling balls or BBs, but for small things like electrons and photons it makes a difference. The combined limit on our accuracy of measurement is determined by the reduced Plank constant which is about 35 decimal places of accuracy. Beyond that, physical properties are universally un-measurable. This can be understood by thinking about how one's measurements affect the object being measured. Measuring the position of an electron involves shining a light on it, and a more accurate measurement requires shorter bandwidth, higher energy photons. When the electron is impacted by the high-energy photon, its velocity is affected, thus introducing randomness.

And that is the way it was presented and talked about at first, as a limit on experimental accuracy. The Quantum Physics textbook I used in college, the 1974 edition of Quantum Physics, by Eisberg and Resnick, explained the Uncertainty Principle by saying, "our precision of measurement is inherently limited by the measurement process itself [...]." Albert Einstein, and many other prominent contemporaries of Heisenberg, believed there must still be an underlying set of "hidden variables" that control the universe and provide precise, deterministic answers to any question, even if we were forever limited in our ability to experimentally verify those answers due to the Uncertainty Principle.

Einstein, together with his colleagues Boris Podolsky and Nathan Rosen, even wrote a famous paper in which they, almost mockingly, proved that Quantum Mechanics must be wrong, or else the world as we know it would be a truly strange place. To do this, they assumed only two seemingly obvious things about the world. First, that objects have intrinsic properties like position and velocity, even when no one is measuring them. This they called "reality." And second, that measurements of reality in one place and time cannot instantaneously affect other, far away realities, a property they called "locality." Einstein, Podolsky and Rosen basically said, who would want to live in a world where reality and locality did not hold. In other words, they believed our friendly, orderly universe could not possibly be intrinsically random.

But they were wrong.

In 1964, Professor John Stewart Bell proved a result that some have called, "the most profound discovery of science." The unassuming title of his brilliant paper, On the Einstein Podolsky Rosen Paradox, referred back to the "paradox" outlined by Einstein and his pals. Bell proved that the universe is in fact fundamentally, inherently, inescapably random. More precisely, he showed that no deterministic theory based on hidden variables could possibly explain all the observed results of Quantum Mechanics. And if that means there is no such thing as reality or locality, then so be it. Either the principle of reality or the principle of locality (or both) does not apply in our universe! A strange place indeed.

And so my brilliant colleagues were right. Heisenberg's Uncertainty Principle is not just a limit on how accurately we can measure things. It's a limit on what we are allowed to know about the universe in which we live. There are physical quantities that are universally unpredictable. At the very foundation of our familiar physical world, lies the Divine Random.

Thursday, November 1, 2012

Truth Dealers

There will always be war and politics. Just as the ready availability of weapons escalates and prolongs wars, the ready availability of information escalates and prolongs political battles.

As the 2012 U.S. election season draws to a close, there have been many possible reasons proffered for why this one seems to have been particularly, historically divisive. The battle lines between Democrats and Republicans have grown deeper, wider, and more extensive than for any other election — at least any I can remember. This, despite the facts that, 1) the two parties have nearly identical ideologies, and, 2) most voters do not fully support the published platforms of either party.

One of the reasons I've heard suggested to explain this latest battle in the ideological war that began, really, back in the photo-finish election of 2000, is that we have somehow lost our respect for the truth. He who tells the biggest whopper gets the most votes. In The Truth and Nothing But...A Good Story, I wrote about how partisan lies often spread like wildfire through the Internet, propagated by people who are either too lazy to check the validity of the facts, or too cynical to care. In the latter days of the campaigns, this overt dishonesty isn't limited to the Internet, with TV and radio ads now claiming things that are blatantly, provably, unequivocally false.

What's more maddening, nobody ever seems to want to retract these lies once they're caught. The phenomena of "doubling down" and "tripling down, basically repeating the original lie louder and to a broader audience, has become epidemic. Why? Because it works! And here's how...

Think of political operatives as carnival barkers. Their job is to draw attention to a certain set of facts that favor their candidate. But the facts themselves may be dry and boring without some dramatic context. "See the man with no stomach!" they scream. The actual fact is, there's a skinny guy who can suck in his gut to the point where his spine is visible from the front, still worth the price of admission perhaps (if you're interested in such oddities), just not as catchy and seductive as the shouted hook would have you believe. Some customers, after viewing the side-show, may complain about the lie, pointing out the original claim was misleading and blatantly, provably, unequivocally false. "Pants on fire!" they exclaim. But who cares? They have witnessed something they probably wouldn't have seen otherwise, and the carnival has their $1 so everybody's happy. Meanwhile, the barker continues to shout even louder, "See the man with no stomach!"

Even carnival barkers have legal restrictions that prevent them going too far in making false claims and false implications. Under certain circumstances, their utterances can be considered to be advertising, which is regulated by the Federal Trade Commission and by similar state agencies. Under penalty of law, advertisers are not allowed to make false claims, or even misleading ones, about their products or their competitor's. The FTC defines false advertising to include the following:

means of advertisement [...] which is misleading in a material respect; and in determining whether an advertisement is misleading, there shall be taken into account (among other things) not only representations made or suggested [...] but also the extent to which the advertisement fails to reveal facts material in the light of such representations.
In this campaign cycle, there have been many, many political ads, including the one linked above, that clearly would have violated FTC laws if the product being sold had been, say, soap instead of a candidate for political office. Why do we hold carnival barkers and purveyors of dish soaps to a higher standard of honesty than our presidential candidates? The answer is, the First Amendment to the Constitution of the U.S. The courts have held that political ads cannot be considered to be advertising. They are political speech, which is protected under the First Amendment. As a result, it is legal for anyone to tell any political lie, however outrageous, as loudly and often as they wish.

And, honestly, how else could it work? We certainly don't want the government telling us which political claims we should believe and which ones we should discount. The problem is really a practical one stemming, ironically, from the success the FTC has had in their enforcement of truth in advertising. I've heard people say things like, "There must be some truth in that political ad, otherwise they wouldn't let it be shown on TV." This "nanny expectation" is completely unfounded when it comes to the strange world of political advertising and no comfort whatsoever can be drawn from it. (Interestingly, the truth-in-advertising expectation does not seem to extend to the Internet, even though the same FTC laws apply there as on traditional broadcast media.)

Legally, practically, and in every other sense, it is the responsibility of each individual voter in the U.S. to research campaign ads they may find compelling to discover whether they are indeed truthful; a daunting task. The only reasonable alternative is to rely on a trusted source to do the fact checking for you. There are now legions of so-called fact checkers, and nearly every claim made by every major political candidate is combed over, investigated, parsed and re-parsed to try to determine whether it is true, false, or somewhere in between. Campaigns often shoot for that middle ground, the grey area where claims can still be sensational, provocative, even mildly misleading, without actually being so blatantly false as to reflect poorly on the candidate's public character and facade of honesty. They seem to follow some sage advice I received early in life.

How does one become a good liar?
Tell the truth almost always.

The truth is a fluid and relative thing these days. It depends on whom one chooses to believe. Even things we once called "facts", things like, "this person was born in the U.S.," or, "this person is a Christian," or, "this person never cheated on his taxes," all are now written in various shades of grey. It's not that we treasure the truth any less than before; if anything, just the opposite is the case due to increased skepticism. It's not that there are too few facts; the Internet, among other things, now provides a treasure trove of information about every conceivable topic.

The challenge, IMO, is that there are so many facts, so much information available to each of us. Facts are weapons in the political wars that currently rage in this country, and fact checkers are the arms dealers who make money selling to both sides of the conflict. Like weapons, there are high-quality facts and low-quality ones. Facts purchased from the "fact checkers" at Fox News or MSNBC are somewhat more likely to jam on you in the heat of battle or blow up in your face like a crude IED. But these aren't the really dangerous ones. The reputable arms dealers, the neutral fact checkers and think tanks sell large quantities of durable, usable information to either side for a price. In political skirmishes these days, you cannot be lulled into a false feeling of security because you are armed with the facts. The other guys are probably packing heat as well, just as much as you and maybe more. Standing up to declare victory because truth is on your side is a sure way to become a bloody casualty.

I am not advocating that fact checkers be outlawed, of course. Nor am I saying that free and open information is bad. The NRA has the right idea, facts do not kill candidates, ideologies kill candidates. An IED in the hands of a radical ideology can do more damage than an aircraft carrier in the hands of a civilized one. Wars and political debates are won not with weapons or facts, but by leaders who know what they are defending, whose loyalties are steadfast, and whose vision extends far beyond the next battlefield.

Saturday, September 15, 2012

Consensual Wisdom

I have a theory that almost all of us agree on almost everything. But we're wired somehow to zero in on the things we disagree about, however minor they may be, and debate/argue/fight over them. It seems the larger our base of common understanding, the more intense the disagreements over the details. Sunnis and Shiites, Christians and Muslims, Republicans and Democrats, kill each other (figuratively and sometimes literally) over the smallest ideological differences, despite their enormous shared premise. As a result, we ignore the big problems that challenge us all and focus instead on the minutiae of our dissonance.

But what if there were an environment where we were rewarded for finding our kernel of consensus? A game, say, where the rules guide us toward addressing the problems we all agree exist. An economy where one becomes individually wealthy only by taking actions that benefit everyone. A society that can grow as large and diverse as necessary and still make progress toward common goals.

I'm not talking about Utopia here. I'm just asking, is there a set of rewards and punishments, however artificially enforced, that will lead a group of people to focus on their considerable shared world view rather than their comparatively tiny differences?

Science, politics, and religion all have enormous edifices of consensual wisdom. Evidence and ideas that diverge from or seem to contradict those shared bodies of knowledge induce different reflexes in each discipline. Religious thinkers tend to reject them, scientists embrace them, and politicians use them as leverage. That's all fine, but the incessant preoccupation with what makes us different tends naturally to fragment us, to balkanize our churches, our research disciplines, and our societies until, in the limit, we will be left completely harmonious but quite alone. Social entropy will be at its maximum.

None of our institutions have evolved mechanisms of inclusion, systems that seek to expand groups by including more and more people who actually disagree with us. It's so rare that you may not even see the point in doing so. Maybe I am being hopelessly naive here, but it seems to me that recognizing our kernel of consensus, the common knowledge, shared problems, and joint solutions becomes more and more valuable as the size of the union gets larger, even if the intersection becomes smaller; particularly so if my hunch is correct that almost all of us agree on almost everything.

Perhaps I'm wrong though. Maybe, if all the religions of the world were to somehow identify their kernel of consensus, it would turn out to be trivial; something stupid and obvious like, we all have an ethical core and we should all give ourselves permission to obey it. But wait. Is that really so trivial? I, of course, don't know what the intersection of all our beliefs would be, but imagine the incredible power inherent in being able to make a statement like that! "We, the combined theological intelligentsia of the entire planet, believe the following to be true..." Imagine the societal and spiritual benefit that might ensue from simply communicating that consensus, whatever it turns out to be. I believe that benefit might far outweigh whatever good comes from arguing about who has the coolest prophet and whether we should utter its name or view its likeness.

I don't mean to pick on religion here. In many ways science is even less concerned with consensus and more obsessed with the deltas and outliers. That's what they do for a living, and it's what makes it so difficult for societies to choose courses of action based on reason and scientific consensus at a point in time. Many scientists believe it isn't their jobs to even try to identify consensus among themselves, rather they should provide politicians and decision-makers with all the possible alternatives and let them make decisions about what, if any actions to take. I think this is an abdication of their social responsibility. We are left to guess, infer, and in many cases, attempt to obfuscate scientific consensus on important topics.

Is the Earth's climate changing at a rate unprecedented in recorded history? I think I'm safe in saying almost all climate scientists agree that it is. Is this climate change anthropogenic (caused by human activities)? Maybe there's a consensus answer to that question, maybe not. Does evolution exist? Absolutely it does, in the same sense that quicksort (a well-known sorting algorithm) exists. Is it sufficient to explain the diversity of life on the planet? Without inviting criticism and controversy, I can't say whether there is a scientific consensus on that question. My point is, there are no conventional mechanisms in science for reaching and communicating conventional wisdom. Claiming that no consensus exists because one or a few scientists disagree is fallacious. Consensus does not require unanimity. It does, however, require a collaborative mechanism of discovery and refinement.

No, I'm not having a Kumbaya moment. My advanced age guarantees I have a healthy daily dose of skepticism and pessimism. The events of the past few days have done absolutely nothing to make me suddenly feel optimistic about our ability to live together without killing each other for nonsensical reasons.

But even with all this negative energy swirling around today, I'm left with an intellectual curiosity about the central question of this post: Now that the Internet has given the world a way to communicate instantly, can we also find a way to agree?