Saturday, May 3, 2014

The Divine Random in Mathematics

I am the rock, you the stone,
Together we ballast reality.
Whither the sheets above us are blown.
Until fairly recently, 1931 to be exact, scientists and mathematicians believed we would someday find a theory that explains everything. By "theory" I mean a Formal Axiomatic System (FAS). A FAS is simply a relatively small set of given facts, axioms, from which a much larger set of theorems can be proven "by mechanical means." The axioms in a FAS are so basic that we believe them to be true without the need for further justification. A famous and straightforward example of a FAS is the small set of axioms provided by Giuseppe Peano at the end of the 19th century to formally define the natural numbers. The Peano axioms, in English, are as follows:
1. The natural numbers contain the number 1.
2. Every natural number has a successor, the next natural number.
3. The number 1 is not the successor of any natural number.
4. Two different natural numbers cannot have the same successor.
5. If a set contains the number 1 and the successor of each of its members, then it contains all the natural numbers.

From this small set of seemingly obvious "facts" and using the mechanical rules of logic, all of number theory can be derived. Addition, multiplication, prime numbers, prime decomposition, and the Fundamental Theorem of Arithmetic are all consequences of the five Peano Axioms listed above. Or, to look at it the other way round, the Peano Axioms constitute a very efficient compression of the entire body of number theory.
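To make the compression concrete, here is a minimal sketch of Peano arithmetic in Python. It uses the common zero-based variant of the axioms (numbers are built only from a starting element and a successor), and the class names `Zero` and `Succ` are illustrative choices, not anything from the axioms themselves. The point is that addition and multiplication are never assumed; they fall out of two recursive rules each.

```python
# Peano naturals: a number is either Zero or the Succ(essor) of a number.
class Zero:
    def __repr__(self):
        return "0"

class Succ:
    def __init__(self, pred):
        self.pred = pred
    def __repr__(self):
        return f"S({self.pred!r})"

def add(a, b):
    # a + 0 = a;  a + S(b) = S(a + b)
    if isinstance(b, Zero):
        return a
    return Succ(add(a, b.pred))

def mul(a, b):
    # a * 0 = 0;  a * S(b) = (a * b) + a
    if isinstance(b, Zero):
        return Zero()
    return add(mul(a, b.pred), a)

def to_int(n):
    # Count the successors to display an ordinary integer.
    count = 0
    while isinstance(n, Succ):
        count += 1
        n = n.pred
    return count

two = Succ(Succ(Zero()))
three = Succ(two)
print(to_int(add(two, three)))  # 5
print(to_int(mul(two, three)))  # 6
```

Everything else in arithmetic is a longer and longer chain of exactly these mechanical unwindings, which is what it means to say the axioms compress the theory.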

And that is how mathematics works. Every branch of it — geometry, algebra, analysis, etc. — can be boiled down to a similar compression of a very large body of theorems into a relatively small set of axioms. Unlike other sciences such as physics, biology, chemistry, or computation, mathematics is unconstrained by the laws of the universe in which we find ourselves living. It is purely an invention of man, limited only by our evolving intellect. We invented the game and control the rules. Surely, there are no fundamental reasons why we couldn't develop a mathematical Theory of Everything (ToE), a FAS from whose axioms one can derive by mechanical means all that is true.

 Albert Einstein and Kurt Gödel
That is what pretty much every mathematician believed until 1931 when a 25-year-old Austrian post-doc named Kurt Gödel shocked everyone by proving that mathematics, that wholly man-made "queen of sciences," suffers from the same kind of fundamental limitation Heisenberg had discovered in physics five years earlier and Turing would find for computation five years later. Gödel's meta-theorem proved there is a shoreline beyond which vast shoals of truth exist that the lighthouse of pure mathematics cannot illuminate.

More specifically (and much less metaphorically), Gödel showed that for any FAS that is at least as rich as the Peano Axioms described above, and for which the axioms are not self-contradictory, there will be theorems expressible in the language of the FAS that are true but unprovable from the axioms. That is, any consistent FAS is necessarily incomplete. Pure mathematics is not sufficient to account for a ToE. Truth, with a capital T, is incompressible. There will always be a theorem, perhaps infinitely many, whose shortest expression is the theorem itself. The only way to prove such theorems is to add them to the set of axioms!

Like other impossibility meta-theorems, e.g., Heisenberg's Uncertainty Principle in physics and Turing's Undecidability result in computing, Gödel's Incompleteness Theorem identifies a fundamental limit on our ability to know things, in this case using the tools of mathematics. As we have seen for physics and computing, beyond these boundaries lies randomness, and the same appears to be true for math as well. Our simple working definition of randomness, that it be universally unpredictable, can be made more precise in the realm of mathematics. Namely, something is mathematically random if there exists no FAS (with a finite set of axioms) from which it can be derived. A random number, in particular, is a number whose digits cannot be predicted within the framework of mathematics. Such a number is irreducible — the shortest possible way to express one is to write down all its (infinitely many) digits. If we can agree that a name for something is just a finite sequence of symbols that represent it, these numbers are necessarily nameless. Even numbers like π and e, which are transcendental and whose digits never repeat, have names: π and e are the names we have given to two specific infinite real numbers that can be computed in two very specific, and finite, ways. Mathematically random (aka irreducible) numbers are even weirder than transcendentals. So weird that it becomes reasonable to ask, do such numbers actually exist, and if so, can you show me one?

It's actually pretty easy to see that irreducible numbers, numbers that cannot be derived within any FAS, do indeed exist. However complex pure mathematics may become, it must eventually have a finite set of axioms from which all theorems, including those showing the existence of certain numbers, can be mechanically proved. A finite set of axioms can be used to derive an infinite set of theorems, just like a finite alphabet can generate an infinite set of names. But in both cases, the infinity of derivations is a countable infinity — there are only as many of them as there are integers. On the other hand, there is an uncountably infinite number of real numbers, which is a bigger infinity. So if you match up numbers to names/theorems, you will always have more of the former than the latter. In other words, there must be an (uncountably) infinite number of irreducible numbers. This is a so-called diagonalization argument that can be made formally rigorous, but is never very satisfying. What we'd really like to do is hold one of these odd beasts in our hands and examine it up close.
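The diagonal trick itself is mechanical enough to sketch in a few lines of Python. This is only a toy: a real diagonalization works over an infinite enumeration, while here we take a hypothetical finite enumeration (string k is just k in 8-bit binary, an arbitrary choice for illustration) and flip the diagonal bits to build a string that provably appears nowhere in the list.

```python
def diagonal_escape(enumeration, n_bits):
    """Build a bit string that differs from the k-th enumerated
    string in its k-th position, for every k < n_bits."""
    return [1 - enumeration(k)[k] for k in range(n_bits)]

# A stand-in enumeration: the k-th "named" string is k in 8-bit binary.
def enum(k):
    return [int(b) for b in format(k, "08b")]

escaped = diagonal_escape(enum, 8)
# By construction, `escaped` disagrees with enum(k) at position k,
# so it cannot equal any of the first 8 enumerated strings.
assert all(escaped[k] != enum(k)[k] for k in range(8))
```

Cantor's argument simply observes that no matter how clever the enumeration, the escape move always works, so no countable list of names can cover all real numbers.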

One obvious way to generate such a number is to appeal back to physics. We could simply flip a fair coin and write 0 for heads or 1 for tails, and continue doing that to generate as many bits as desired. Assuming we trust the fairness of our fair coin, the bits (digits) of such a number would be universally unpredictable — every bit is a completely separate "fact" and there is no shorter description of how to obtain them other than the number itself. But is there a way to do this while staying within the confines of pure math?
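As a sketch of that coin-flip procedure, the snippet below substitutes the operating system's entropy source (Python's `secrets` module) for the physical coin — an assumption, since we are trusting the OS the way we trusted the coin's fairness — and concatenates the flips into the binary expansion of a number in [0, 1).

```python
import secrets

def coin_flip_number(n_bits):
    # Each flip is one universally unpredictable bit: heads = 1, tails = 0.
    bits = [secrets.randbits(1) for _ in range(n_bits)]
    # Read the flips as a binary fraction 0.b1 b2 b3 ...
    value = sum(b / 2 ** (i + 1) for i, b in enumerate(bits))
    return bits, value

bits, value = coin_flip_number(16)
print("".join(map(str, bits)), "->", value)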

Around the time The Beatles brought bangs to America, in 1966, Dr. Gregory Chaitin found an answer to this question when he discovered what he called the Omega Numbers. Each Ω (there are infinitely many) is definable but completely irreducible. Like π and e, Ω is a transcendental number, but unlike its more well-behaved cousins, Ω cannot be computed by any algorithm. The smallest possible expression of its digits is the digits themselves. Furthermore, Ω is what mathematicians call normal — every digit appears approximately the same number of times — no matter what base (decimal, octal, binary, hexadecimal, etc.) one might choose to represent it. Ω is mathematically random. To me, the most fascinating thing about Ω is that it sits at the nexus of Gödel's incompleteness and Turing's undecidability, telling us that the oceans of randomness beyond mathematics and computing are, in fact, one and the same — the Divine Random.

Chaitin defines Ω using the terminology and formalism of Turing (which, if you'll recall, is still within the realm of pure mathematics) by first choosing a suitable formal programming language and initializing an estimate of Ω to zero. For Chaitin's purposes, this means the language must be prefix-free so that programs are self-delimiting. We then enumerate every possible string in the language. For each string we ask, a) is the string a syntactically correct program with input data appended, and b) does that program halt when supplied with the input data? If the answers to both a) and b) are Yes, we add the following quantity to our estimate: 1/2^|p|, where |p| is the number of bits required to represent the program in the chosen language. If not, we just continue combing through all possible strings looking for legal, halting programs.

Does such a procedure produce a well-defined number? Absolutely! It is surely possible to lexically enumerate all possible strings, every string is either a legal program+data or not, and every legal program+data either halts or it doesn't. In fact, Chaitin showed that the resulting number, Ω, is a real number between 0 and 1. The catch, of course, is that it is impossible to compute such a number because, as Turing proved, it isn't possible to compute the answer to the halting problem in full generality. In a certain sense, each bit of Ω is chosen by flipping a mathematical coin — heads the program halts, tails it doesn't — whose outcome is universally unpredictable and uncomputable.
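The shape of that procedure can be sketched in Python, with two heavy caveats. A real Ω requires a universal, prefix-free language and an unanswerable halting test; here both are stand-ins I've invented for illustration: a toy prefix-free syntax (k ones, a zero, then exactly k data bits) and a bounded "halting" test (does the Collatz iteration, started from the data, reach 1 within a step budget?). The estimate it produces is, like Chaitin's, a number between 0 and 1 that only ever grows as more halting programs are found.

```python
def parse(s):
    """Toy prefix-free syntax: a legal program is k ones, a zero,
    then exactly k data bits. Returns the data bits, or None."""
    k = 0
    while k < len(s) and s[k] == "1":
        k += 1
    if k >= len(s) or s[k] != "0":
        return None
    data = s[k + 1:]
    return data if len(data) == k else None

def toy_halts(data_bits, max_steps=1000):
    """Stand-in halting test (the real one is uncomputable): does the
    Collatz iteration from this starting value reach 1 in time?"""
    n = int("0" + data_bits, 2) + 1
    for _ in range(max_steps):
        if n == 1:
            return True
        n = 3 * n + 1 if n % 2 else n // 2
    return False

def estimate_omega(max_len=15):
    estimate = 0.0
    for length in range(1, max_len + 1):
        for code in range(2 ** length):
            s = format(code, f"0{length}b")
            data = parse(s)
            if data is not None and toy_halts(data):
                estimate += 2.0 ** -length   # credit 1/2^|p| per halter
    return estimate

print(estimate_omega())
```

Because the syntax is prefix-free, the sum is guaranteed to stay at or below 1 (this is the Kraft inequality); what the toy cannot reproduce is the crucial part — for a genuine universal machine, no algorithm can tell you how close the running estimate has gotten.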

So what does it look like, you must be asking? Well, I can't say. As with other objects we have encountered floating in the sea of randomness, Ω can be seen only by its faint, hazy outline. We can begin to estimate Ω but we can never know more than a finite number of its digits. Moreover, it is impossible to compute how accurate our estimate is, how close to or how far from the actual number it is. Ω will always remain fuzzy, its properties uncertain beyond some degree of magnification, much like the properties of physical objects governed by Heisenberg's Uncertainty Principle. Chaitin put it best in his excellent and accessible little monograph Meta Math! The Quest for Omega:

So fixing randomness in your mind is like trying to stare at something without blinking and without moving your eyes. If you do that, the scene starts to disappear in pieces from your visual field.

The harder you stare at randomness, the less you see of it!
Even if we can't know exactly what it looks like, the very existence of Ω proves, or at least strongly suggests, Gödel's incompleteness meta-theorem. There are indeed theorems, and numbers, that cannot be compressed, that cannot be derived by any effective procedure from a smaller set of axioms. And we know that mathematics is limited in this way because Turing showed us there are limits to what any effective procedure, aka computer program, aka Formal Axiomatic System, can accomplish. No matter how clever we humans are or will become, there are barriers to our knowledge that cannot be scaled. Tantalizingly, the answers are out there, floating around as random numbers. Ω, for example, solves the halting problem for every possible program. And it exists; it's out there, we just aren't allowed to see it. Similarly, there exists a random number whose binary bits answer every conceivable yes/no question that can be posed in a given language, including questions like, "Is Goldbach's conjecture true?" Science can't give us all those answers. Physics, mathematics, and computation theory are well-honed tools, but they cannot carve all the universe.

We are now approaching the point at which you might begin to understand why I call it, The Divine Random.

Monday, February 17, 2014

Creation

This chapter concerns the second time I created something smarter than me. The first such occasion, the birth of my only son, was not entirely my doing as my wife had a rather significant contribution too. But the second creation was all mine. It was a computer program that played the strategy board game Othello, also known as Reversi. (Note, the link provided leads to a different implementation of the game.) I wrote it as a semester project while doing research at the Artificial Intelligence Laboratory at the University of Texas at Austin.

I did this because back then, and still today, I was interested in finding answers to some deep questions about the nature of creativity and the creative process. Where do new ideas come from? What makes an idea clever? Imaginative? Inventive? How do humans create new things and can we program computers to do it too? And most generally, how does the Universe create order out of chaos?

It is clear that new ideas are not just mechanically derived from old ones, at least not the really good new ideas. There is usually a spark, a creative aha moment during which something completely new is dredged up from a place where few of us ever go. Ideas can be encoded as (possibly infinite) numbers, and we know (see The Divine Random in Computing) that not all numbers can be computed by any effective procedure. That means there are infinitely many new ideas, most of them in fact, that cannot possibly be derived from what we already know. One must assume that at least some of these infinitely many ideas are good ones. A good truly random number generator will, eventually, produce all possible numbers, including quite a few that encode good ideas. So there we have an answer to our questions: just wait for the next good idea to come around on the random number generator. As we'll see in a later chapter, it can be shown that a random number exists that encodes the correct answers to all possible yes/no questions expressible in a given language. A single number that contains the answers to all our questions certainly seems to qualify as a good idea.

Of course, that's not very practical. The probability of our random number generator producing the answer to everything, or even a single good idea, in our lifetimes is minuscule, to say the least. And besides, we probably wouldn't recognize a good idea if it presented itself at our front door and brought nachos. It would be like working on a jigsaw puzzle, picking up a random piece and declaring, "Eureka! I've discovered something wonderful." A jigsaw piece or an idea isn't a great find unless it fits within the existing framework we have already constructed and understand. The greatest ideas are not just the ones that are unexpected and fresh, but also the ones that are most tightly constrained, by the rules of mathematics, art, language, etc. But randomness must play a role in the creative process. The close proximity between madness and creativity, genius and weirdness, eccentricity and insanity, has long been known and commented upon by philosophers and humorists through the ages. Genius, it seems, is the exact right admixture of crazy and constraint.

With all those general ideas about creativity, I set out to build a computer program that could surprise me by having a good idea, one that I myself had not built into its code. I chose to write a program that played Othello for several reasons — the rules are simple, the game is easy to play but non-trivial, and perhaps most importantly, I wasn't very good at it. Most of the Othello program consisted of my own implementations of several well-known game playing algorithms that had been and continue to be used for computer strategy games. At its core was the so-called mini-max tree search algorithm with alpha-beta pruning. The linked article has a pretty good explanation of what that is. Basically, it's a formalized way to imagine what will happen in the future. If I do this, my opponent will probably do that, after which I can do this other thing, and so on. Because a computer can do those kinds of things really quickly, it can enumerate all possible legal moves it has to choose from at any point in the game, and all the legal moves its opponent would have, through many levels of what-ifs. But the decision tree for Othello, like the ones for Chess and Go, is too big for any computer to completely traverse all the way down to the bottom (at which point it could simply ask which of its current moves leads to a win many moves down the road). So the computer stops searching the decision tree at some arbitrarily-chosen depth and simply looks at the resulting board positions, asking itself which ones seem to be the most advantageous for it. This last bit is heuristic, of course. It's a guess based on who has how many pieces remaining and how they are distributed about the board. Game programmers call this a static evaluation function.
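The mini-max search with alpha-beta pruning can be sketched compactly. The `game` object below is an assumed interface of my own invention — `moves()`, `apply()`, and a heuristic `evaluate()` standing in for the static evaluation function — and `ToyGame` is a four-leaf demonstration tree, not Othello.

```python
def alphabeta(state, depth, alpha, beta, maximizing, game):
    moves = game.moves(state)
    if depth == 0 or not moves:
        return game.evaluate(state)      # static evaluation at the horizon
    if maximizing:
        best = float("-inf")
        for move in moves:
            best = max(best, alphabeta(game.apply(state, move), depth - 1,
                                       alpha, beta, False, game))
            alpha = max(alpha, best)
            if alpha >= beta:            # opponent will never allow this line
                break                    # ...so prune the remaining moves
        return best
    else:
        best = float("inf")
        for move in moves:
            best = min(best, alphabeta(game.apply(state, move), depth - 1,
                                       alpha, beta, True, game))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best

# Tiny demo: two moves per player, heuristic values at the four leaves.
class ToyGame:
    LEAVES = {(0, 0): 3, (0, 1): 5, (1, 0): 2, (1, 1): 9}
    def moves(self, state):
        return [] if len(state) == 2 else [0, 1]
    def apply(self, state, move):
        return state + (move,)
    def evaluate(self, state):
        return self.LEAVES[state]

best = alphabeta((), 2, float("-inf"), float("inf"), True, ToyGame())
print(best)  # 3: the maximum over min(3, 5) and min(2, 9)
```

The pruning is what makes deep look-ahead affordable: in the demo, once the second subtree reveals a 2, the search abandons it without ever examining the 9.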

There are many, many different types of static evaluation functions for the game of Othello, from simple linear combinations of cell contents (0=empty, 1=mine, -1=opponent's), to more sophisticated heuristics based on pattern-matching. I chose a paradigm that looked for a certain fixed set of patterns on the board and weighted those patterns, when they occurred, to produce a simple number that could be compared against other positions. The choice of static evaluation function thus boiled down to a vector of small integers. Choosing the proper weights for the evaluation vector to make the program play well required deep insights into the game's winning strategies; insights I did not have. And despite its look-ahead capability, the quality of the game's play against me was very sensitive to the choice of static evaluation vector — for some vectors I could beat the program nearly every time, while others were more difficult to win against. Using trial and error, I was not able to find a vector that would make the game play consistently better than me, a poor player.

And then the light bulb came on. Why not have the computer play against itself, using different static evaluation vectors (SEVs), and search for the best one? To do this, I coded the following search algorithm. First, I randomly generated several thousand SEVs to form an initial population of alternatives. I let the computer randomly choose two of those SEVs and play a game, one strategy against the other. Each SEV also had a merit score attached to it, and when an individual SEV won a game against an opponent, the program would increment the winner's merit score and decrement the loser's. In this way, the computer would eventually filter through the initial population and find the best strategy among them.
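The competition loop is simple enough to sketch. Here `play_game` is a hypothetical stand-in (a coin flip) for what was, in the real program, a full Othello match between two SEVs; the merit bookkeeping is the part being illustrated.

```python
import random

def play_game(sev_a, sev_b):
    # Stand-in for a full mini-max match between the two strategies:
    # returns 0 if the first player wins, 1 if the second does.
    return random.choice([0, 1])

def compete(population, merits, rounds=1000):
    """Repeatedly pit two random individuals against each other,
    rewarding the winner's merit score and penalizing the loser's."""
    for _ in range(rounds):
        i, j = random.sample(range(len(population)), 2)
        winner = (i, j)[play_game(population[i], population[j])]
        loser = j if winner == i else i
        merits[winner] += 1
        merits[loser] -= 1
    return merits

population = [[random.randint(-5, 5) for _ in range(6)] for _ in range(10)]
scores = compete(population, [0] * 10)
print(scores)
```

One nice property of the +1/-1 scheme is that total merit is conserved; the scores only redistribute, so a consistently winning strategy must drain its rivals to rise.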

But what if there was a really great idea, a winning strategy, that did not happen to be one of the few thousand initial, random SEVs? I needed to somehow interject new blood, fresh DNA into the gene pool in order to do a more thorough search of the whole space of all SEVs. To do that, I wrote a second process, in addition to the competition process described above, to periodically generate new individuals. But I didn't just pick a random puzzle piece and cry Eureka! Instead I had the computer choose two different individuals, randomly but with a distribution that favored SEVs with higher merit scores, and combine the vectors in a way that borrows from biology. To combine two SEVs, the computer chose a random position along the vector and split both parent vectors at that point. A new child vector was created by choosing the first segment of one of the parents and the second segment of the other parent and then, with small probability, adding or subtracting one from a randomly selected value on the vector. In biology, this is called chromosomal cross-over with mutation. Once a new individual SEV was created in this way, the individual with the lowest merit score in the entire population was "killed off" and replaced with the new offspring, who was then available for further competition and breeding.
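The breeding step sketches out as single-point cross-over plus an occasional ±1 mutation. The vector contents and the 5% mutation rate below are illustrative choices, not the values from the original program.

```python
import random

def breed(parent_a, parent_b, mutation_rate=0.05):
    """Chromosomal cross-over with mutation: split both parents at a
    random point, splice the halves, and occasionally nudge one gene."""
    assert len(parent_a) == len(parent_b)
    cut = random.randrange(1, len(parent_a))   # random split point
    child = parent_a[:cut] + parent_b[cut:]    # cross-over
    if random.random() < mutation_rate:        # rare mutation
        i = random.randrange(len(child))
        child[i] += random.choice([-1, 1])
    return child

a = [3, -1, 4, 1, 5, 9]
b = [2, 6, 5, 3, 5, 8]
print(breed(a, b))
```

Every gene in the child comes from one parent or the other, shifted by at most one — which is exactly why the search explores fresh combinations without ever straying far from strategies that have already proven themselves.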

Once the competition and the breeding processes were chugging along, new strategies were being created and the computer played one game after another with itself. I allowed it to run all night, and all day the next day, and all that week, searching for that good idea, that winning strategy that I could not have taught it even if I'd wanted to. Every once in a while, I would interrupt the search to play a game against the current best strategy evolved so far. At first, the randomly-chosen strategies were pretty terrible. After only a few hours, I found I could still beat the computer regularly and with little effort. After a day or two, it became noticeably more difficult to win, and I found I had to concentrate more carefully to avoid the strategic tricks pulled by the computer. After four days, the computer began to win regularly, and after about a week, I found I could no longer win even a single game against the computer program that I myself had created. Now let me assure you, that is a spooky feeling, to be outsmarted by one's own creation. It was definitely a Frankenstein moment for me.

The type of search algorithm I used in the Othello game is known as a genetic algorithm (GA) because it mimics natural selection. GAs typically include most of the elements I've described above: a competition process, a breeding process that uses genetic operators like cross-over and mutation, and a fitness function or merit score. Randomness plays a vital role in the successful operation of any GA, but it isn't just a random search. Applying the fitness function to select competitors and breeding partners ensures the algorithm as a whole behaves as a kind of algorithmic ratchet, always making progress toward a more fit population. Random variation together with deterministic filtering, survival of the fittest, is the hallmark of a GA. Another common characteristic is the distinction between a genome and a phenome. In the Othello example, the genome was the Static Evaluation Vector, which resembles DNA in function. The phenome, the organism itself, is the rest of the game playing machinery — the mini-max tree search with optimizations and the codified rules that prevent the game from making illegal moves.

And so you see, evolution exists. It is, as they say nowadays, a thing. I've touched it, implemented it, and watched it create things that surprised me. Evolution is not simply a random number generator, but rather an algorithm that systematically trolls the vast ocean of randomness to make steady progress in the direction of survivability. It is not just a way to create useful new ideas and individuals, it may be the only such algorithm we know of. And so it worries and frustrates me to see results like the recent Pew poll on the Public's View of Human Evolution that suggests more and more people do not believe that biological evolution is a thing.

To be sure, random mutation without the rational, mechanical ratcheting of natural selection would make no progress at all. The probability of producing something useful, or more useful, from such a coin flip process is unimaginably small. But randomness is the seed of creation, the origin of species. It is the essence of humanity and certainty its antithesis. When a man is entirely reduced to numbers, he ceases to be alive.