Friday, January 28, 2011

Impossibility Result for Asynchronous Consensus

From Fischer, Lynch, Paterson, "Impossibility of Distributed Consensus with One Faulty Process"

In this paper, we show the surprising result that no completely asynchronous consensus protocol can tolerate even a single unannounced process death. We do not consider Byzantine failures, and we assume that the message system is reliable -- it delivers all messages correctly and exactly once. Nevertheless, even with these assumptions, the stopping of a single process at an inopportune time can cause any distributed commit protocol to fail to reach agreement. Thus, this important problem has no robust solution without further assumptions about the computing environment or still greater restrictions on the kind of failures to be tolerated!

Wednesday, January 26, 2011

Provably Probable Is Better Than Probably Provable

Randomness and distributed algorithms are often found together, like movie stars and orphans. It's not just that distributed algorithms often exhibit stochastic behavior, which they do; randomness is also an essential ingredient of many distributed solutions. I'm not speaking loosely when I use the word "essential" here. There are many problems in the distributed arena that are provably impossible to solve deterministically, but if you give processes access to a fair coin and allow them to flip it occasionally -- voilà! -- an efficient, reliable solution emerges that yields the correct behavior. Probably.

The first time I encountered this was with a canonical problem in distributed fault tolerance, called the Byzantine Agreement problem. Don't Google it! There's been so much written and so many results about the topic over the last thirty years that you'll never return if you vector off now. Instead, let me present a concrete instance of the general problem in order to talk about why randomness is such a useful, and in some cases essential, part of the solution.

Suppose there are four computerized weather stations distributed about a certain city. Each station has, among many other sensors, a light detector, and the stations can communicate with each other via whatever protocol we wish to devise. Every hour, on the hour, each station samples its own light sensor and makes an independent judgement about whether it is "daytime" or "nighttime." At noon and at midnight, it is highly likely that all stations will make the same assessment about whether it's day or night. But at dusk and dawn, local variations in light and sensor sensitivity will probably cause the stations to arrive at different conclusions about which is the best classification. Our job is to devise an algorithm and a protocol that allows the stations to all agree on a single answer, and that has two very reasonable properties: 1) if all the stations individually come to the same conclusion about whether it's day or night, then the group must agree on that answer, and 2) even if they don't start out with the same answer initially, they must still agree on some value (it doesn't matter which).
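
To pin down what "agree" has to mean here, this is how I'd state the two properties as checks, in a minimal Python sketch (the function names and the list-of-strings encoding of readings are my own, not anything from the problem statement):

```python
# My phrasing of the two conditions as checks over the honest stations'
# initial readings and final answers. Purely illustrative.

def valid(initial_readings, final_answers):
    """Property 1: if all honest stations start with the same reading,
    the group must settle on exactly that reading."""
    if len(set(initial_readings)) == 1:
        return set(final_answers) == set(initial_readings)
    return True  # mixed initial readings: either answer is acceptable

def agreed(final_answers):
    """Property 2: whatever they start with, all honest stations must
    land on one common answer (which one doesn't matter)."""
    return len(set(final_answers)) == 1
```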

But wait -- there's no villain in this story yet. So let's add one: one of the stations doesn't follow the rules. It can say anything at any time to any other station, without constraint. Maybe it's faulty, or maybe it's nefarious. Crazy or evil -- it doesn't matter. We may as well assume the bad guy is a malicious spy who knows everything that has happened and uses that information to thwart our little plot, preventing the remaining stations from agreeing on whether it's day or night. Now there's a worthy antagonist.

There is a simple, but subtly wrong, deterministic algorithm that attempts to allow the three non-faulty stations to reach agreement at all times of the day or night, even with the omniscient meddling of the faulty one. It goes like this: Each station sends its initial value to every other station. After collecting all four values (including its own), each station checks whether at least three of the four responses have the same value. If so, the station takes that value as its final answer. If the vote is a 2-2 tie, the station picks a predetermined default, say "daytime," and the stations exchange values in another round of communication. The algorithm terminates when all the correct stations have chosen the (same) final answer.
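
Here's a minimal sketch of one station's step in that protocol (the helper name and vote encoding are mine; the algorithm above doesn't prescribe any code):

```python
# One round of the flawed deterministic protocol, from one station's
# point of view.

from collections import Counter

DAY, NIGHT = "daytime", "nighttime"
DEFAULT = DAY  # the predetermined fallback -- the eventual weak point

def one_round(votes):
    """votes: the four values this station collected (its own included).
    Returns (decided, value): decided is True when the station can fix
    its final answer; otherwise it adopts the default and votes again."""
    value, count = Counter(votes).most_common(1)[0]
    if count >= 3:            # three matching votes: take it as final
        return True, value
    return False, DEFAULT     # a 2-2 tie: fall back to the default
```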

You can see that, if it's noon or midnight and all three non-faulty stations report the same value, there is little the nefarious foe can do to confuse this algorithm; they will all see at least three identical answers and immediately choose that value. There is danger at dusk or dawn, however. We're still okay if two of the non-faulty stations think it's day and the other one thinks it's night. In this case, there's no way the nefarious foe can influence the vote so that "nighttime" is chosen by any of the non-faulty stations: whether a station sees three "daytime" votes or a 2-2 tie, it arrives at "daytime" -- by majority in the first case, by default in the second -- and all confirm it as the agreed value on the next round.

The remaining case is vulnerable to attack. When two of the non-faulty stations initially think it's night and the other thinks it's day, the nefarious foe, in full omniscient glory, can whisper "nighttime" to the nighttime stations and "daytime" to the daytime station. The nighttime stations will therefore continue to vote nighttime, because they see three votes for that value, while the minority station will continue to choose the default value, "daytime." Thus, the faulty foe can ensure the stations never reach agreement on any value.
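
To make the split concrete, here's a tiny check of that scenario (the station labels and vote order are my own illustration):

```python
# Honest stations A and B start with "nighttime", C with "daytime"; the
# foe whispers "nighttime" to A and B, and "daytime" to C.

from collections import Counter

views = {
    "A": ["nighttime", "nighttime", "daytime", "nighttime"],  # own, B, C, foe
    "B": ["nighttime", "nighttime", "daytime", "nighttime"],  # own, A, C, foe
    "C": ["daytime", "nighttime", "nighttime", "daytime"],    # own, A, B, foe
}
for station, votes in views.items():
    value, count = Counter(votes).most_common(1)[0]
    print(station, "->", value if count >= 3 else "daytime (default)")
# A and B lock in "nighttime"; C keeps defaulting to "daytime" -- a
# permanent split, exactly as described above.
```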

Surprisingly, this simple but incorrect deterministic algorithm can easily be converted to a simple and correct probabilistic one, just by giving each station a fair coin to flip. Instead of picking a predetermined default value (which is subject to being discovered and exploited by the evil one), each station flips its coin whenever there are not three votes for the same value. Heads, it's daytime; tails, it's nighttime. If the non-faulty stations do not unanimously agree, The Faulty One can still set up one of the two cases described above, but since it cannot predict the value of the coin flip (that's what random means), it has no way to tell which case is the vulnerable one. There's only a 50% chance our malicious adversary will be able to thwart agreement on any given round, so if we just keep repeating the algorithm round after round, the stations will eventually reach agreement, despite the best efforts of the know-it-all evil station.
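
The change to the sketch above is essentially one line: replace the fixed default with a coin flip (again, my illustrative code, in the spirit of Ben-Or-style randomized agreement):

```python
# One round of the randomized protocol: on a tie, flip a fair coin
# instead of using a predictable default.

import random
from collections import Counter

DAY, NIGHT = "daytime", "nighttime"

def one_random_round(votes):
    """Same exchange as before, but the fallback is now unpredictable."""
    value, count = Counter(votes).most_common(1)[0]
    if count >= 3:
        return True, value                     # three matching votes: final
    return False, random.choice([DAY, NIGHT])  # heads daytime, tails nighttime
```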

Probably. We have a choice with this probabilistic algorithm. We can either stop it after a fixed number of rounds, and accept the (tunably small) possibility that agreement has not yet been reached, or we can run it until agreement is reached, and accept the (tunably small) possibility that it takes so long we no longer care whether it's night or day. This exact trade-off is typical of distributed, probabilistic algorithms.
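
Putting rough numbers on "tunably small": if, as argued above, the adversary blocks a round with probability at most 1/2, then the odds it is still blocking after k rounds decay geometrically (a back-of-the-envelope bound, assuming an independent fair flip each round):

```latex
\Pr[\text{no agreement after } k \text{ rounds}] \le \left(\tfrac{1}{2}\right)^{k},
\qquad
\mathbb{E}[\text{rounds to agree}] \le 2.
```

Ten rounds already pushes the failure probability below one in a thousand; twenty rounds, below one in a million.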

To be fair, there is a simple deterministic algorithm that solves this problem (see the seminal 1980 paper by Pease, Shostak, and Lamport), but it depends on a subtle condition that doesn't always exist in real-life distributed systems -- synchronous communication. Another famous result in this research area is that of Fischer, Lynch, and Paterson, which proves that no deterministic agreement algorithm can exist unless we can guarantee, with certainty, an upper bound on how long it will take messages to be delivered between processes (weather stations, in this case). And yet, there are many good probabilistic solutions to this important practical problem!

Back in 1987, a fellow graduate student at the University of Texas at Austin, Russell Turpin, and I were struck by this result. So much so that we spent several months reading and analyzing the plethora of research on probabilistic agreement and writing a survey paper on the topic. Now, nearly twenty-five years later, Dr. Turpin and I are still friends and colleagues, and we expect our interest in and passion for distributed solutions to continue for many more years to come.

Probably.

Wednesday, January 19, 2011

A Party Game

Let's play a party game. Here are the rules. When each of us arrives at the party, the host hands us some flash cards. Each card has a single number and a single word written on it. Generally, cards have different number-word pairs, but the same pair might be repeated on more than one card. A given number is always associated with the same word, however.
Everybody gets some number of cards, from just a few to a handful or even a whole deck. We are told we may swap cards with others in pretty much any way we want, except we cannot look at others' cards. In fact, we can communicate with others in only two limited ways: we can write a short note on a slip of paper and hand it to one other person, or we can shout a message to the entire group. The latter mechanism is somewhat error-prone, since we can never be sure others are listening and can hear our shout. Also, someone else may decide to shout at the same time, in which case neither shout may be heard.
After a few drinks, the game commences. The host chooses someone at the party and hands her a slip of paper on which is written a simple question, "What word is on the card whose number is 50342?"
Staying within the rules and constraints outlined above, how would you behave, and how would you want the other partygoers to behave, in order to "efficiently" answer that question?

Sunday, January 16, 2011

Automatons, Organisms, and Teenagers

Wouldn't it be great if you could tell your child exactly what to do in a given situation and be completely confident he or she would do just that, every time, without question? If you've ever raised a teenager, I promise you've wished for that before. But would it really be all that great?
Yes, we could, theoretically, help our kids avoid many of the mistakes that we made growing up. On the other hand, they could never be better than us. It turns out, all the pleasures and pains of child rearing stem from the unexpected, emergent behavior of our children. Thank goodness kids are organisms, not automatons.
My software team and I at Caringo, Inc. are raising our own child, named CAStor -- a symmetrically distributed, massively scalable, clustered storage system. CAStor is not quite a teenager yet, but already I'm getting questions from customers and partners.
"Why does it behave this way in that situation?" they ask. To which I'm often forced to answer simply, "I don't know."
When the behavior in question results in a positive outcome (which it frequently does), they wonder why I don't take credit for it. After all, CAStor is a computer program, a digital automaton, and we programmed the darned thing. How could we not know why it behaves the way it does?
The answer is, computers may be automatons, but computer networks are organisms.
In the past two decades, we have seen the emergence of a whole new class of application and infrastructure software: scalable, robust, distributed solutions capable of handling hundreds of millions of users, performing billions of transactions, storing and retrieving millions of gigabytes, and maintaining high availability despite hardware failures, software upgrades, and malicious attacks. Those of us who design and architect such solutions understand (sometimes) that there is a qualitative difference between these organic, adaptive computer networks and the "do this now" sort of programs we learned to write in school.
This blog is about the practical potential and limitations of distributed computing, the trade-offs between determinism and probability, certainty and surprise. Along the way, I hope we'll explore all kinds of topics, from space photography to train tracks, thermodynamics to wine glasses, the meaning of knowledge, randomness, and time.