## 25 September 2007

### A problem of Feynman on parallel computation

Here's a problem from the *Feynman Lectures on Computation* (which, incidentally, I didn't know existed until I was absent-mindedly wandering in the library a few days ago):

Problem 1.1(c). Most present-day [mid-1980s] computers only have one central processor -- to use our analogy, one clerk. This single file clerk sits there all day long working away like a fiend, taking cards in and out of the store like mad. Ultimately, the speed of the whole machine is determined by the speed at which the clerk -- that is, the central processor -- can do these operations. Let's see how we can maybe improve the machine's performance. Suppose we want to compare two n-bit numbers, where n is a large number like 1024; we want to see if they're the same... [suppose] we can hire n file clerks, or 2n or perhaps 3n; it's up to us to decide how many, but the number must be proportional to n. [How can you get the comparison time to be proportional to log n?]

The proof is by induction. The base case, n = 1, is clear -- one clerk can check that two 1-bit numbers are the same in one step. For the inductive step, we show that if n clerks can compare two n-bit numbers in time t(n), then 2n clerks can compare two 2n-bit numbers in time t(n) + c, where c is a constant. But that's easy -- split each of the two 2n-bit numbers a and b into n-bit halves: a = a1a2, b = b1b2. Have n clerks check whether a1 = b1 while, in parallel, the other n clerks check whether a2 = b2. If a1 = b1 AND a2 = b2, return TRUE; otherwise return FALSE. That final AND is just one more time step once you've figured out whether a1 = b1 and whether a2 = b2, so t(n) grows like log n.

The picture I get in my head of this situation makes me smile. You need n clerks -- n parallel processors. The kth clerk compares the kth bit of the first number with the kth bit of the second number; ey returns a 0 if the two bits are different and a 1 if they're the same. (This is, of course, bitwise exclusive NOR, not exclusive OR -- thanks, Anonymous!) Then everything pours into a single node which is at the root of a binary tree of depth log2 n, with the answers getting ANDed together at each level.
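Here's a minimal sketch of that picture in Python, simulating the clerks one after another (the `depth` counter is just bookkeeping I added to make the log2 n levels visible; it assumes the number of bits is a power of two):

```python
def parallel_compare(a_bits, b_bits):
    """Compare two equal-length bit lists the way the clerks would:
    each 'clerk' XNORs one pair of bits, then the answers are ANDed
    together up a binary tree.  Returns (equal?, tree depth)."""
    # Level 0: the kth clerk outputs 1 iff the kth bits agree (XNOR).
    level = [1 if x == y else 0 for x, y in zip(a_bits, b_bits)]
    depth = 0
    # Each round, adjacent pairs of answers are ANDed in parallel.
    while len(level) > 1:
        level = [level[i] & level[i + 1] for i in range(0, len(level), 2)]
        depth += 1
    return bool(level[0]), depth

# 8 bits -> a tree of depth log2(8) = 3
print(parallel_compare([1,0,1,1,0,0,1,0], [1,0,1,1,0,0,1,0]))  # (True, 3)
print(parallel_compare([1,0,1,1,0,0,1,0], [1,0,1,1,0,1,1,0]))  # (False, 3)
```

Of course a real machine would run each level's ANDs simultaneously; the loop here just walks the levels of the tree in order.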

The next subproblem in Feynman is to add two n-bit numbers in logarithmic time by using such parallel processing. (This is harder because you have to worry about carrying!) Again, it reduces to a problem of combining information on words of one length to words of twice that length -- if you have a = a1a2 (where a1 is the "high", or more significant half, and a2 is the low half), and b = b1b2, what is a+b? Breaking it down into its high half and its low half,

a + b = (a1+b1+c)(a2 + b2)

where c denotes a carry from the addition of a2 and b2, and a2 + b2 denotes addition modulo 2^k, where k is the length of the summands. We assume a1 + b1, c, and a2 + b2 are already known. The problem is... what if a1 + b1 ends in a bunch of 1's, and c is 1? We have something like
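The recombination step can be written out in a few lines of Python (the function name and layout are mine; everything is kept modulo 2^(2k), matching the "addition modulo 2^k" convention above):

```python
def combine_halves(hi_sum, lo_sum, carry, k):
    """Recombine a + b (mod 2**(2*k)) from the three pieces described
    above: hi_sum = (a1 + b1) mod 2**k, lo_sum = (a2 + b2) mod 2**k,
    and carry = the carry out of the low-half addition (0 or 1)."""
    hi = (hi_sum + carry) % (2 ** k)   # fold the carry into the high half
    return hi * (2 ** k) + lo_sum      # concatenate: (a1+b1+c)(a2+b2)

# Check against ordinary addition for a = 7, b = 7 with k = 2:
k = 2
a, b = 7, 7
a1, a2 = a >> k, a % 2 ** k            # high and low halves of a
b1, b2 = b >> k, b % 2 ** k            # high and low halves of b
carry = (a2 + b2) >> k                 # did the low halves overflow?
result = combine_halves((a1 + b1) % 2 ** k, (a2 + b2) % 2 ** k, carry, k)
print(result, (a + b) % 2 ** (2 * k))  # 14 14
```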

...0111111 + 1

and we want to add the 1, so we think "okay, I'll just change the last bit in the first summand from 0 to 1"... but it's 1... so you change the last two bits from 01 to 10... but no, that won't work either... so you change the last three bits from 011 to 100...

On average this isn't a problem. The expected number of 1s at the end of a1 + b1 is one, since each additional trailing 1 is half as likely as the last.
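A quick simulation bears this out (at least for uniformly random summands; the helper name is mine):

```python
import random

def trailing_ones(x):
    """Count how many 1 bits x ends in."""
    count = 0
    while x & 1:
        count += 1
        x >>= 1
    return count

# The chance of at least k trailing 1s is 2**-k, so the expected count
# is 1/2 + 1/4 + 1/8 + ... = 1.
random.seed(0)
samples = [trailing_ones(random.getrandbits(32)) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to 1
```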

But not always. Somebody could be deliberately feeding you numbers that are hard to add. Some sort of sadistic grade school teacher.

edited, 10:48 am: Welcome to those of you who came here from reddit. Be aware that I am not a computer scientist -- I'm a mathematician -- so I might not really know what I'm talking about here.

John Armstrong said...

> But not always. Somebody could be deliberately feeding you numbers that are hard to add. Some sort of sadistic grade school teacher.

From what I remember of my halcyon computer-science days, this is the difference between "worst-time" analysis and "amortized" analysis.

Isabel said...

I've heard of amortized analysis; I'm not sure if that's exactly what I'm doing here, or if what I did here was what they call "average-case" analysis.

Colin said...

Amortized analysis is where you prove that a run of consecutive operations, taken together, takes a given order of time (often less than the single-operation worst case times the number of operations).

As you said, what you mentioned above is closer to average-case analysis.

Amortized analysis usually has some concept of which operations are "cheap" and which ones are "expensive", and then constructs a proof that there are enough cheap operations to cancel out the expensive ones.

augustss said...

You didn't mention the complexity of addition, but using O(n^3) clerks (assuming a clerk is a transistor) you can add in worst-case O(log n) time.

John Armstrong said...

Sorry, Colin's right. As I said, it's been a long time...

But the two aren't wholly unrelated. Let's say you're given the task of adding up a big list of pairs of binary numbers. You can partition them into rough difficulty classes and say "these are easy, those are hard". Then you get a running time for easy sums and hard sums, and average over some probability distribution. Is this amortized or average-case?

Tim Bates said...

That's average case, because you can't guarantee the proportions of easy to hard numbers across all possible inputs.

Imagine something like a resizable array, where each time the amount of space allocated to the array runs out we double the amount of space allocated. Then adding to the array is amortized O(1), since most of the time we just write into the available space (O(1)) and each time we run out of space we have to allocate more and copy all the old data over (O(n)).

Call the amount of space allocated the "size" of the array, and the number of items in it the "length". If the array starts with size 4 and length 0, adding each of the first four items takes O(1) time. Adding the fifth item (n = 5) takes O(n), as we have to copy the old items across. The size is now 8, so adding the next three items (n = 6, 7, 8) takes O(1) time. Adding the ninth takes O(n). And so on.

Because we expand the size of the array by a constant proportion (multiplying by two in this case), a doubling at length n costs O(n) time but happens only once every Θ(n) insertions (at n = 4, 8, 16, 32, ...), so *on average* appending is an O(1) operation. Because this bound holds for every possible input sequence, not just typical ones, it is amortized O(1) time.
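That doubling array can be sketched in a few lines of Python; the `copies` counter is my own instrumentation to make the amortized bound visible (storage itself is elided, since only the resize cost matters here):

```python
class DoublingArray:
    """Resizable array that doubles its capacity when full.
    Tracks how many element-copies resizing has cost in total."""

    def __init__(self):
        self.size = 4        # allocated capacity
        self.length = 0      # items actually stored
        self.copies = 0      # total elements copied during resizes

    def append(self, item):
        if self.length == self.size:
            self.copies += self.length  # copy everything to the new store
            self.size *= 2              # double the allocation
        self.length += 1

arr = DoublingArray()
for i in range(1000):
    arr.append(i)
# Doublings happened at lengths 4, 8, 16, ..., 512, costing
# 4 + 8 + ... + 512 = 1020 copies: about one copy per append.
print(arr.copies)  # 1020
```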