## 01 July 2010

### Einstein plays dice

Not only does God throw dice, but Einstein does too, or at least a stencil of him on a wall in the Upper Haight in San Francisco does. This post suggests that it may have been by the graffiti artist Banksy. It's been painted over.

More pictures here and here.

It's been painted over, apparently. That's probably for the best, because that means I won't try to find it when I move to the Bay Area.

(Oh, yeah, I'm moving! I got a job at Berkeley.)

## 30 June 2010

### How not to visualize the electoral college

I went to the National Constitution Center in Philadelphia today.

As you may know, the Constitution provides that, in elections for the President, each state receives a number of electors equal to its total number of senators and representatives. Each state has two senators, and the number of representatives is proportional to the population. The number of representatives is adjusted after the census, which happens in years divisible by ten.

Why am I telling you this? Because at one point on the wall there was an animated map, which displayed how apportionment had changed between censuses. Each state was represented as a "cylinder", with base the state itself and height proportional to its number of electors. (Or representatives; it honestly would be impossible to tell the difference by eye, as in this scheme that would just push everything up by two units.) There was one such display in the animation for each census, with smooth transitions between them.

Since the eye wants to interpret the "volume" of a state as its number of electors, this has the effect of making geographically-large states look like they have better representation than they do. I noticed this by looking at New Jersey and Pennsylvania, which have areas of 7417 and 44817 square miles, and 15 and 21 electors respectively. The solid corresponding to Pennsylvania has about eight times the volume as that corresponding to New Jersey. New Jersey's an easy one to look at because it happens to be the most densely populated state at the present time, and in this visualization it is not the tallest.

The volume of the solid corresponding to each state is proportional to the product of its number of its electors and its area. The states for which this product is largest are, in order, Texas, California, Alaska, New York, Florida, Illinois, Arizona, Michigan, Pennsylvania, and Colorado. The first two of these, between them, have 41% of the total volume in this visualization.

I'd suggest replacing this with a model where volume is proportional to the number of electoral votes. Or, since that might have its own problems, a cartogram which evolves in time. The West would just grow out of nowhere.

## 23 June 2010

### The ridiculously long match at Wimbledon

As you may have heard, there's a match at Wimbledon , between John Isner and Nicolas Mahut, in which the last set is tied at 59 games. (The previous longest set at Wimbledon was 24-22.)

A set goes until one player has won six games, and has also won two more than the opponent. This means that back since the set was tied at 5 games, games 11 and 12 were split by the two players; so were 13 and 14; and so on up to 117 and 118.

Terence Tao points out that this is very unlikely using a reasonable naive model of tennis, which assumes that the player serving has a fixed probability of winning the game. (Service alternates between games.) His guess is that some other factor is at play; for example, "both players may perform markedly better when they are behind".

This seems statistically checkable, at least if records of that sort of thing are kept. I'm not sure if they are; it seems like tennis scores are often reported as just the number of games won by each player in each set, not their order. Another hypothesis, of course, is that the match has taken on a life of its own and, subconsciously, the players are playing to keep the pattern going.

Edit (Thurs. 7:49 am): More on Isner-Mahut: Tim Gowers' comments, and some odds being offered by William Hill, the betting shop.

### Reconstructing World Cup results

The final standings in Group C of the 2010 World Cup were as follows: USA 5, England 5, Slovenia 4, Algeria 1.

Question: given this information, can we reconstruct the results of the individual games? Each team plays each other team once; they get three points for a win, one for a draw, zero for a loss.

First we can tell that USA and England must have had a win and two draws, each; Slovenia, a win, a draw, and a loss; Algeria, a draw and two losses. (In fact you can always reconstruct the number of wins, draws, and losses from the number of points, except in the case of three points, which can be a win and two losses, or three draws.)

Since neither USA nor England have a loss, they must have drawn. Similarly, Slovenia's win must have been against Algeria.

But now there are two possibilities; we have to break the symmetry between USA and England. Let's say, arbitrarily, that USA drew against Slovenia and defeated Algeria, instead of the other way around. (This is, in fact, what happened.) Then Algeria's draw must have been against England, and England's win against Slovenia.

In an alternate universe where USA and England switch roles (does this mean that England was a USA colony in this universe?) USA defeated Slovenia and drew against Algeria, and England draws against Slovenia and defeats Algeria.

Of course, the next question is: given the goal differentials (+1 for USA and England, 0 for Slovenia, -2 for Algeria), can we figure out the margins in the various games? (Assume we know which of the two universes above we're in; for the sake of avoiding cognitive dissonance, say we're in the first one.) Since Algeria was only defeated by a total of two goals, the margin in each of their losses must have been 1. And the margin in the Slovenian win (to Algeria) and loss (to England) must have been the same, namely 1.

If you in addition are given the total number of goals scored (USA 4, Slovenia 3, England 2, Algeria 0) you can reconstruct the scores of each match. I leave this as an exercise for the reader. Hint: start with Algeria.

Another question: is it the "usual case" that individual match results can be recovered from the final standings, or is this unusual? The table of standings in a group in the World Cup has something like thirteen degrees of freedom. Given the number of wins and draws, goals scored, and goals against for three of the teams, we can find the number of losses and goal differential for each team, the number of wins, draws and losses for the fourth team, and the goal differential of the fourth team. We need one more piece of information - say, the number of goals scored by that fourth team - to reconstruct the whole table. We're trying to derive twelve numbers from this (the number of goals scored by each team in each match). It will be close.

In an n-team round robin, the number of degrees of freedom in the table of standings grows linearly with n, but the number of games grows quadratically with n. For large n it would be impossible to do this reconstruction; for n=1 it would be trivial.

## 21 June 2010

An example of a bad word problem, from Frank Quinn's article The Nature of Contemporary Core Mathematics, who is at Virginia Tech:

Bubba has a still that produces 700 gallons of alcohol per

## 05 May 2010

### Proof Math is Beautiful

Proof Math is Beautiful is a blog of pictures which come from mathematics.

### Street-Fighting Mathematics is a book

From the Chronicle of Higher Education: The Gospel of Well-Educated Guessing, on Sanjoy Mahajan's Street-Fighting Mathematics. (Previously: here, and here.) It's now a real book!

Here's a calculation I hadn't heard of before, and don't actually know the details of:
They were both right, in a sense: some of the calculations he pulls off have a hint of Houdini. For instance, he can start with two paper cones, to find the relation between drag force and velocity, and—believe it or not—arrive at the cost of a round-trip plane ticket from New York to Los Angeles. He works out the problem in a blur of equations, remarking that a gram of gasoline and a gram of fat contain the same amount of energy, that drag force is proportional to velocity squared, and so on. The number he arrives at ($700) isn't the cheapest deal out there, but it's roughly right. I've recently priced PHL-(SFO/OAK) flights, and this is roughly right. (And this uses chemistry, which is awesome because I was a chemist in a former life. Gasoline and fat are both basically long chains of carbon atoms.) The article tells of other similar party tricks. It would be nice to see some details, but the Chronicle seems to pitch itself at a humanities-ish audience. ## 02 May 2010 ### Arithmetic geometers write about statistics Jordan Ellenberg, in yesterday's Washington Post: The census will be wrong. We could fix it. This continues a proud tradition of mathematicians whose area of expertise is nowhere near statistics writing newspaper pieces saying that statistical sampling in censuses a good idea; Brian Conrad, 1998, New York Times. In some sense it carries more weight when mathematicians who don't spend most of their time battling randomness in some sort or another . Statisticians of course think that doing statistical adjustments to the census in order to make it more accurate is a Good Idea; it gets them, their students, or their friends jobs! As a combinatorialist I admire the theoretical elegance of our country's once-a-decade exercise in large-scale, brute-force combinatorics. But in practice, well, of course it needs some statistical help. And here's something interesting: Since 1970, a mail-in survey has provided the majority of census data, so what we enumerate is not people but numbers written on a form, which are as likely to be fictional as any statistical estimate. I wonder if people are actually lying on their census forms. I suspect this would skew the count upwards. People who deliberately lie on their census forms, at least the sort of people I know, are likely to give "joke" answers. And large numbers are funnier. I live in a one-bedroom apartment, and if I were the sort of person who lied on government forms I would easily say that ten people live in my apartment. I can't give a comically low number of people living here, because the census insists that a positive integer number of people live in each place. Does the census has some sort of way to correct for this? ## 01 May 2010 ### The probability that 901 coins have total value$100

Here's a cute little problem from Reddit: Tough question for you guys. Let's say you have 901 coins that come out to exactly $100. What are the odds? (Also here.) Everyone there who gets a solution is assuming that all the possible coins are equally likely, which isn't a reasonable assumption. Years ago I looked at the density of money, where I used a model in which I get back from each transaction n cents with probability 0.01, for n = 0, 1, ... 99; furthermore I always get back the smallest possible number of coins. The only coins allowed are pennies, nickels, dimes, and quarters (worth 1, 5, 10, and 25 cents respectively). As I calculated before, if I make 100 transactions, and I get each number of cents back exactly once, I'll get 200 pennies, 40 nickels, 80 dimes, and 150 quarters. This is a total of 470 coins, and worth$49.50. Thus the "average coin" is worth 495/47 = 10.53 cents; 901 coins are "on average" worth $94.89. The value$100 isn't that unreasonable.

So consider a jar with 901 coins, which are independent; they each have probability 20/47 of being a penny, 4/47 of being a nickel, 8/47 of being a dime, and 15/47 of being a quarter. The mean value of a coin is 495/47 = 10.53 cents; the variance is 238840/2309 = 108.12 "square cents".

The mean value of 901 coins, then, is 9489 cents; the variance is 93198 "square cents", so the standard deviation is 305 cents. (Everything here is rounded to the nearest integer.)

Invoking the central limit theorem, then, we say that the value of 901 randomly chosen coins is normally distributed with this mean and standard deviation. The probability of having value exactly 10,000 cents is approximated by the probability density function of this variable at 10,000; that's 0.000322, or 1 in 3101.

An exact answer is feasible -- but not worth computing, I'd say, because the error in the central limit theorem is surely much smaller than the error from the fact that this isn't a realistic model of what actually ends up in your change jar.

## 29 April 2010

### A simple model for baseball

From the April Notices of the AMS, John D'Angelo writes Baseball and Markov Chains: Power Hitting and Power Series. Consider the following simple model of baseball. Players only hit singles; three singles score a run. That is, the third and every following player to get a hit in a given inning score a run. This can either be interpreted as that, say, all runners score from second on a single or all runners go from first to third on a single -- but not both! -- or that every third hit is actually a double. (And I do mean exactly every third hit, not some random one-third of hits, so this is a bit unnatural.) Then the expected number of runs per half inning is p3(3p2-10p+10)/(1-p). For real baseball the average number of runs per half-inning is around one half, which corresponds to p = 0.361.

D'Angelo gives this as an exercise, but I independently came up with this model a while ago and can't resist sharing the solution. Let q = 1-p. The probability of getting k hits in an inning is pk q3 -- that's the probability of getting those hits in a certain order -- times the number of ways in which k hits and 3 outs can be arranged. Since the last batter of an inning must get out, the number of possible arrangements is the number of ways to pick 2 batters out of the first k+2 to get out, which is (k+2)(k+1)/2.

The probability of getting k runs, if k is at least 1, is just the probability of getting k+2 hits, which is pk+2q3(k+4)(k+3)/2. Call this f(k); then

f(1) + 2f(2) + 3f(3) + ... = p3(3p2-10p+10)/(1-p)

by some annoying algebra. I'm pretty sure I came up with this exact model while procrastinating from some real work a couple years ago; it's probably been independently reinvented many times.

With p = 0.361, the probabilities of scoring 0, 1, 2, 3, 4, 5 runs in an inning are .748, .123, .066, .034, .016, .008 (rounded to three decimal places). (Probabilities of larger numbers of runs can also be calculated; together they have probability around .006.)

Assuming that each half-inning is independent, the probability G(k) of a team scoring k runs in a game is, for each k,

 k 0 1 2 3 4 5 G(k) 0.073 0.108 0.129 0.133 0.124 0.108 k 6 7 8 9 10 11 G(k) 0.088 0.069 0.052 0.038 0.026 0.018 k 12 13 14 15 16 17 G(k) 0.012 0.008 0.005 0.003 0.002 0.001

with probability about 0.0006 of scoring 18 runs or more. (This seems a bit low to me -- three times a season in the major leagues -- but after all this is a very crude model!) But one interesting thing here is that the distribution of the number of runs per game, which is a sum of nine skewed distributions, is still skewed; the mode is 3, and the median 4. Recall that I chose p so that the mean would be 4.5. And the actual distribution is similarly skewed.

Of course a more sophisticated model of baseball is as a Markov chain. There are twenty-five states in this chain -- zero, one or two outs combined with eight possible ways to have runners on base, and three outs. We assume that each hitter hits randomly according to his actual statistics, and the runners move in the "appropriate" way. Of course determining what's appropriate here would be a bit tricky. How do runners move? A runner is probably more likely to take an extra base when a power hitter is hitting, but the sample size for any individual is fairly small. But one could probably predict from some measure of the hitter's power (say, the number of doubles and home runs, combined appropriately) the chances of a runner taking an extra base on a single. Something similar is necessary for sacrifice flies (which have to be deep enough to score the runner), grounding into double plays, etc. I'm not sure if the Markov models that are out there, such as that by Sagarin, do this. Sagarin computes the (offensive) value of a player by determining how many runs per game a team composed of only that player would score.

## 28 April 2010

### My thesis!

For the morbidly curious, here's my recently completed PhD thesis, Profiles of large combinatorial structures. (PDF, 1.1 MB, 262 pages (but double-spaced with wide margins)) This is why I haven't been posting!

Abstract: We derive limit laws for random combinatorial structures using singularity analysis of generating functions. We begin with a study of the Boltzmann samplers of Flajolet and collaborators, a useful method for generating large discrete structures at random which is useful both for providing intuition and conjecture and as a possible proof technique. We then apply generating functions and Boltzmann samplers to three main classes of objects: permutations with weighted cycles, involutions, and integer partitions. Random permutations in which each cycle carries a multiplicative weight σ have probability (1-γ)σ of having a random element be in a cycle of length longer than γn; this limit law also holds for cycles carrying multiplicative weights depending on their length and averaging σ. Such permutations have number of cycles asymptotically normally distributed with mean and variance ~ σ log n. For permutations with weights σk = 1/k or σk = k, other limit laws are found; the prior have finitely many cycles in expectation, the latter around √n. Compositions of uniformly chosen involutions of [n], on the other hand, have about √n cycles on average. These can be modeled as modified 2-regular graphs. A composition of two random involutions in Sn typically has about n1/2 cycles, characteristically of length n1/2. The number of factorizations of a random permutation into two involutions appears to be asymptotically lognormally distributed, which we prove for a closely related probabilistic model. We also consider connections to pattern avoidance, in particular to the distribution of the number of inversions in involutions. Last, we consider integer partitions. Various results on the shape of random partitions are simple to prove in the Boltzmann model. We give a (conjecturally tight) asymptotic bound on the number of partitions pM(n) in which all part multiplicities lie in some fixed set n, and explore when that asymptotic form satisfies log pM(n) ~ π√(Cn) for rational C. Finally we give probabilistic interpretations of various pairs of partition identities and study the Boltzmann model of a family of random objects interpolating between partitions and overpartitions.

## 20 April 2010

### Thesis margins

What's the point of having two thousand readers if I can't ask a question like this once in a while?

I'm working on the final version of my dissertation -- the one I'll submit to the graduate school next week. The dissertation manual states that no text may appear in the margin area.

LaTeX, on the other hand, keeps wanting to put some pieces of mathematics, which appear inline, in the margins. (Presumably this is because this is "better" than the alternative of having very long inter-word spaces.)

Two questions:
- is there some way to check that nothing's sticking out in the margin? (I thought this is what "overfull \hbox" meant, but the line numbers where those appear aren't the ones where I have this problem.) There are some things that are just barely sticking out into the margin, and with thousands of lines total I don't trust my eye.
- once I find all the places where text protrudes into the margin, is there some way around this other than just inserting \newline every time this problem occurs? This creates its own problems.

I surely can't be the only person who's had this problem, but Google is failing me.

### Ash clouds and probability

From the Daily Mail: New ash cloud could delay re-opening of London airports. We have this gem: "Critics said the agency used a scientific model based on 'probability' rather than fact to forecast the spread of the ash cloud." See the Telegraph as well.

What else are they supposed to do? The agency here -- the Met Office, which is the national weather service of the UK -- doesn't know what the ash cloud is going to do. If they waited to see what the cloud does, the planes would already be in the air. It would be too late.

## 14 April 2010

### Mathematical relationships search

There's a mathematical relationships search. It will tell you, for example, that academically, Max Noether is the first cousin of Emmy Noether. (Both of their advisors were students of Jacobi.) But Michael Artin and Emil Artin aren't even related.

It's less amusing, of course, when you search for people that aren't related in the standard way. But Paul Erdos is my great-great-great-great-uncle. (You can't search for me yet in the Mathematics Genealogy Project, which is where the data comes from; the link goes to the relationship between Erdos and another student of my advisor.)

### This blog needs a new title

The word "probability" does not appear in the Bible, or so we learn from Conservapedia's List of missing words in the Bible.

I can only conclude that Einstein was right, and God does not play dice.

## 25 March 2010

### My 27th-generation advisor is Elissaeus Judaeus

Mathematical ancestors of Penn math faculty, from October 1999. (This was the department's 100th anniversary.) This lists the advisor's advisor's advisor's... until historical data that was (easily?) available at that time gave out. The longest-ago person listed on this page is Otto Mencke (Ph. D. 1665, 12 generations from Penn professor Stephen Shatz); most chains die out in the 19th century.

As of right now, the math genealogy project claims to know that my advisor's 26th-generation advisor is Elissaeus Judaeus (who was a student in the 1380s). Most mentions of Judaeus on the Internet seem to be by other people who have discovered this (Judaeus has 77000 or so mathematical descendents). But this post from the person who added him to the database gives some background -- he was for the most part a philosopher, it seems. He is described as "a mysterious figure who may or may not have been a Jew". His student Gemistus Pletho seems a little better understood; Wikipedia says "He was one of the chief pioneers of the revival of Greek learning in Western Europe." It seems that in that time a lot more data has been collected for the 14th through 17th centuries.

(As for me, hopefully in a few weeks it'll be possible to add me to the mathematical genealogy project. I defend on April 15.)

## 18 March 2010

### One down, six to go

Perelman has been awarded the Millennium Prize. Press release from the Clay Math Institute.

As Peter Woit points out, no mention of the fate of the million dollars.

From quomodocumque: what would the NCAA tournament look like if every game were won by the college or university with the better math department? (Berkeley -- excuse me, "California", as they're usually called in athletic contexts -- wins.)

Rather interestingly, 20 out of 32 first-round "games", and 37 out of 63 "games" overall -- more than half -- are won by the team that actually has the better seed in the basketball tournament. I suspect this is because quality of math departments and basketball teams are both correlated with the size of the school. This is especially true because ties were broken by asking how many people they could name at the school., which clearly has a bias towards larger departments.

## 17 March 2010

I have a polynomial, P, with nonnegative integer coefficients. You want to know what it is. For any algebraic number x, you're allowed to ask me what P(x) is. How many questions do you have to ask me to be sure that you know what P is?

## 16 March 2010

### McDonalds: more than 10 served

We learn from a Megabus spokesperson that their vehicles use "less than a pint of fuel per passenger mile".

For those of you who don't have the misfortune of knowing this, there are eight pints in a gallon. So these busses get better than eight passenger-miles to the gallon!

Since most cars in the US get at least 20 miles or more to the gallon, this is really nothing to be proud of.

(I'm guessing that busses are actually more fuel-efficient than cars, at least if they run sufficiently full.)

## 14 March 2010

### The Calkin-Wilf shirt

You can actually buy a shirt with the Calkin-Wilf tree on it. I probably should buy it, if only so I can wear it when I inevitably give a talk on this subject again, either as a stand-alone talk (I've done it twice) or when I teach it in a class.

This is an infinite binary tree of rational numbers. It has 1/1 at the root, and the children of r/s are r/(r+s) and (r+s)/s. It turns out that this tree contains every positive rational number exactly once, giving an explicit enumeration of the rational numbers.

Also -- and this is what's really surprising to me -- let f(n) be the number of ways to write n as a sum of powers of 2, where no power of 2 can be used more than twice. Then (f(0), f(1), f(2), ...) = (1,1,2,1,3,2,3,1,4,3,5,2,5,3,4,...). The sequence of numbers f(n)/f(n+1) are the rational numbers in that tree, in the order which they appear if you traverse each level from left to right.

I learned about the shirt from this mathoverflow thread, where Richard Stanley gives the theorem of the previous paragraph as an example of a theorem with an "unexpected conclusion".

See the original paper by Calkin and Wilf. I've mentioned it before here and here. I think I first heard of this paper from Brent Yorgey's series of posts expanding upon the paper, or perhaps from Mark Dominus' blog. (Somewhat coincidentally, Yorgey and Dominus both live in Philadelphia. I claim this is coincidence, even though Wilf is at Penn, because I don't think either of them heard about the paper from Wilf.)

And can anything nice be said about the number of ways to write n as a sum of powers of 3, where no power of 3 can be used more than three times?

## 09 March 2010

### People round their incomes to the nearest $5,000? Here's something interesting: lots of people, when asked by the US Census Bureau "how much money do you make?", round to the nearest five thousand dollars. See the data tables from the 2006 census. These give the number of people whose personal income is in each interval of the form [2500N, 2500N+2499], for integer N. One sees, for instance, that the number of people making between$27,500 and $29,999 (which is near the mode of the distribution) is less than both those making$25,000 to $27,499 and those making$30,000 to $32,499. Something similar occurs at all income levels -- the number of people making between 2500N and 2500(N+1)-1 dollars is smaller if N is odd (and thus this interval doesn't contain a multiple of 5000) than if N is even (and so it does). Surprisingly, the effect occurs even at very low levels of earnings. If you make$87,714 in a year I can see rounding to $90,000 -- but is the person who makes$7,714 in a year really rounding to $10,000? (I found this while trying to answer a question at Metafilter: How many people in the United States make more than$10,000,000 per year?. I seem to recall reading somewhere that personal income roughly follows a power law in the tails, but can't actually find a reference for this.)

There also seems to be a preference for multiples of $10,000 over multiples of$5,000 that are not multiples of \$10,000. But I have work to do, so I'm not going to do the statistics.

## 03 March 2010

### Turning clocks upside down

Apparently this is a puzzle blog now.

This morning at 9:30 I picked up my watch and turned it upside down. It appeared to read 4:00. The hour hand was actually in the position it would be at at 3:30, of course. But the minute hand was pointing straight up, so the time must be on the hour. Since I could easily tell that the hour hand wasn't pointing directly to the right, I suppose my brain interpreted it as 4:00 instead of 3:00. Of course I did not think any of this consciously, but only reconstructed it after the fact; my thought process was more like "hey, 9:30 upside down looks like 4:00. that's weird."

Is it ever possible to interpret the hands of a clock, turned upside-down (i. e. rotated by 180 degrees), unambiguously as a time? (Fudging like my brain apparently did this morning does not count.)

## 01 March 2010

### A puzzle with lights

Since I seem to be getting this question a lot lately: yes, I'm still alive! But I am writing a dissertation. (And waiting to hear back from places where I applied for jobs, but that is not an excuse because those applications are already out there.)

Here's a puzzle I heard a couple weeks ago. You have a 10-by-10 grid of lights. Some of them are on, some are off. You are allowed to make the following moves:
(a): you pick one of the lights which is the center of some 3-by-3 square (i. e. it is not on the edge of the grid) and switch all the lights in that 3-by-3 square (on becomes off, off becomes on).
(b): like in (a), but with a 5-by-5 square.

Is it possible to get from an arbitrary starting position to the all-off position?

## 12 January 2010

### "Orthogonal" at the Supreme Court

Orin Kerr at the Volokh Conspiracy points to arguments in a U. S. Supreme Court case yesterday which used the word "orthogonal" in the technical-jargon sense defined, say, at the Jargon File. (See page 24 of the original transcript.) There's a follow-up here by Eugene Volokh, basically saying that there's no point in using big words if your audience doesn't understand them. (And the justices did stop to ask what the word meant.)

## 07 January 2010

### How most mathematical proofs are written

From Abstruse Goose: How most mathematical proofs are written, dramatized as people driving around and getting lost.

Sometimes I've wondered what an actual map of the various possible proofs of certain results would look like.

## 02 January 2010

### Magic and mathematics

Sunday's New York Times has a bunch of magic tricks based on simple algebra, by Arthur Benjamin.

For some magic tricks based on "deep" mathematics, check out this mathoverflow thread. Rumor has it that Persi Diaconis thinks there's no such thing, though, and he would know.