Another statistically unintuitive problem (which I've witnessed a lecture hall enter a state of uproar over):

There are 2 red and 2 blue balls in a box. One ball is removed at random. What are the odds that the ball is blue?

Now we repeat the problem, but before examining the ball, we remove a second ball. We observe that the second ball is blue. In this case, what are the odds that the first ball is blue?



1/3 - If the first ball was blue, the probability that the second ball is also blue is 1/3, while if it was red, the probability of picking a blue ball second is 2/3. The prior probability for each scenario (first ball blue vs. first ball red) is 1/2, so the posterior that the first ball was blue is (1/2 * 1/3) / (1/2 * 1/3 + 1/2 * 2/3) = 1/3.
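
For anyone who wants to check the arithmetic, here is a minimal sketch of that Bayes calculation with exact fractions. The variable names are my own, and it assumes both draws are uniformly random.

```python
from fractions import Fraction

# Prior: with 2 red and 2 blue, the first ball is blue or red with equal probability.
prior_blue = Fraction(2, 4)
prior_red = Fraction(2, 4)

# Likelihood of a blue second ball in each scenario (3 balls remain).
like_blue = Fraction(1, 3)  # one blue left if the first ball was blue
like_red = Fraction(2, 3)   # two blue left if the first ball was red

# Bayes' rule: P(first blue | second blue).
posterior_blue = (prior_blue * like_blue) / (
    prior_blue * like_blue + prior_red * like_red
)
print(posterior_blue)  # 1/3
```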

A neat trick to reason about those cases is to use odds. The prior odds for the first ball being blue vs. red are 1:1. The likelihood ratio, i.e. how likely a blue second ball is when the first ball is blue vs. red, is (1/3):(2/3) = 1:2. We can just multiply the two to get (1*1):(1*2) = 1:2 as the posterior odds, which is a probability of 1/3.

This doesn't seem impressive because the prior is 1:1, but using this method you can easily calculate the odds in the scenario where there are 4 red and 2 blue balls. The prior odds are 2:4 = 1:2, the likelihood ratio is (1/5):(2/5) = 1:2, and 1:2 * 1:2 = 1:4, i.e. a 1/5 chance that the first ball was blue.
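
The odds method generalizes to any box. Below is a rough sketch; `posterior_first_blue` is a hypothetical helper name of mine, and it assumes at least one red and one blue ball and uniformly random draws.

```python
from fractions import Fraction

def posterior_first_blue(red, blue):
    """Hypothetical helper: P(first ball blue | second ball blue)."""
    # Prior odds blue:red for the first ball.
    prior_odds = Fraction(blue, red)
    # Likelihood ratio of a blue second ball:
    # (blue - 1)/(n - 1) if the first was blue vs. blue/(n - 1) if it was red.
    likelihood_ratio = Fraction(blue - 1, blue)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds a:b into the probability a/(a+b).
    return posterior_odds / (1 + posterior_odds)

print(posterior_first_blue(2, 2))  # 1/3
print(posterior_first_blue(4, 2))  # 1/5
```

The same function returns 0 for 3 red and 1 blue, since the only blue ball cannot be drawn twice.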


I might be wrong here, but this is my understanding.

You'll always be able to show a blue ball as the second ball after the first is drawn. So arguing that it tells you something about the state of the system after the first has been drawn is wrong. The chance that the first draw is blue is 50/50. The arguments that lead to 1/3 try to use the second event as a statistic for the state. For that to be appropriate, the problem would have to be phrased as:

You draw one ball and set it aside, then draw another. If the second ball is red, you start the entire thing over. If it's blue, you continue the experiment.
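
The restart protocol is easy to simulate. This is only a sketch of that phrasing, assuming both draws are blind and uniformly random; the function name, trial count, and seed are arbitrary choices of mine.

```python
import random

def simulate(trials=200_000, seed=0):
    rng = random.Random(seed)
    kept = 0          # runs where the second ball was blue
    first_blue = 0    # kept runs where the first ball was also blue
    for _ in range(trials):
        box = ["R", "R", "B", "B"]
        rng.shuffle(box)
        first, second = box[0], box[1]
        if second != "B":
            continue  # second ball was red: start the whole thing over
        kept += 1
        if first == "B":
            first_blue += 1
    # Fraction of kept runs in which the first ball was blue.
    return first_blue / kept

print(simulate())
```

Under these assumptions the estimate settles near 1/3 rather than 1/2. A blind draw that merely happens to come up blue selects exactly the same set of runs as this restart rule.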

Because otherwise the assumption that the second drawing can tell you the statistics of the underlying state is wrong.

The question of the odds really boils down to this: did we randomly pick a ball and observe it? And if so, what would have happened if we hadn't observed what we specifically stated for this instance?


Love it. The answer is 1/2. The first event occurs with a population of 2,2 and the second event has no causal effect. The trap is that the observation doesn’t change the sample population of the first event, exactly like the Monty Hall case: opening doors itself does not change the probability to 1/2 as people think it does (it remains 1/3) — it’s the switch that changes the odds.

Similar question with a similar trap: there are new neighbors moving in next door, and you know they have two kids. You see a boy in their yard, so you know they have at least one boy. What’s the probability they have two boys?


Take a look at the other replies. I like paxys' for the simplicity.

The second ball being selected doesn't change the first event, but it does change our understanding of it.

An extreme version: There's a bowl with 3 red and 1 blue balls. We remove two balls again, and the second one is blue. What are the odds that the first one is blue?

Your two kids problem is actually pretty complex, in the form you phrased it. Wiki has a decent explanation (I contend that your question is equivalent to the second question in the wiki article): https://en.wikipedia.org/wiki/Boy_or_Girl_paradox


I’m aware that it’s complicated. I intentionally phrased it in a way that makes it clear that the family moving in is not somehow selected from the set of families with at least one boy, but rather the observation is unrelated.

This is the distinction. I believe that the 1/3 analysis may also be incorrect for the way you phrased your question. If you had said: “we select a second ball, and only observe the first ball if the second is blue. What are the chances of it being blue?”, then the second observation controls the population. Otherwise it’s semantics over exactly what probability we are trying to define.

I think the issue with these questions is that we are asking about “probabilities” which only make sense with repeated iterations. So you always have ambiguity in the construction of the question and interpretation when asking about things that are a “one time” event like both these examples.


I don't think this is correct. It matters whether the person removing the second ball can see the colors and choose accordingly. If they can, and they deliberately remove a blue ball, you have the Monty Hall problem where you stick to your choice. It does not influence the chances; it is still 1/2.

But the way I read the problem, you choose a ball at random, look at its color, and it happens to be blue. Now this gives you information on the composition of the remaining balls. The chance is 1/3.

To drive home the point that removing a ball "can influence" the odds of the first ball: what if you took out an extra ball that also turns out to be blue? You now have two blue balls, so the chance that the first ball was blue is zero.
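
Here is a small enumeration of those cases, as a sketch. The helper name and ball labels are made up, and it models the "remover can see the colors" case as an observation that filters nothing, which is my assumption about that reading.

```python
from fractions import Fraction
from itertools import permutations

BALLS = ["R1", "R2", "B1", "B2"]

def is_blue(ball):
    return ball.startswith("B")

def p_first_blue_given_later_blue(n_later):
    """Hypothetical helper: P(first blue | the next n_later blind draws are all blue)."""
    draws = list(permutations(BALLS, 1 + n_later))  # equally likely ordered draws
    kept = [d for d in draws if all(is_blue(b) for b in d[1:])]
    return Fraction(sum(is_blue(d[0]) for d in kept), len(kept))

# Deliberately removed blue ball: always possible, so nothing is filtered out.
print(p_first_blue_given_later_blue(0))  # 1/2
# One blind draw that happens to be blue.
print(p_first_blue_given_later_blue(1))  # 1/3
# Two blind draws that both happen to be blue.
print(p_first_blue_given_later_blue(2))  # 0
```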


The sneaky part of this question is that it gives information about the outcome and then asks you again about the probability. Forget the extra balls, and just drive the point home directly: suppose I pick a ball at random, and show it to you. It’s blue. I ask: “what are the odds that the ball is blue?” (exact phrasing as original question).

There are obviously correct interpretations for 50% and 100%. It depends whether I’m asking:

- What’s the probability of this outcome?
- What’s the probability that the ball I’m holding in my hand is blue?

The second is effectively a “resampling” with a population of one. You are simply assuming the second interpretation and arguing for it, but I don’t dispute the logic. The original question is unclear whether it’s asking for the probability that you picked a blue ball initially (50%) or the likelihood that the ball is blue, given some information of the outcome. But we don’t normally speak of probabilities this way. The odds that you picked a blue ball initially were 50%, even if you picked a red one.

By giving only partial information, the question creates more ambiguity since the answer isn’t definite. (When there is ambiguity in a question, I believe most people will discard trivial interpretations over substantive ones, which is what pushes toward the “resample” here.)

Anyway, I’ll leave it there since I think it’s clear there are correct interpretations for both, depending on what the question is actually asking.


This turns out to be rather straightforward once you realize that it is perfectly symmetric: it's just a matter of the order in which you reveal the colors. So this problem is exactly the same as looking at the first (second) ball, seeing that it's blue, then drawing a (revealing the already drawn) second ball. So there are two reds and one blue 'left'. So 1/3. (I am 95% sure...)


Probability of draws:

B then R -> 1/3

B then B -> 1/6

R then B -> 1/3

R then R -> 1/6

Removing the cases where the second draw was R:

B then B -> 1/3

R then B -> 2/3

Therefore the first ball being blue has a 1/3 chance.


Yeah, 1/3. Once I labeled them as R1 R2 B1 B2 and just listed all possibilities, it became clear where my error was.
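
The labeling approach can be written out as a short script. This is a sketch assuming every ordered pair of distinct balls is equally likely; the variable names are mine.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

balls = ["R1", "R2", "B1", "B2"]
draws = list(permutations(balls, 2))  # 12 equally likely ordered draws

# Collapse each labeled draw to its color pattern, e.g. ("R", "B").
patterns = Counter((a[0], b[0]) for a, b in draws)
for pattern, count in sorted(patterns.items()):
    print(pattern, Fraction(count, len(draws)))

# Keep only the patterns whose second ball is blue, then renormalize.
kept = {p: c for p, c in patterns.items() if p[1] == "B"}
total = sum(kept.values())
posterior = Fraction(kept[("B", "B")], total)
print(posterior)  # 1/3
```

The four printed pattern probabilities are 1/6 for the same-color pairs and 1/3 for the mixed pairs.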



