Thursday, 19 March 2009

Boy Girl Paradox

I have always found it interesting how a badly worded question can lead many highly educated people to vastly different answers. A recent posting on Coding Horror by its author Jeff Atwood has gained more comment posts than any other article: http://www.codinghorror.com/blog/archives/001204.html

The question was: "Let's say, hypothetically speaking, you met someone who told you they had two children, and one of them is a girl. What are the odds that person has a boy and a girl?"

So - what's the Answer?

The simple answer is that the question is too ambiguous to have a definitive answer.

He intended to ask: “What is the likelihood that a parent with two children has mixed sex siblings, given that the children aren’t two boys?”

Answer: 2/3

My interpretation of the question as he actually worded it is: “What is the likelihood that a parent with two children, one or more of whom is a girl, has a boy as the other child?”

Answer: 1/2

That is – in the first one we are focused on the probability that parents who don’t have two boys have mixed sex siblings, while in the second one we are focused on the probability that a girl has a brother (parent focus versus child focus).

What’s the difference? In the first case, 50% of all parents with 2 children have mixed sex siblings, while 50% have either all boys (25%) or all girls (25%). Remove the parents with all boys and you are left with 66.67% of parents having mixed sex siblings and 33.33% having all girls. In the second case, 50% of all girls in two-child families have a brother and 50% have a sister.
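The two counts can be checked by listing the four equally likely two-child families. What follows is only a rough sketch in Python, assuming boys and girls are equally likely and that the two children are independent; the function names parent_focus and child_focus are my own labels for the two interpretations, not anything from the original question.

```python
from fractions import Fraction
from itertools import product

# The four equally likely two-child families, older child first:
# (B, B), (B, G), (G, B), (G, G). Assumes a 50/50 split per child.
families = list(product("BG", repeat=2))

def parent_focus():
    """Count families: of those that are not two boys, how many are mixed?"""
    eligible = [f for f in families if "G" in f]
    mixed = [f for f in eligible if "B" in f]
    return Fraction(len(mixed), len(eligible))

def child_focus():
    """Count girls: of all girls in these families, how many have a brother?"""
    girls = [(f, i) for f in families for i in (0, 1) if f[i] == "G"]
    with_brother = [(f, i) for f, i in girls if f[1 - i] == "B"]
    return Fraction(len(with_brother), len(girls))

print(parent_focus())  # 2/3
print(child_focus())   # 1/2
```

The only difference between the two functions is what gets counted: families in the first, individual girls in the second. The family with two girls is counted once in the first and twice in the second, which is where the gap between 2/3 and 1/2 comes from.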

The question has been roundly condemned as being ambiguous at best and seriously erroneous at worst. In fact, many people have provided links dating back well before this post to very similar questions, deconstructing why they are misleading and wrong, and showing how important the correct language is.

For example, some have argued that saying “and ONE of them is a girl” means that the other MUST be a boy. Nowhere does it say “one or more”, and if you were to take the logic that “one” actually does mean “one or more” then you would similarly have to assume that saying they had “two children” must also mean they had “two or more children”.

The debate went on and on, backwards and forwards. Eventually most people realised that the debate was almost entirely over interpretations.

Never have I seen an example that so clearly demonstrated how important clear language is. But it isn't all Jeff's fault - he tends to be somewhat of a plagiariser and word has it that the question is straight out of the book he was reading. Plagiarising a question doesn't make it correct - so the blame must bubble up to the authors of the book. However - shouldn't Jeff also be held to account for the plagiarism (let alone the copyright violations of using whatever images take his fancy)? Whatever the case - that's another topic altogether.

Further details of the "Boy Girl Paradox" can be found here:

http://en.wikipedia.org/wiki/Boy_or_Girl

What is random?

Randomness is very rarely (if ever) actually found. This was one of the most perplexing parts of statistics for me – that almost nothing is random. Randomness is normally defined as each item in a random set having no relation, either by correlation or by formula, to any other value generated.

Almost all statistical tests that require randomness get by with pseudo-random number generators, like the one in Mathcad.

They are pseudo-random because a computer can't guess/invent anything. It has to measure things. So normally it measures the time and/or the time since the CPU was turned on and applies some type of hashing algorithm to it. Therefore - not actually random. I like this quote:

“Computers are typically very bad at being random because they are designed to be able to reliably calculate the same answer, if given the same data to work with. When computers don't behave this way they are considered broken and in need of repair or replacement. Keys generated by a pure software process on your typically predictable computer will always, at some level, be predictable.”
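The predictability the quote describes is easy to see for yourself. Here is a small sketch using Python's standard random module (a Mersenne Twister pseudo-random generator); the helper name draw and the seed value 2009 are made up for the illustration. Give the generator the same starting value and it hands back exactly the same "random" numbers.

```python
import random

def draw(seed, n=5):
    """Return n pseudo-random integers from a generator started at `seed`."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.randint(1, 100) for _ in range(n)]

first = draw(seed=2009)   # 2009 is an arbitrary, hypothetical seed
second = draw(seed=2009)  # same seed again

print(first == second)  # True: same input, same "random" output
```

Anyone who knows (or can guess) the seed can reproduce the whole sequence, which is fine for most statistical tests but not for generating keys.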

But sometimes (rarely) you need more randomness. Cryptography is where I came across this. True randomness only occurs at the sub-atomic level. Luckily we don't need to measure that low because sub-atomic randomness "bubbles up" to affect molecules and their interactions (butterfly effect). We call anything governed by randomness “chaotic”. There aren’t too many things that are truly chaotic.

What I mean is: when one atom bounces off another, the bounce angle is not determined simply by the mass of both objects and their collision angle. That is – they are not rubber balls that follow a known path and will bounce off each other in a predictable manner. Instead the configuration, location, shape and speed of electrons and other sub-atomic particles add a degree of "unknown" - true randomness. In fact, if you measured the location, shape (orbital shape), and speed of all the sub-atomic particles, you would actually change them. Therefore, you can never predict their activity, because prediction requires measurement, and measurement affects the object, making the prediction invalid.

The “butterfly effect” of atom collisions leading to randomness is easily seen with lava lamps. The patterns they form are not unique, but the size, speed, direction and even the point at which wax globules join are all random.

A lava lamp is not only easy to look at, it’s also easy to monitor, and the resulting image can be used to generate random numbers.

This site has all the details:

http://www.lavarnd.org/faq/true_random_src.html

It has facilities for you to generate numbers. I've used this site/method again and again...

But things are a changing! Recently I found out that Intel has implemented a hardware random number generator in some of its chipsets. IIRC - it is based on measuring thermal noise.

Ha - that's fantastic. Everything old is new again. The Commodore 64 had a sound chip that provided the same feature - it could measure white noise (thermal and audio noise are both the result of atom movement). Some in the crypto world actually networked to a Commodore 64 to use it as a true random number generator source.

Sorry if this post was a bit random...