Thursday, August 11, 2011

Do statistics lie?

“There are no whole truths; all truths are half-truths. It is trying to treat them as whole truths that plays the devil.” – Alfred North Whitehead

I received an email this morning on the topic of statistics; its content and overall theme boldly proclaimed “statistics do not lie”. When we speak of statistics in the software world, we typically refer to metrics and various numbers representing effort, number of defects, test cases, cost, etc. So, instead of talking about statistics, let’s talk about “Do numbers lie?”

We have two items here – numbers, and lies or truth.

Let us start with numbers. What is a number after all? The need for numbers is probably related to (or caused by) the need for counting. In Egypt, from about 3000 BC, records survive in which 1 is represented by a vertical line and 10 is shown as ^.
According to this historical account, pagan priests needed to calculate the frequency of natural phenomena. One of the best known examples of this period is the Stonehenge stone circle in Britain, popularly attributed to the Druids, built as a kind of celestial observatory around 1800 BC. For cave men of the prehistoric age, counting probably facilitated sharing food. The pictures on cave walls and archaeological finds give us an idea that people in those days counted by drawing lines to indicate “how many”. Counting a few items, say 6 fruits, fits this idea of counting with fingers or by drawing lines. But how would you count the fruits on a tree or the people in a village? The discovery of the place value system allowed counting items in the 100s and 1000s. Hindu-Arabic numerals 0…9 have probably been known since about 300 BC. Before Hindu-Arabic numerals, people used “Roman symbols”. Interestingly enough, the Greeks and Romans did not know the idea of “place value”. And human civilization evolved.

Gonitsora, an initiative by a few students of Tezpur University, India, carried an article on the history of counting that said –

“The first motivation for people to create number was the human desire to the manyness of a set of objects. In other words, to know how many duck’s eggs are to be divided amongst family members or even how many days until the tribe reaches the next watering hole, how many days will it be until the days grow longer and the nights shorter, how many arrow heads do one trade for canoe? Knowing how to determine the manyness of a collection of objects must surely have been a great aid in all areas of human endeavor.”

When we say “9”, what does that indicate in the purest, most objective sense? Nothing. OK, let us say “9 cars”. What does that mean? Extending it: “9 cars parked in front of a house”. What does that mean? “9 cars parked in front of the house of a celebrity in London”. You might say this is not significant or interesting, as it is common for a celebrity in London to have that many cars. Now if I say “9 cars parked in front of the house of a politician in Delhi”, something surprising, or some political planning, may be under way. Now if I say “9 cars parked in front of the house of a poor man in a remote village in Somalia”, what does that mean? You would really be interested to know what might be happening in that house, who came in those cars, and where they came from. Did this poor man steal those cars? All sorts of questions arise.

Pause for a while and think: did the number 9 reveal any truth in each of these situations? Is the number 9 capable of telling any truth, or for that matter any lie, at all? A number becomes meaningful and relevant only through the set of objects, people, ideas and events that it points to. Thus it might be totally meaningless and absurd to say “numbers don’t lie”; as a matter of fact, numbers are incapable of telling truth or reality independent of the context, the observers and the recipient of the information.

Let us talk about truth. What is truth? That question is at the base of all philosophy, science and every root of what we call “knowledge”. For the purpose of this post, let me use this definition (very tentative, provisional): “truth is a qualifier that we can attach to a piece of information about which a group of people do not, by and large, disagree”. Back to the software world – give me an example of truth. One might say “This week there were 9 sev 1 incidents in production”. Is this a truth? You may point to the live incident tracking system, show a list of incidents reported this week and say “look, here is the truth: there are 9 live incidents”. Let me apply my “provisional” definition of truth here. Let us call 10 people – a few programmers, a few business analysts, a project manager, a business unit head, a customer and a sales manager. Let us put these people in 10 separate rooms, show them the list of 9 live incidents and ask them “is this information true?” Let us record each response. What do you think those responses would be? Will all of them agree on the truth of 9 live incidents? I guess many would say “Yes, I know there are 9 live incidents this week, but …” What follows after the “but” is each person’s viewpoint or story of how they view (defend, attack, frown at, shout about, feel sad about, etc.) those incidents. How do you extract “truth” from this beautiful, “god-like”, impartial number “9” quantifying live incidents?

You would soon have a consultant selling a version of “cost of quality”, attaching some dollar figures to these 9 incidents and selling a multi-year “transformational” deal to reduce the cost associated with these incidents. Should you believe him?

Often when executives say “I need statistics, numbers” – it seems to me that they are (or should be) really interested in the stories behind those numbers; they are (or should be) least bothered about the numbers themselves. Numbers are masks for the stories, events and emotions that they represent.

Numbers and statistics are incapable of telling anything in the absence of context, stories, people and their motivations. For now, I can say the issue of whether statistics tell lies (or the truth) is settled – they don’t tell anything.

An exercise: When I was preparing this post, a colleague of mine, Joy Chakraborty, challenged me and said that “company financial results” are objective truths about a company’s performance (he did acknowledge the Satyam saga and other irregularities that show how company financial results can be manipulated). He simply asked how the numbers in the statement “Goldman Sachs reported net earnings for Q2 2011 of 1.1 billion USD, 77% up over the same quarter of the previous year” are not objective truths. What do you say?

Is Pythagoras’ theorem true? How about Einstein’s general theory of relativity?