Thursday, August 11, 2011

Do statistics lie?

"There are no whole truths; all truths are half-truths. It is trying to treat them as whole truths that plays the devil." – Alfred North Whitehead

I received an email this morning on the topic of statistics; its overall theme boldly proclaimed that “statistics do not lie”. When we speak of statistics in the software world, we typically refer to metrics and various numbers representing effort, number of defects, test cases, cost etc. So, instead of talking about statistics, let's talk about “Do numbers lie?”

We have two items here: numbers, and lies or truth.

Let us start with numbers. What is a number after all? The need for numbers is probably related to (or caused by) the need for counting. In Egypt, from about 3000 BC, records survive in which 1 is represented by a vertical line and 10 is shown as ^.
According to one historical account, pagan priests needed to calculate the frequency of natural phenomena. One of the best known examples of this period is the Stonehenge stone circle in Britain, popularly attributed to the Druids and thought to have served as a kind of celestial observatory around 1,800 BC. For cave men of the prehistoric age, counting probably facilitated sharing food. Cave pictures and archaeological finds suggest that people in those days counted by drawing lines to indicate “how many”. Counting a few items, let us say 6 fruits, fits this idea of counting on fingers or by drawing lines. But how would you count the number of fruits on a tree or the number of people in a village? The discovery of the place value system allowed counting items in 100s and 1000s. Hindu-Arabic numerals 0…9 have probably been known since 300 BC. Before Hindu-Arabic numerals, people used “Roman symbols”. Interestingly enough, the Greeks and Romans did not know the idea of “place value”. And human civilization evolved.

Gonitsora, an initiative by a few students of Tezpur University, India, carried an article on the history of counting that said:

“The first motivation for people to create number was the human desire to the manyness of a set of objects. In other words, to know how many duck’s eggs are to be divided amongst family members or even how many days until the tribe reaches the next watering hole, how many days wills it be until the days grow longer and the nights shorter, how many arrow heads do one trade for canoe? Knowing how to determine the manyness of a collection of objects must surely have been a great aid in all areas of human endeavor.”

When we say “9”, what does that indicate in the purest and most objective sense? Nothing. OK, let us say “9 cars”. What does that mean? Extending it: “9 cars parked in front of a house”. What does that mean? “9 cars parked in front of the house of a celebrity in London”. You might say “nothing significant or interesting”, as it is common for a celebrity in London to have that many cars. Now if I say “9 cars parked in front of the house of a politician in Delhi”, it suggests something surprising, or some planning under way in political circles. Now if I say “9 cars parked in front of the house of a poor man in a remote village in Somalia”, what does that mean? You would really get interested to know what might be happening in that house, who came in those cars, where they came from, whether this poor man stole those cars, and all sorts of other questions.

Pause for a while and think: did the number 9 reveal any truth in any of these situations? Is the number 9 capable of telling any truth, or for that matter any lie, at all? A number becomes meaningful and relevant only through the set of objects, people, ideas and events that it points to. Thus it might be totally meaningless and absurd to say “numbers don’t lie”; as a matter of fact, numbers are incapable of telling truth or reality independent of the context, the observers and the recipient of the information.

Let us talk about truth. What is truth? It is a question at the base of all philosophy, science and every root of what we call “knowledge”. For the purpose of this post, let me use this definition (very tentative, provisional): “truth is a qualifier that we can attach to a piece of information about which a group of people, by and large, do not disagree”. Back to the software world: give me an example of truth. One might say “This week there were 9 Sev 1 incidents in production”. Is this a truth? You may point to the live incident tracking system, show a list of incidents reported this week and say “look, here is the truth, there are 9 live incidents”. Let me apply my “provisional” definition of truth here. Let us call 10 people: a few programmers, a few business analysts, a project manager, a business unit head, a customer and a sales manager. Let us put these people in 10 separate rooms, show them the list of 9 live incidents and ask them “is this information true?” Let us record each response. What do you think those responses would be? Will all of them agree on the truth of 9 live incidents? I guess many would say “Yes, I know there are 9 live incidents this week, but ……” What follows after the “but” is each person’s viewpoint or story of how they view (defend, attack, frown, shout, feel sad etc.) those incidents. How do you extract “truth” from this beautiful, “god-like”, impartial number “9” quantifying live incidents?

You would soon have a consultant selling a version of “cost of quality”, attaching some dollar figures to these 9 incidents and selling a multi-year “transformational” deal to reduce the cost associated with these incidents. Should you believe him?

Often when executives say “I need statistics, numbers”, it seems to me that they are really (or should be) interested in the stories behind those numbers; they are (or should be) least bothered about the numbers themselves. Numbers are masks for the stories, events and emotions that they represent.

Numbers and statistics are incapable of telling anything in the absence of context, stories, people and their motivations. For now, I can say the issue of whether statistics lie (or tell the truth) is settled: they don’t tell anything.

An exercise: when I was preparing this post, a colleague of mine, Joy Chakraborty, challenged me and said that “company financial results” are objective truths about a company’s performance (he did acknowledge the Satyam saga and other irregularities showing how company financial results could be manipulated). He simply asked how the numbers in the statement “Goldman Sachs reported net earnings for Q2 2011 of 1.1 billion USD, 77% up over the previous year’s same quarter” are not objective truths. What do you say?

Is Pythagoras' theorem true? How about Einstein’s general theory of relativity?


Rikard Edgren said...

When I started reading this excellent post, I thought to myself:
- It is the wrong question, the right question is "Can numbers communicate what's important?"

But then that exact message was written, and it's as true as Kierkegaard's "the truth is the subjectivity".

The follow-up question is:
- How do we summarize the stories and details so the most important information can be communicated fast?

Ajay Balamurugadas said...

So can we say that numbers are mere containers for the story? As numbers alone do not mean anything, should we judge them only by the stories associated with them?

~ Ajay Balamurugadas

Shrini Kulkarni said...

>>> It is the wrong question, the right question is "Can numbers communicate what's important?"

Thanks Rikard for posting. Yes... I think the assertion that statistics lie or tell the truth is an inappropriate statement and is more of management talk. Metrics enthusiasts and process consultants tend to start off by saying "look, these numbers don't lie; there has to be an objective way of measuring this or that" (typically related to quality, cost, effort or the like).

>>> Kierkegaard's "the truth is the subjectivity".
I agree. Managers and executives all hate "subjectivity", as it puts the onus of "figuring things out" on them, on the subject matter and on the people who are required to deliver. They pretend that they are fine chasing elusive objectivity, and they look down on subjective, "feeling" subjects. To them, statistics are a way to look at the stuff removed from the subjects and their "human" burden.

>>> - How do we summarize the stories and details so the most important information can be communicated fast?

This is a profound question and needs more thinking and experimentation to arrive at a workable solution in a context. To start with, we must acknowledge that "truth is subjective" and get our stakeholders to agree to that.


Shrini Kulkarni said...

>>> So can we say that numbers are mere containers for the story?

Ajay - numbers are part of the story. The problem is that we are often forced to give up the story part and stay "bare" with numbers. Many managers and executives feel that stories add subjectivity, and they want to stay objective by sticking only to numbers. The funny thing is that all numbers are eventually communicated with some story.

>>> As numbers alone do not mean anything, should we judge them only by the stories associated with them?

Yes. Let us not talk about judging - numbers always come with stories whether you like it or not. We often fool ourselves by pretending that there are only numbers and no stories. People have always interpreted numbers against the backdrop of some stories that they would like to believe.

So it is not about what those numbers are; it is about "how do people FEEL" about them (a paraphrase from Michael Bolton).


Tal.E said...

numbers without context should never be considered as hard facts! this reminds me of an example I kept hearing in statistics courses:
"a correlation was found between shoe size and math ability".
this would sound less preposterous if I mentioned it was made among elementary schoolchildren. the older the children (the bigger the average shoe-size) the more advanced they are in math...

Shrini Kulkarni said...

@Tal E,

Thanks for posting. I agree with you: correlation is not the same thing as causation. Shoe size and math ability might be shown to be correlated in *some* model, and that does not mean that one is caused by the other or vice versa.

Asking the question "do statistics lie?" leads to some strange answers.
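
The shoe-size example from the comments can be tried out with a small simulation. This is only a sketch under invented assumptions: the ages, shoe sizes, math scores and noise levels below are all made up for illustration, and the helper `pearson` is written here by hand rather than taken from any library. In this made-up model, age drives both shoe size and math score, and shoe size has no effect on math at all.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(2011)  # fixed seed so the sketch is repeatable

# Hypothetical model (all parameters invented): age drives BOTH values.
# Shoe size never appears in the formula for the math score.
children = []
for age in range(6, 12):        # ages 6..11
    for _ in range(500):        # 500 made-up children per age
        shoe = 28 + 1.0 * (age - 6) + rng.gauss(0, 1)
        math_score = 40 + 8.0 * (age - 6) + rng.gauss(0, 8)
        children.append((age, shoe, math_score))

# Correlation across the whole school, ignoring age.
overall = pearson([c[1] for c in children], [c[2] for c in children])

# Correlation among children of the same age only.
within = {}
for age in range(6, 12):
    group = [c for c in children if c[0] == age]
    within[age] = pearson([c[1] for c in group], [c[2] for c in group])

print(f"all ages together: r = {overall:.2f}")
for age, r in within.items():
    print(f"age {age} only: r = {r:+.2f}")
```

With these invented parameters the correlation over all ages should come out strongly positive, while the correlation inside each single age group should hover around zero. The same numbers tell two different stories depending on whether the context (age) is kept or thrown away.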

