Thinking Tester: Why counting is a bad idea

Friday, December 15, 2006


Let us consider a typical test report, the kind presented at a meeting attended by key stakeholders and team members to assess the progress of testing:

No. of test cases prepared: 1230
No. of test cases executed: 345
No. of test cases failed: 50
No. of bugs reported: 59
No. of requirements analyzed: 45
No. of requirements updated: 50
No. of transactions covered in performance testing: 24
No. of use cases tested: 233

Productivity

No. of test cases prepared per person per hour = 5
No. of test cases executed per person per hour = 15

What do you see here?

Managers love numbers: numbers give objective information, numbers quantify observations and help in taking decisions (??), numbers simplify things, and one can see trends in numbers.

You have probably heard one or more of the above statements (mostly in review and progress meetings, right?). When it comes to testing, followers of the factory approach to testing are comfortable simply counting things like test cases, requirements, use cases, bugs, and passed and failed test cases, and then making decisions about the "quality" of the product.

Why is counting (without qualification) a bad idea in testing? What are the disadvantages of such a practice? Let us briefly look at a few famous, frequently *counted* things:

Count requirements (as in "there are 350 requirements for this project")
- Can we count them at all?
- How do we count them? Do we have a bulleted list of requirements? If not, what do we do?
- How do we translate the given requirements into a "bulleted list"?
- How do we account for information loss and interpretation errors while counting requirements?

Count test cases (as in "the test team has written (or designed, or prepared) 450 test cases in the last week")
- Test cases are test ideas. A test case is only a vague, incomplete and shallow representation of the actual intellectual engagement that happens in the tester's mind at the time of test execution (Michael Bolton mentioned this in his recent Rapid Software Testing workshop in Hyderabad).
- How can we count ideas?
- Test cases can be counted in multiple ways, more often than not in ways that are "manipulative", so the count is likely to be misleading.
- When used to track or assess testing progress, the count is likely to mislead management.

Count bugs (as in "we have discovered 45 bugs in this cycle of testing so far")
- The story or background of a bug is more interesting and valuable than the count of bugs (for this, again, I owe Michael Bolton: "Tell me the story of this sev 1 bug" is a more informative and revealing question than "How many sev 1 bugs have we uncovered so far?").
- When tied to tester effectiveness, the count is likely to cause testers to manipulate bug numbers (as in "Tester 1 is a great tester because he always logs the most bugs").

Let us face a fact of life in software testing: there are certain things in testing that cannot be counted the way we count the number of cars in a parking lot, the number of patients who visited a dentist's clinic, or the number of students in a school.

Artifacts like test cases, requirements and bugs are not countable things, and any attempt to count them can only lead to manipulation and ill-informed decisions.

Wait --- Are there any things at all in testing that we can count without loss of effectiveness and usefulness of the information that a counted number might reveal?

Shrini

13 comments:

Anonymous said...: "If you really need a simple number to use to rank your testers, use a random number generator. It is fairer than bug counting, it probably creates less political infighting, and it might be more accurate."

Don’t Use Bug Counts to Measure Testers by Cem Kaner (http://www.kaner.com/pdfs/bugcount.pdf).; 9:59 PM, December 15, 2006
Pradeep Soundararajan said...: Are there any things at all in testing that we can count without loss of effectiveness and usefulness of the information that a counted number might reveal?

Yes, the count of "how many fools are counting?".

(Just for humour, could be true too); 1:39 AM, December 17, 2006
Brett Leonard said...: I agree that a creative process such as testing does not lend itself easily to the development of metrics. However, if you tell your software development project manager or the marketing director in charge of a project that you will not produce metrics because "it's a bad idea to count", then you will be looking for another job very soon. This is a case where the theoretical does not agree with the practical. The bottom line is that you as a tester should be concerned with the quality of your own testing effort. Testing can be a very lonely discipline, but we have a job to do, which is two-fold. One is to make sure that you identify as many high-priority bugs as possible in the shortest period of time. The second is to live within the political parameters that exist within the organization you are working for. Unfortunately, we need to show numbers to justify our position within the organization, even if it isn't the best idea.; 8:55 AM, December 18, 2006
Anonymous said...: Hey Shrini,

Thanks for this post. I do agree that it's not possible to count all the activities that we do in testing. But that doesn't mean that all test-related metrics are bad.

Something like the following is very subjective and won't be accepted in general:

Productivity
No of Test cases prepared Per Person Per hour = 5
No of Test cases executed per person per hour = 15

1. What information is available for these test cases?
2. What kinds of test cases are they writing?
3. What test design techniques are they following (BVA, EP, DT, OA, CE, etc.)?

So many questions will come up here, and it's tough to measure someone based on a number alone.

If you ask me how many test cases there are for this requirement, I should be able to give a number. My metrics should tell me the number of test cases, issues, etc. with respect to the requirement.

Otherwise we may not have clarity on what we are doing and what we are trying to deliver.

As testers, we should give qualitative information about the application under test. This qualitative information should also include some measures of product quality; otherwise it's tough for the stakeholders to make decisions.; 2:31 PM, December 18, 2006
Anonymous said...: I do agree with what you say, especially if you are using these metrics to measure the effectiveness of a tester. But I feel that some of these numbers can be pointers for our improvement.
Some metrics, like:
- the defect leakage to the customer,
- the severity of the defects a tester is logging,
- or whether we can attribute each logged defect to a particular stage of the SDLC.
I feel that metrics collection should be aimed at improving the process so that we deliver a quality product to the customer.; 1:36 PM, December 19, 2006
Pradeep Soundararajan said...: @Anonymous,

But I feel that some of these numbers can be pointers for our improvement.

How do you know people have not manipulated the data/metrics?

If you were a manager and you asked your tester "How many test cases pass?" and, let's say, the tester said "95% pass"... would you go by that metric?

What if the test cases are not good enough?

the severity of the defects a tester is logging

This might confuse you further, since in many organizations testers are rated on the number of high-severity bugs they raise.

At least, I have seen many testers raise even Sev2 issues as Sev1 to get their manager's attention for the next appraisal.

BTW, do you personally like being rated by a number 3.4 or 4.4 during the appraisal?

Metrics misguide, since people themselves are misguided into producing metrics.

@ Venkat,

If you ask me how many test cases there are for this requirement, I should be able to give a number. My metrics should tell me the number of test cases, issues, etc. with respect to the requirement.

Okay, let's say you have the requirement "This field should take only alphabets and nothing else other than that."

The count of test cases you give will differ from mine, from Shrini's and from anyone else's in this world. If you say you have 100 test cases for this requirement, I say 76 and Shrini says 90, then the metric varies from person to person.

Do you want your decisions to be dependent on a variable?

I still accept that you might have a valid point, and if we could learn from you, that would be great!; 1:27 PM, December 21, 2006
Anonymous said...: @ Pradeep,

Thanks for sharing your views.

Let me look at your requirement:

"This field should take only alphabets and nothing else other than that."

In the first place, the requirement is not clear:

1. What kind of application are we talking about?
2. What is the size of this field? (For something like a password, it should have a minimum and maximum length.)
3. When you say alphabets, do you mean all the alphabets in all languages, or just English?
4. ????
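To make question 3 concrete, here is a minimal Python sketch of two reasonable readings of "only alphabets". The two function names are hypothetical and not from the discussion, and `str.isascii()` assumes Python 3.7 or later. Both readings accept a plain English name and reject digits and the empty string, but they disagree on an accented name, so the tests one writes (and therefore their count) depend on which reading is chosen.

```python
def alpha_only_any_language(value: str) -> bool:
    # str.isalpha() accepts letters from any script, e.g. "ü" or "ж";
    # it returns False for the empty string.
    return value.isalpha()


def alpha_only_english(value: str) -> bool:
    # Restrict to the 26 unaccented Latin letters (either case):
    # an ASCII string that is also alphabetic contains only A-Z and a-z.
    return value.isascii() and value.isalpha()


for sample in ["Shrini", "Müller", "abc1", ""]:
    print(repr(sample), alpha_only_any_language(sample), alpha_only_english(sample))
```

The empty string also touches question 2: neither function enforces a minimum or maximum length, which would be yet another set of tests.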

See, Pradeep, my questions will go on like this, and all of them will help me get clarity on the requirement.

This process helps me understand what the requirement is, its advantages, its limitations and the most likely scenarios in which it can be used.

All of the above will help me capture the required tests for this requirement.

This may not be generic across the board, but it can be generic within the organization, or at least within the project people are working on.

There should be some way to measure what we are doing. This measure should help the stakeholders make a decision about the quality of the application under test.

I do agree that we need to get better at coming up with measures that carry qualitative information for the stakeholders.

This is a continuous process. You might start with bad counts and improve them for the next report.

But on the whole, I don't agree that counting is a bad idea.

Let me state my stand on this post clearly.

Why is counting a bad idea?
1. What is the objective of this post?
2. A bad idea for whom? Is it for customers, management, developers, testers or others?
3. What are we trying to convey with this post? Don't count at all, or improve the process of counting so that it yields qualitative information?

@ Shrini,

Thanks for coming up with a post that has sparked a good debate.; 3:39 PM, December 26, 2006
Anonymous said...: Hi Shrini,

I agree that we cannot just count figures and rely on them. But in the strategy meeting, management will come up with timelines and resources based on the allocated budget. For clients who are spending a lot on a quality outcome, won't that definitely be a concern?

There are certain projects, whether development-cum-testing or consulting projects, where slippage is considered rather than counts. Sometimes the slippage may be caused by the requirements, by test case design, or by a planning problem when creating the strategy. How do we address that going forward?

What are your comments on this?; 3:26 PM, January 09, 2007
Brandy Galos said...: I'm an ex-test manager and I loved your post. I tried to comment, but what I said grew too long, so I ended up posting it on my own blog: http://brandygalos.blogspot.com/2007/01/managing-test-manager-or-seeing-big.html

To sum up what I say in my blog:

The purpose of a test team is to communicate the accurate status of the project to the rest of the product and management team.

If they don't feel like they understand where things are, they will start asking for stuff they think will give them that information. The stuff they come up with looks like what you list at the top of your post.

The best way to get out of this ditch (or never go into it) is to make sure that you give very accurate status reports before they need them, and in a way that works for them.

Darn. This is getting long again. This subject caused me so much pain that I can't quite seem to shut up about it.

Thanks for a great post on a subject that I think more people need to understand.; 4:13 AM, January 29, 2007
Cem Kaner said...: I don't disapprove of all counting.

But I want to work only with metrics that have the meaning that I think they have.

The name for this requirement is "Construct Validity" and there are a buhzillion links to it from a Google search (well, 926,000). Unfortunately, very few papers in computer science or software engineering or software testing talk about "construct validity" and so our discussions of metrics look a lot like discussions in other fields before about 1935.

For a brief summary of the problem and of ideas on theory of measurement that developed over the last century in other fields, see Cem Kaner & Walter P. Bond, "Software engineering metrics: What do they measure and how do we know?" 10th International Software Metrics Symposium (Metrics 2004), Chicago, IL, September 14-16, 2004 at http://www.kaner.com/pdfs/metrics2004.pdf

As to counting the number of test cases for a requirement, it is fascinating to look at the number of tests suggested for Jerry Weinberg's triangle problem (the triangle problem described in Glen Myers' book, The Art of Software Testing).

I've seen lists of 4 to 160 different tests published in books as the appropriate set of tests for this. I could agree with Doug Hoffman (http://www.softwarequalitymethods.com/H-Papers.html#maspar ) that we might prefer to test several orders of magnitude more than this.
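For readers who have not met it, the triangle problem asks for a program that reads three integer side lengths and reports whether they form a scalene, isosceles or equilateral triangle. Below is a minimal Python sketch of such a classifier with a small, deliberately incomplete set of checks. The function name is hypothetical, and treating a degenerate triangle (one side equal to the sum of the other two) as "not a triangle" is an assumption of this sketch.

```python
def classify_triangle(a: int, b: int, c: int) -> str:
    """Classify three side lengths (a sketch of the classic triangle problem)."""
    sides = sorted((a, b, c))
    # Non-positive lengths, or sides that fail the strict triangle
    # inequality (including degenerate a + b == c), are rejected.
    if sides[0] <= 0 or sides[0] + sides[1] <= sides[2]:
        return "not a triangle"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"


cases = [
    ((3, 4, 5), "scalene"),
    ((2, 2, 3), "isosceles"),
    ((2, 3, 2), "isosceles"),       # same class as above, sides permuted
    ((5, 5, 5), "equilateral"),
    ((1, 2, 3), "not a triangle"),  # degenerate: 1 + 2 == 3
    ((0, 4, 4), "not a triangle"),  # zero-length side
    ((-1, 2, 2), "not a triangle"), # negative side
]
for sides, expected in cases:
    assert classify_triangle(*sides) == expected, sides
print(f"{len(cases)} tests passed")
```

Even this short list shows how slippery the count is: the two isosceles inputs could be reported as one test or two, and nothing here checks non-integer input, overflow or the other permutations.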

So what is the right metric for the number of tests?

Glen Myers gave a simple 20-line program that you could test in many trillions of ways. People have dismissed Myers' example as artificial, but I illustrated a life-critical bug from my days programming telephone systems that fits right in with Myers' case. See the lectures at http://www.testingeducation.org/BBST/Intro2.html

Measurement can be useful if you understand:
* what you are trying to measure (the attribute being measured)
* why you are trying to measure it
* how your measurement relates back to the attribute you are trying to measure (construct validity)
* how the act of measurement and the interpretations of the measurements will affect the measured attribute and the other behaviors in the organization being measured (side effects of measurement), and
* how reliable the measurement is
then you might do more good than harm by using the measurement.

Question: How often do you see these questions discussed in discussions of measurement?

Subjectivity of measurement is the more common discussion among engineers and testers, but in practice, this is often much easier to manage than the challenges of construct validity and side-effects.

-- cem kaner; 6:44 AM, February 01, 2007
Anonymous said...: "Counting is a bad idea" if you are counting to rate the effectiveness of a tester. That won't take you anywhere, and you would be left with tons of defect reports and other such documents, many of which would be manipulated. So, personally, I am strongly against "counting". I believe that if I can catch one important (critical!) defect in a whole testing session (which can be a whole working day!), that is more important for the quality of the product under test than finding many silly, unimportant defects (which may not be defects at all).

Thanks, Shrini, for the nice post (and thanks too for replying to my comment on James's site)...; 4:23 PM, February 20, 2007
Jason Feldhaus said...: There is a particular metric in software testing that I like.

It's the number of bugs fixed as a percentage of the number of bugs filed. The idea is that if a tester tends to report good bugs then these bugs are more likely to be fixed. And a good bug is a bug that does get fixed.
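As a minimal sketch of the fix-rate arithmetic described above (fixed bugs as a percentage of filed bugs, per tester), assuming a hypothetical `BugReport` record that only stores who filed a bug and whether it was fixed:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class BugReport:
    """Hypothetical minimal bug record: who filed it and whether it was fixed."""
    reporter: str
    fixed: bool


def fix_rate(reports: Iterable[BugReport], reporter: str) -> Optional[float]:
    """Percentage of the reporter's filed bugs that were fixed, or None if none were filed."""
    filed = [r for r in reports if r.reporter == reporter]
    if not filed:
        return None  # avoid dividing by zero for a tester with no reports
    fixed = sum(1 for r in filed if r.fixed)
    return 100.0 * fixed / len(filed)


reports = [
    BugReport("tester_a", True),
    BugReport("tester_a", True),
    BugReport("tester_a", False),
    BugReport("tester_a", True),
    BugReport("tester_b", False),
]
print(fix_rate(reports, "tester_a"))  # 75.0
print(fix_rate(reports, "tester_b"))  # 0.0
print(fix_rate(reports, "tester_c"))  # None
```

Note that this sketch treats a bug that is still open the same as one that was rejected; deciding how to count those is itself a choice about what the number means.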

So if a tester finds a lot of bugs that he or she thinks are good, but development and management do not take action to fix these bugs, then the tester needs to reevaluate what a good bug means.

The tester can then measure their progress and effectiveness by monitoring the fix rate of the bugs that they submit.; 4:22 AM, June 28, 2008