Monday, June 24, 2019

There is no such thing as a defect/bug in the Machine Learning/AI domain

One question that comes up again and again in the testing world today is about the role of testing for applications in the Machine Learning and Artificial Intelligence domain. To be precise, many in the testing community are curious, and somewhat confused, about what they need to do differently (if at all) and what additional skills they need to acquire. This post is an initial attempt to share my thoughts in this direction.

What is an ML Application ?
(Machine Learning is considered to be a branch of Artificial Intelligence, hence I omit AI and use only ML.)
The term "Machine Learning" is not new; it was coined by Arthur Samuel in 1959. The definition usually attributed to him is the "ability of computers to learn without being explicitly programmed". In reality, computers do not learn, but software programs learn - a small difference, if you choose to care. How do programs gain such a human-like ability to learn? Can any or every program be made to "learn" like this? What in today's computer technology has made this possible? Answers to these questions would take this post beyond its topic of ML, testing and defects/bugs. In short - I would say the ability of computers to store large volumes of data and process them at transaction speed has enabled Machine Learning as Arthur Samuel might have envisaged.

What is a Machine Learning application then? A program that uses a set of algorithms to process specially selected and curated data about the problem the program intends to solve. Under the hood, the algorithms "fit" the data to a selected mathematical "function" called a "model", so that the program's logic is data driven, not hard coded. When I say hard coded in ML parlance - you will not find explicit chunks of if-else or select-case or do-while depicting rules of logic. The "model", through "fitting", generates the logic that the data presented to it shall comply with.
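To make the contrast between hard-coded and data-driven logic concrete, here is a minimal sketch in Python. The transaction amounts, the labels and the function names are all invented for illustration, and the "model" is deliberately the simplest one possible (a single threshold), not what a real fraud system would use.

```python
# Hypothetical labelled transactions: (amount, is_fraud). Invented for illustration.
TRAINING_DATA = [(20, 0), (35, 0), (50, 0), (80, 0), (400, 1), (650, 1), (900, 1)]


def hard_coded_rule(amount):
    """Logic written by a programmer: the threshold is fixed in the source code."""
    return 1 if amount > 1000 else 0


def fit_threshold(data):
    """'Fit' a one-parameter model: pick the threshold with the fewest training errors."""
    best_threshold, best_errors = None, len(data) + 1
    for threshold in sorted(amount for amount, _ in data):
        errors = sum(
            (1 if amount >= threshold else 0) != label for amount, label in data
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def fitted_rule(amount, threshold):
    """Same shape of decision, but the threshold came from the data."""
    return 1 if amount >= threshold else 0


threshold = fit_threshold(TRAINING_DATA)
print("learned threshold:", threshold)
print("hard coded says:", hard_coded_rule(650), "fitted says:", fitted_rule(650, threshold))
```

Change the training data and the fitted rule changes with it, while the hard-coded rule stays the same until a programmer edits it.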

What kind of problems can ML programs solve? Largely two categories of problems - prediction and suggestion. A machine learning program can classify a bunch of financial transactions (say credit card) as potentially fraudulent or genuine, recognize faces in a picture, or auto-complete what you are typing in a search box on a web page.

What does it mean for a program to learn ?
In simple language - learning for a program is to discover the parameters of the mathematical function that the program uses to establish a relation between input and output. Let us take an example of classification that aims to predict whether an image contains text or not. In this case the image and its properties (what each pixel tells about the whole picture) are the inputs and the output is a binary decision on whether the image contains text or not (1 or 0). For a human eye it is easy to make the decision, whereas for a computer the problem needs to be presented as (for example) a mathematical function like y = f(x). This function will have parameters that the program needs to compute. For this purpose the program needs to be presented with loads of data (input images and the decision on whether there is text or not). By processing this data the program is expected to identify the relation between "y" and "x", which is a mathematical function like y = mx + c (here m and c are the parameters of the function).
This process of arriving at the parameters of the function by working through data is called "learning". Once the program learns the relationship, it can predict "y" - the decision on whether the image contains text or not - given any new image that the program has not "seen" before.
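The y = mx + c idea above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the data points are made up, the fitting method is ordinary least squares, and a real image classifier would use a far richer function than a straight line.

```python
def fit_line(xs, ys):
    """Least-squares estimate of m and c in y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Made-up training data that happens to follow y = 2x + 1.
train_x = [0, 1, 2, 3, 4]
train_y = [1, 3, 5, 7, 9]

m, c = fit_line(train_x, train_y)   # "learning": only m and c are kept


def predict(x):
    """Use the learned parameters on an input the program has not seen."""
    return m * x + c


print("m =", m, "c =", c, "prediction for x=10:", predict(10))
```

After fit_line returns, the training data is no longer needed: the two numbers m and c are all that the program carries forward.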

Needless to say, the computer (program) does not "see" the image like a human eye - it sees the image as a matrix of numbers that indicate pixel color or intensity. There are easy-to-use Python modules that can convert an image into a matrix of numbers that a learning program can consume.
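Here is a small sketch of what "an image as a matrix of numbers" looks like. To keep it self-contained, the tiny grayscale image is typed in by hand rather than loaded from a file; in practice an imaging library would produce the matrix. The helper name to_feature_vector is my own.

```python
# A hypothetical 4x4 grayscale image: 0 = black, 255 = white.
image = [
    [255, 255, 255, 255],
    [255,   0,   0, 255],
    [255,   0,   0, 255],
    [255, 255, 255, 255],
]


def to_feature_vector(pixels):
    """Flatten the rows into one list and scale each pixel to the range 0.0-1.0."""
    return [value / 255 for row in pixels for value in row]


features = to_feature_vector(image)
print(len(features), "numbers describe this image")
print(features[:8])
```

The learning program never receives "a picture of a dark square"; it receives these sixteen numbers.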

It is also important to note that all the data the program has "seen" or processed during the process of "learning" does not stay with the program. What is left in the program is just the "essence" of the data that establishes the relationship y=f(x), in the form of the parameters of the function. The data that the program uses to "learn" the relationship is called "Training Data" - how innovative !!!

Coming back to the main topic of the post - what does a bug mean in this context ? When a program incorrectly calls an image as containing text when the image does not contain text - do we call that behavior an application bug? An ML programmer would probably say "the program is learning" or "the program needs to see more data to increase its accuracy of prediction". In this way every opportunity is learning for the program. Like we say a lawyer or doctor is "practicing" - an ML program probably never "performs" but is always in the process of "learning" !!!

What do you say? If the program does the learning (I dislike the term "machine learning" as it's not the machine that is learning - it's the program that is learning. Try saying program learning, or software learning !!! It's funny) - what do testers need to learn ? What is left for testers to learn if programs become intelligent ?

Wednesday, May 15, 2019

Industrialisation of Testing, Heuristics and Mindfulness

Over the last two weekends, The Test Tribe (a popular testing community) hosted two sessions on Facebook - one from T Ashok on Smart QA and the other from James Bach on "Testing Heuristics". Both sessions were well received and, interestingly, I could see some connection between the ideas in these two sessions.

Industrialisation of Testing - Up until now I thought of industrialization in testing as bringing the "factory" metaphor into what we do as testers - an intellectual search for problems in the products we test. Ashok T in his session took a different position. He says industrialization in testing is about doing less by exploiting work done by fellow testers in the form of tools, test ideas, methods etc. He drew a parallel with how the software development community, through its open source revolution, makes it possible to build applications while writing less and less code. He stressed creating an open source revolution in testing so that testers can share their ideas and we can use, reuse and grow a testing repository. That would be true industrialization. Such work has been happening in our community - what we need is a platform and more active participation/contribution.

Mindfulness - Ashok in his session urged testers towards mindfulness - acting with awareness of how we work and why we do what we do. The very nature of the mind is such that it wants to wander, and then programs in the subconscious mind take over and run what we do without our conscious engagement. Testers, through their habits, go about their day's business without being consciously aware of the decisions and choices they make. Through mindfulness, testers would break the autopilot mode and carefully watch every step - this will enhance their skill and productivity and reduce the errors they make in their work. Rarely have I seen such advice to testers - indeed a point to note.

Heuristics - James Bach in his session went into detail to explain how all testing, software development and engineering is rooted in heuristics - fallible methods to solve problems. Those who follow the context-driven testing community are well aware of this term. James explained how heuristics need human judgement, not mere rule-following, as heuristics can fail. James said that in our daily life we use many heuristics without being aware of them. He urges, from his own training and experience, to be aware of and name a heuristic when you use one.

Here is where I am reminded of the mindfulness that Ashok suggested. By being mindful we can recognize the heuristics we use; when we recognize them, we can name them; when we name them, we can share them with fellow testers. That leads to a community movement which manifests as testing industrialization. It's exciting to see that these two testing gurus' ideas are connected in unimaginable ways.

Sunday, September 16, 2018

Testers don't and can't prevent bugs : Altruism or Sense of Pride ?

One of the fashion statements associated with testing these days is "testers should focus on preventing bugs rather than finding them". This is a very tricky idea and is full of traps for testers. Recently a post came up in the software testing Yahoo group that somehow got into this topic of preventing bugs.

Coming from the context-driven school of testing and trained by the likes of James Bach, Cem Kaner, Michael Bolton and others - I was skeptical about testers preventing bugs. The fundamental idea of our school of testing has been that as testers we bring to light information about bugs and risks in the software we test. Then we report it to stakeholders (the powers that be) in a way that lets them act on it.

Many testers fall into the trap and take upon themselves (maybe due to role/corporate hierarchy pressure) the task of preventing bugs. After all - who does not like someone who prevents bugs more than someone who simply reports them? Borrowing from the manufacturing industry, many business leaders in IT and IT-enabled businesses firmly believe that prevention is better than cure. Who can resist the nobleness of saving "nine" by stitching in time?

Let us consider the following two cases -

Testers prevent bugs in the requirements by asking questions about ambiguity in the requirements. Requirement bugs might not be counted as bugs by many - they might be termed unclear requirements. Calling out what is not clear in requirements is one of the valuable contributions of testers.

When pairing with developers - testers prevent bugs as and when they occur. For example a tester may shout .. "hey, you are missing exception handling code for that exception" or "hey, you got that if condition wrong". That is the closest you can get to preventing bugs.

In an email conversation, Michael says -

"we do not use a binary model “pass or fail”.  People who do that are setting themselves up for bad testing.  A product—any product—can “pass” a test but still have terrible problems.  A product can “fail” a test, yet there’s no problem.  (For instance:  the square root of 2 is 1.4142136, right?  Well, it isn’t; the square root of two is not a rational number; it never ends, and certainly not at the seventh decimal place.  But for many—even most—circumstances, 1.4142136 is good enough; just fine; not a problem.)"

This has been a great learning -- testers throw light on ambiguity - that does not mean they prevent bugs from happening. Similarly, in pair testing - testers spot the bug in the shortest time possible but they did not prevent it from happening... that is "early bug detection".
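Michael's square-root example translates nicely into code. The sketch below is my own illustration, not something from the email: it shows the same value "failing" one check and "passing" another, depending on how much precision the context demands. The tolerances are arbitrary assumptions.

```python
import math

approx = 1.4142136          # the value from the quote
reference = math.sqrt(2)    # itself only a floating point approximation

# A strict equality check reports a "fail", yet for most purposes there is no problem.
exact_match = (approx == reference)

# A tolerance encodes the judgement "good enough for this purpose".
good_enough = math.isclose(approx, reference, rel_tol=0, abs_tol=1e-6)

# A context that needs more precision would reject the very same value.
too_coarse = not math.isclose(approx, reference, rel_tol=0, abs_tol=1e-9)

print("exact match:", exact_match)
print("good enough at 1e-6:", good_enough)
print("too coarse at 1e-9:", too_coarse)
```

The verdict comes from the tolerance someone chose, which is a human decision about context and not a property of the number.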

Thanks James and Michael - lesson re-affirmed.

Sunday, March 04, 2018

Chief Value Officer vs. Chief Feelings Officer - Perils of Reification !!!

"Yet the danger of reification is all too real. We fall in love with our models, yet we need to be reminded that they are just models of the real world." Lynn Chiu

A good friend of mine, Ray Arell, asks in a tweet "why not have a CVO - Chief Value Officer". The word "value" has always evoked a very strong internal response in me when I see it used the way Ray used it. This word, like others in the same league such as "Quality" and "Customer Experience", is a notorious victim of being reified (not rectified). Michael Bolton first introduced me to this word when we were discussing abstract vs. concrete things. I learned from Michael that reification leads to gross misrepresentation of an idea/word and leads to "gamification". It is a thinking fallacy and all intellectuals/thinkers need to be alert to it.

What is Reification?
In simple terms - reification refers to considering an abstract idea as though it is a concrete, countable, measurable thing. It is about wrongly understanding an idea as a thing. For example, counting how many ideas are generated in a brainstorming session is an act of reification, as an idea is not a thing - counting ideas and doing all sorts of maths around them does not make sense. Here we say "idea" is reified as a thing. Other examples include making objects out of subjective human experiences like emotions, feelings, values (say family values, social values) etc.

Marxist definition of reification is about "thingification" of social relationships. Among several perspectives and meanings for this term - I would like to use this definition for the purpose of this post - "A fallacy of treating abstraction as though it is a real thing"

Why is reification problematic?
First of all - reification is a misinterpretation of the reality and nature of what we are dealing with. It's a fallacy, an error in thinking and communicating. Reifying an idea into an object strips off the subjectivity, mystery and complex richness of the idea. One common outcome of reified communication is that giver and receiver end up with two different meanings and interpretations of what is being conveyed.

Consider yet another term, "Quality". In the software testing world we are all familiar with this word. There are dozens of definitions of this word, each fitting a specific context, which demonstrates how the term quality offers itself to reification. Quality stands as a mask for so many desirable attributes of a thing or a service. Instead of adjectives like "fast", "robust", "flexible", "easy to understand" or "cheap", we can say quality and get away without bothering about the specificity and correctness of what we actually want. That is the power of reification, but it is an incorrect, manipulative and bad way to communicate.

Similarly, with respect to motivation and change management, we often commit the reification error. Social constructs such as "percentage of work completion" are often regarded as measurements of real objects when they are at best ideas.

In the world of testing there are famous victims of reification: requirements, test cases and bugs. All these are complex ideas generated as part of our quest to create software from requirements specified in natural language, which get interpreted and implemented in a formal computer language. In the world of agile, we have stories that now replace requirements. A development lead announces in the first sprint meeting of a project "we plan to deliver 18 stories in this sprint". 18 what ? Stories. A test lead is asked "how many test cases does your team plan to execute in this release?" In another case, during a project postmortem meeting, the number of bugs logged in a given release is compared with the corresponding number for the previous release to assess the quality of "this" release. 18 stories, 3000 test cases, 270 bugs are examples of how in today's software world we ruthlessly reify abstract ideas and do math with these numbers. The act of reification allows the use of numbers that do not have any inherent meaning of their own once context, giver, recipient and time are removed from them. What happens thereafter is a pure game of manipulation.

Reification is a thinking error ... it is a fallacy.

Value vs. Feelings 
In today's business world the word "Value" is attractive and sexy. We have terms like value stream, value proposition, value added service etc. Behind each of these phrases hides a very clear objective, object or concrete thing. It might be some money, a timeline commitment, or a specific characteristic or outcome of a product or service. It has become fashionable to use the term "value" instead. Why ? Since the meaning of value is subjective and open to interpretation, it allows one to use the word value to imply one thing and later use the same word to imply something else. In a sense, using value allows one to manipulate the situation to his/her advantage while not being wrong or incorrect about what is being conveyed through this loaded word "value".

In order to understand the full and correct meaning of the word value, we require the context and whom we are addressing. By reifying the word value we strip off that richness, context and complexity. Then we start using the word to indicate multiple, sometimes disconnected and contradicting ideas, as we have left behind context and recipient(s).

Instead of having a Chief Value Officer, let us have a Chief Feelings Officer who can understand and deal with customer feelings and emotions about a product or service delivery. Having this role, corporates can truly claim that they care about the individual views and feelings of customers rather than a rolled-up, convoluted, metricized measure such as customer experience.

Many still think that giving good customer experience means having a great looking GUI and exciting animation. Real customer experience, in my opinion, is about caring for individual experience in its bare essentials, with all the richness of emotion and context.

CFO - Chief Feelings Officer. Anyone ?

Saturday, November 18, 2017

Computer does what programmer asks it to do : why are there bugs?

A colleague of mine said something so extraordinary about software bugs that I have never seen anyone talk about software bugs that way. The discussion was about how current technologies and advances in Big Data, Machine Learning and AI have changed or will change the way we do testing and how these can help testers in testing. One of the underlying applications of these technologies is a two-fold approach - first mimic human action (vision, speech, hearing and thinking !!!!) and then make predictions about what will happen next.

When it comes to prediction and testing, the obvious topic is "defect/bug prediction". Bugs are the hardest things to predict due to their very definition and nature. This colleague of mine said something that captures this sentiment very well - "There are no bugs in the sense that the computer (he wanted to say software... these days it has become a fashion to replace the word software with machine at all possible instances) does not malfunction on its own (barring hardware/power failures etc). The computer does what the programmer wants it to do or coded it to do. The problem then lies with the human programmer's mind (or brain) that gave the computer an incorrect instruction."

Where does this take us? It follows from my colleague's logic that the problem lies with the programmer's mind that gave the computer the "wrong" instruction. Predicting a bug would then mean predicting when a programmer gives a wrong instruction. This is a hopeless pursuit, as guessing when a human will make a mistake is an unsolvable puzzle - at the most you have some heuristics.

Let us go back to the idea that a software bug occurs when the programmer gives a wrong instruction to the computer. This line of investigation is remarkable -- first of all, how do we identify a wrong instruction?
It turns out that a wrong instruction cannot be identified using, say, an algorithm or mathematical approach. An instruction (such as open a file, send a message to an inbox, save a picture) becomes "wrong" not by itself but through the context or logic or user need or requirement. This then takes us straight to the mechanism we use to specify the context, need or logic. That is the realm of "natural language".

Software bugs happen due to a programmer "wrongly" translating a requirement in natural language into computer language. If we were to predict bugs using the likes of Machine Learning or AI, we need tools to spot this incorrect translation.

Looks promising ... right? The state of the art in Natural Language Processing (NLP) is about how closely computers (software actually....) can understand natural language. There are  stunning applications of NLP already.

When NLP comes close to understanding human language to the fullest, we move a step forward in the puzzle of spotting incorrect translation of a software requirement into a computer instruction. I hope so....

But then nature (human) leaps to the next puzzle for computers... the limit of human intelligence and the vastness of human communication. Even with the brightest of human testers, we often fail to spot bugs in software - how can an approximate and "artificial" system that mimics a portion of human capability do better in spotting bugs? An area to ponder .....
BTW - was my colleague right in saying "computer does exactly what programmer has asked it to do"? Really ?

Thursday, August 10, 2017

Machine learning and Software testing

Machines are learning - good for them. What about humans? The popular buzz now is about machine learning and artificial intelligence. Never in the past, I think, have these terms - intelligence and learning - gained so much importance and prime time media coverage as now. Thanks, ironically, to the qualifiers attached to these words - Artificial and Machine. Nowadays more engineers are investing time in learning how machines learn (what a paradox) and intelligence that is fake... sorry, artificial, gets more funding and attention. Has the value and quality of human intelligence gone down or has human learning stopped ?

One of the common and popular illustrations of machine learning is that a machine (a software program actually) can now recognize a picture of a cat or an apple, and several types of apples and cats, without being explicitly coded to do that. What's more ? As this program "sees" more and more apples and cats, it "learns" - gets more accurate at identifying objects. That's a quick machine learning intro for you.

When someone takes this idea of identification of a cat/apple by a machine and asks "why can't a machine identify a software bug?" - as this person does in the introduction of this video (at 1:09) - a paradigm shift is needed.

Let us face it - what is in common between a program identifying a cat or an apple on the screen and some other program identifying a bug in software ?

1. A program with its code and machine learning capability does its job with a relatively simple and formally defined model. There would be rules and patterns in the model to assist the identification. Whereas when it comes to the form, shape and identification marks of a software bug - you will really struggle to define them. A machine learning model that can recognize a software bug needs a far deeper and more complicated definition of a bug.

2. Even if we concede that you have managed to define a model that can recognize a software bug, the real challenge would be identifying it in real time when the software is running.

Identifying a software bug, in a simple sense, would need the following
- Mechanism to generate loads of inputs and configurations of systems under test
- Mechanism to operate SUT with these data sets and observe potentially large number of possible software behaviors
- Among possible outcomes - identify the buggy behavior (Oracle problem)
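The three mechanisms listed above can be sketched as a tiny random-testing loop. Everything here is a toy assumption of mine: the system under test is a made-up absolute-value function with a planted bug, and the oracle is trivially easy to write, which is exactly what real software does not offer.

```python
import random


def buggy_abs(x):
    """Hypothetical SUT: an absolute-value function with a planted bug for -3..-1."""
    return x if x >= -3 else -x


def oracle(x, observed):
    """Decides whether an observed output is acceptable for input x."""
    return observed >= 0 and observed in (x, -x)


def run_random_tests(sut, runs=500, seed=0):
    """Generate inputs, operate the SUT, and collect inputs whose outcome looks buggy."""
    rng = random.Random(seed)
    failures = []
    for _ in range(runs):
        x = rng.randint(-10, 10)       # 1. generate an input
        observed = sut(x)              # 2. operate the SUT and observe
        if not oracle(x, observed):    # 3. judge the outcome (the oracle problem)
            failures.append(x)
    return failures


failures = run_random_tests(buggy_abs)
print("inputs that exposed a bug:", sorted(set(failures)))
```

For a function this small, steps 1 and 2 are cheap and step 3 is one line. For a business application, each of the three becomes a hard problem of its own.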

In short - these are the hard problems of software testing in the first place. How can machine learning help?

I like what Paul Merrill says at the end of this talk on YouTube - "Machines are learning. Are we?" (testers)

Hard Problems in Software Testing (2017) - Part 1

When I set out to write the post with this title, I thought it must be the first of its kind. It turns out there is a book written on this subject. The authors of the book list a number of problems of testing and a solution in the approach called "Testing as Service". In this post, I approach this topic from a totally different starting point.

Let me reflect on the history of computing a bit to set the context for software, software testing and the topic of hard problems. The word computing refers to the use of computers to solve, or create systems to solve, a range of problems in the areas of math, information science and the like. Named after the 9th century Persian mathematician Al-Khwarizmi, the term algorithm gives a formal structure to the problem solving approach. A step by step procedure or method to solve a problem is referred to as an "algorithm". The program (or software) implements an algorithm and solves the problem. Algorithms can be represented in multiple ways: natural language, pseudo-code, programming languages, flow charts, control tables etc.

In the 60's and 70's, when computers developed as advanced calculators, math and logic enthusiasts pounced on these new creations to see if their long pending problems could be solved. A few wanted to solve the problem of finding out if a given number is prime or not, while others wanted to find the shortest route for a traveling salesman. In these implementations the program would run (in isolation - no network or internet in those days and no auto updates of OS or any other software) with an input data set and would compute the "Answer" or "Solution".

Modern business software at the core level is built from algorithms performing computation/information processing. In word processors, web browsers and camera apps on mobile phones you see the culmination of several algorithms working in the background. These algorithms solve basic problems like storing, sorting and classifying information.

Another thing that sets the computational problems of the 70's apart from the business software of the 90's and early 2000's is the introduction of natural language (the likes of English) for specifications. The problems that algorithms solved in the 70's were represented in formal mathematical notation. With natural language at one end and high level programming languages like COBOL, Fortran, Pascal, C, C++ and Java at the other, we created the problem of translating what is specified in natural language into computer language. This created a division between those who understand the business domain (natural language) and those who understand computer language (programmers). This is the first big problem of software development. As a natural consequence, validating that the program does what is specified in natural language also got complicated. Software testing, which branched off from software programming as a distinct activity in the early 90's, has been trying to bridge the gap between programmers and business folks.

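The field of computer science deals with solving computing problems and algorithms. The hard problems in the algorithm world are commonly discussed in terms of the classes P and NP. This classification is based on how the running time of an algorithm grows with the size of its input. P is the class of problems that an algorithm can solve in time bounded by a polynomial function of the input size. NP stands for Nondeterministic Polynomial time: it is the class of problems for which a proposed solution can be checked in polynomial time, even when no polynomial-time method of finding one is known. The hardest problems in NP are called NP-complete. (Whether a program halts at all is the separate halting problem, which is undecidable rather than merely slow.)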
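A small sketch may help with the "easy to check, hard to find" flavour of NP. Subset sum is a textbook NP-complete problem: given some numbers, is there a subset that adds up to a target? The numbers and function names below are my own choices for illustration.

```python
from itertools import combinations


def verify(numbers, target, candidate):
    """Quick check: is candidate drawn from numbers, and does it sum to target?"""
    remaining = list(numbers)
    for value in candidate:
        if value not in remaining:
            return False
        remaining.remove(value)
    return sum(candidate) == target


def solve_by_brute_force(numbers, target):
    """Tries subsets one by one: up to 2**len(numbers) candidates in the worst case."""
    for size in range(1, len(numbers) + 1):
        for candidate in combinations(numbers, size):
            if sum(candidate) == target:
                return list(candidate)
    return None


numbers = [3, 34, 4, 12, 5, 2]
solution = solve_by_brute_force(numbers, 9)
print("found:", solution, "verified:", verify(numbers, 9, solution))
```

Checking a proposed answer takes a handful of steps, while the search doubles in size with every number added to the list. No fundamentally faster general method is known.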

Where does software testing stand in this classification of P and NP problems? If an algorithm were to test a computer program - would it halt and produce answer in polynomial time? How would an algorithm approach the problem of testing software ?

Here is an attempt to list the problems that characterize software testing as a hard problem in this loose, NP-like sense.

Each problem listed here shows an aspect of testing that makes it hard to have an efficient, less error prone and cost effective solution. These problems are hard because the solutions that we see in practice are sub-optimal and need constant refinement.

1. Problem of potentially infinite sets of Inputs
Unlike programs/algorithms of the 70's, modern business software receives and processes a large set of variables and an equal or greater number of input values sent directly to the program. Also, modern software is not isolated desktop software running on one computer, but a combination of several stand-alone components running on different computers connected together in a network. By virtue of this arrangement, software under test continues to receive multiple implicit inputs that influence the outputs it produces. Then we have the database/sets of data elements that are managed by the software - the state of this database also influences the outcomes of the software. There are also internal configurations that allow the software to be configured in many different ways.

The task of generating all, or some "important", sets of direct inputs that are fed to the software while running, along with the sets of all indirect inputs (database, network, internal product configs), is one of the hard problems.
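To get a feel for how quickly direct and indirect inputs multiply, here is a rough sketch. The dimensions and their values are invented for illustration, and real systems have many more of them, several of which are effectively unbounded.

```python
import math
from itertools import islice, product

# Hypothetical input dimensions for a small web application (invented values).
dimensions = {
    "browser": ["Chrome", "Firefox", "Safari", "Edge"],
    "locale": ["en", "de", "ja"],
    "user_role": ["guest", "member", "admin"],
    "network": ["fast", "slow", "offline"],
    "db_state": ["empty", "small", "large"],
}

# Every combination of one value per dimension is a distinct test configuration.
total = math.prod(len(values) for values in dimensions.values())
print("configurations:", total)

# Peek at the first few combinations without materialising all of them.
for combo in islice(product(*dimensions.values()), 3):
    print(dict(zip(dimensions, combo)))
```

Five small dimensions already give hundreds of configurations, before a single field value has been typed into the application. Adding one more dimension multiplies the total again.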

2. Problem of operating the software (and its dependencies) under test through set of inputs
The largest chunk of testing time is spent operating the software once we have configured the software under test and its dependencies. A simple, single thread of this "operation" is part of a larger unit called a "test case" or "test", which additionally involves making observations and inferences about the outcomes of the "tests". Given an infinitely large number of inputs (direct and indirect), there are equally many ways of operating the SUT. This is a hard problem. How can we run these "tests" with finite time and resources? Who would run these tests? A human tester?

Then we have questions about how these tests should be specified, in what language and in how much detail. We have attempted both natural language (manual test case/script) and software language (JUnit class). As for how to run these tests, we have tried the "interfaces" of the SUT for this purpose. The most popular interface, the GUI, created an industry of test automation tools and the paradigm of "record and playback". Some geeky programmers used interfaces like web services to execute the tests in a non-interactive way. Both of these approaches have met with success to a degree but have left a lot to be desired.

The task of running tests - operating the software through a large set of inputs/flows is a hard problem that we need to solve, solve well.
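As an illustration of a test specified in "software language", here is a small sketch using Python's unittest module, which belongs to the same xUnit family as JUnit. The function under test, apply_discount, is hypothetical and exists only for this example.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical function under test: reduce price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100


class DiscountTest(unittest.TestCase):
    def test_ten_percent_off(self):
        # Input, operation and expected outcome are all spelled out in code.
        self.assertEqual(apply_discount(200, 10), 180)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(200, 150)


# Run the two tests non-interactively, without a GUI or a human operator.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DiscountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("tests run:", result.testsRun, "all passed:", result.wasSuccessful())
```

Each test fixes one input and one expected outcome. The same check written as a manual script would need a person to read it, perform it and judge the result every time.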

3. The problem of Observing direct and indirect outcomes/behaviors
While programs of the 70's produced one or more distinct outcomes as the solution to a given problem, in today's world we need to observe software behaviors. It is funny that we use the term "behavior" for an inanimate object like "software".

Like the direct and indirect inputs that the software takes while in operation, an important puzzle of software testing is about observing "all possible" outcomes. How do we do that? Again, there is a human way and an automated way. Continuing from the testing task of running tests, you might argue that making observations on outcomes is an extension of executing tests. This is true by and large. The challenge is to specify what to observe and how. An automated test might say watch this space or this folder or look for this text message and so on. But that is only part of the test. Given a test, the SUT shows many different behaviors and capturing all of them is a hard problem. More than that - how do we know our list contains all that we need to observe?

4. The problem of identifying correct and incorrect behaviors - problem of test oracles

Contrary to what we believe, it is often not very clear which software outcome is correct and which one is a bug. To help in deciding, we use a reference or mechanism that can determine the correct behavior. Requirements specifications give the first reference for what we should expect from software - in natural language. Given infinite sets of inputs and corresponding outcomes and behaviors, identifying the correct behavior requires a very large number of oracles.

More often than not, humans can and do act as live oracles - using their own experience and some given references, they can identify correct behaviors. At times, data and captured behaviors of previous versions of the application (assumed to be correct) are used as the test oracle.
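Using a previous version as the oracle can be sketched as follows. Both pricing functions are hypothetical: price_v1 stands in for the earlier release that we assume to be correct, and price_v2 for a new release with a planted boundary mistake.

```python
def price_v1(quantity):
    """Previous release, assumed correct: 10 per unit, 10% off for 10 or more units."""
    total = quantity * 10
    return total * 0.9 if quantity >= 10 else total


def price_v2(quantity):
    """New release with a hypothetical off-by-one at the discount boundary."""
    total = quantity * 10
    return total * 0.9 if quantity > 10 else total


# The old version acts as the oracle: any disagreement is worth a tester's attention.
mismatches = [q for q in range(0, 21) if price_v1(q) != price_v2(q)]
print("quantities where the versions disagree:", mismatches)
```

The comparison only says that the two versions differ. Deciding which one is right, or whether the change was intended, still needs a human oracle.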

5. Biggest of all - repeating all above many times, when software changes
Software is soft and when it is changed, many things change that are not expected to change. This is referred to as regression. In the life of software, it needs to be changed and updated several times, with new features and capabilities included. When such a change happens, it is not enough to test and validate the changed areas/features - often we need to confirm that the changes did not break other working parts of the software. This means a continued effort of testing the software (almost) completely every time there is a change. To make matters worse, you need to do this so-called "regression testing" even when any external software (external to the SUT) is changed. This is the biggest problem we need to solve in testing - the burden of continuously testing the entire application and its dependencies.

6. Problem of defining and quantifying value of Testing
Testing has no direct value for the customer or end user, who is interested in what features the product offers and how. The customer assumes that the delivered features work as expected. The value of testing to the performance of the product in the hands of the customer is folded into the larger work of the team - mainly the development team. The indirect nature of testing's contribution to the overall product makes it hard for testing to assert itself and ask for its due share in the success/failure of the product.

Our field is about half a century old now. How would we approach these problems of testing software if we were to start all over today?

To be continued .... in part 2

  • Problem of quantifying how much testing needs to be done and how much is done
  • Problem of estimation of testing required to be done given a scope
  • Problem of Skill/ mindset
  • Problem of expectations from Testing

Thursday, August 03, 2017

Testing Maturity - Dealing with grown up Kid

Several years ago, during my days as a software testing consultant (not a doer but a consultant) – one idea that repeatedly came up was “Testing Maturity”. Thanks to the likes of CMM, CMMI, TMM, TMMI, Six Sigma, TQM and others – the IT world was (and mostly still is) obsessed with knowing what it means to be “mature” about just anything. Testing was one of the most talked about maturity targets.

I still remember my first experience with testing maturity models – when I searched on the internet, I did not find much “state of the art” stuff (about 10-12 years back). Then, like many others, I set out to create my own “framework” for assessing testing maturity. Looking back – I see my attempt as very “immature”. It pretty much looked like any other similar framework: it had levels of maturity, key focus areas and some kind of recipe to move from level 1 to level x and so on. My bosses then liked it. It made some buzz with the clients that I worked with. Now I wonder why I created those things. I thought then that there must be a model by which a testing group can be called mature or immature. The word mature was equated to "Good", "Efficient", "Desirable" etc. I understand now that maturity is not about good or bad - it's about the ability to sustain and adapt to change. No model I know of, including the ones I created, took this approach to maturity.

Another way to look at maturity is how we deal with people. When we say about someone that he or she is mature - it means that person can deal with adversity better, can behave/react with patience and so on. We should apply the same idea to software testing.

Recently a friend of mine bought this idea and rekindled my thinking. Hence I am writing this post.
The most valuable suggestion when I was working on my testing maturity model came from my mentor Michael Bolton – who suggested a remarkable thing about the idea of “maturity” (in general). I am going to build my renewed model of testing maturity on this interpretation. Michael suggested that one of the useful ways to define maturity for software (and testing) is to draw parallels with the idea of maturity in the biological sciences. Charles Darwin, in his theory of evolution, defines maturity as the ability of a species to tolerate and adapt to changing surroundings. We are all familiar with the tag line of Darwinian theory, “survival of the fittest”.
So – my definition of testing maturity draws from this biological sciences idea – testing is considered mature if it successfully adapts to generations of changes happening in its environment (business and market environment) and retains its relevance/importance. How do you identify such a testing practice? Stakeholders are willing to pay for it (challenge me – if you find this statement problematic).
Let us now look deeper. I think the idea of testing maturity can be applied to a specific “Testing team” (a group of people operating under a corporate structure) or to a function or task that needs to be done as part of software making (a simpler term than SDLC, which would take me on many detours that I would like to avoid for now). The software services industry, system integrators and big consulting companies would like to apply this term to “Testing Practice”. Though the term testing practice sounds very professional (the likes of Gartner and Forrester would love it) and appears to include both team and function – on the ground it mainly implies a team, a structure and some rule book. In most cases, software testing maturity is applied to “independent” testing groups – needless to say, these groups want a label of “mature” so that they continue to live and get funding. Also note that the aspects of maturity as it applies to team/structure and to testing as a function are not mutually exclusive – there are some common elements. One reason that I want to make this distinction is that many aspects of maturity take a different shape if I look at testing as a group or structure rather than as something that a specific team does. You know what I am hinting at. Yes – the Agile and DevOps world of software making.

Testing maturity as applied to team/structure
I look at Testing team maturity in terms of Leadership, Doers and testing culture.
A mature testing leadership would ensure that the testing team is responding to change in the ecosystem in which it operates and adapting itself to survive and succeed. A mature testing leadership brings about changes in the team as required and develops collaborative partnerships with developers, project managers, production support teams and stakeholders. A mature testing leadership would not hold its principles and policies as something cast in stone. A real test of the maturity of testing leadership comes when stakeholders question the very existence of testing as a service that a given team can provide. Most independent testing teams have faced this test. A mature testing leadership would be more than willing to break the corporate structure of the test team and be ready to be mixed or morphed into any other emerging structure of the organization – an act of self-sacrifice. Call your testing leadership mature if it can dissolve itself (mainly the team structure) for the larger interest of testing as a function.
Let us now come to “Doers” – I deliberately use this term to indicate the group of people who do testing rather than the ones who “manage” or “coordinate” testing. Mature testers (doers) focus on constant learning and do not identify themselves with any specific domain, technology, tool, process or the like. Mature testers understand the value of adapting to a changing ecosystem and work on acquiring skills to remain relevant in emerging situations. A mature tester can thus operate effectively in any circumstances and be useful towards the goal that the broader team is pursuing.

A combination of mature testing leadership and mature testers gives the ability to respond to “change” in a “quick” yet thoughtful way. James Bach characterizes an expert tester (sorry if I just moved from a mature tester to an expert tester – stay on, I hope to establish a connection) as someone who can test under any circumstance of time and other resources. This ability to test “well” under any circumstances is what gives testers and testing leadership a crucial edge and the ability to survive. Isn’t that, thus, a key aspect of maturity?
Finally – the culture. This is something that mature leadership and mature testers together demonstrate when they are in action. A mature testing culture does not whine about changes but strives to change itself to adapt. A mature testing culture manifests itself in terms of beliefs, collective thinking and a set of written or unwritten rules about how testing should be conducted. On any question related to any tactical or strategic aspect of testing – testing culture helps testers (and leads) with a “default” response. If you watch a team of testers in action – you can distinctly notice the “culture” – if you cannot, then probably the culture has not set in yet.
As testing as a function continues to evolve and becomes something that needs to get done as part of software delivery – it would be appropriate to turn the focus to the “mature tester” – an individual. Here too, my definition of maturity is along the lines of “one who can continuously adapt to changes in the environment and evolve”. Are you a mature tester?