When I set out to write a post with this title, I thought it must be the first of its kind. It turns out there is a book written on this subject. Its authors list a number of testing problems and their solutions in an approach called "Testing as Service". In this post, I approach the topic from a totally different starting point.
Let me reflect a bit on the history of computing to set the context for software, software testing and the topic of hard problems. The word computing refers to the use of computers to solve, or to create systems that solve, a range of problems in areas such as math and information science. The term algorithm, derived from the name of the 9th century Persian mathematician Al-Khwarizmi, gives a formal structure to problem solving: a step-by-step procedure or method for solving a problem is referred to as an "algorithm". A program (or software) implements an algorithm and solves the problem. Algorithms can be represented in multiple ways: natural language, pseudo-code, programming languages, flow charts, control tables and so on.
In the early 60's and 70's, when computers developed as advanced calculators, math and logic enthusiasts pounced on these new creations to see if their long pending problems could be solved. A few wanted to find out whether a given number is prime, while others wanted to find the shortest route for a traveling salesman. In these implementations the program would run in isolation (no network or internet in those days, and no auto updates of the OS or any other software) with a given input data set and would compute the "Answer" or "Solution".
Modern business software is, at its core, built from algorithms performing computation and information processing. In word processors, web browsers and camera apps on mobile phones, you see the culmination of several algorithms working in the background. These algorithms solve basic problems like storing, sorting and classifying information.
Another thing that sets the computational problems of the 70's apart from the business software of the 90's and early 2000's is the introduction of natural language (the likes of English) for specifications. The problems that algorithms solved in the 70's were represented in formal mathematical notation. With natural language at one end and high level programming languages like COBOL, Fortran, Pascal, C, C++ and Java at the other, we created the problem of translating what is specified in natural language into computer language. This created a division between those who understand the business domain (natural language) and those who understand computer language (programmers). This is the first big problem of software development. As a natural consequence, validating that the program does what is specified in natural language also got complicated. Software testing, which branched off from programming as a distinct activity in the early 90's, has been trying to bridge the gap between programmers and business folks.
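The field of computer science deals with computing problems and the algorithms that solve them. Hard problems in the algorithm world are commonly discussed in terms of the classes P and NP. This classification is based on running time as a function of the size of the input. P is the class of decision problems that an algorithm can solve in polynomial time. NP (nondeterministic polynomial time) is the class of problems for which a proposed solution can be checked in polynomial time, even if no polynomial-time method of finding one is known. The hardest of these, such as the decision version of the traveling salesman problem, have no known polynomial-time algorithm, and whether one exists is the open P versus NP question. The halting problem is a separate matter: it asks whether a program stops at all, and no algorithm can decide that for every program.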
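To make the difference between finding and checking a solution concrete, here is a minimal sketch using the subset-sum problem (pick numbers from a list that add up to a target). The problem choice and the function names `verify` and `search` are my own for illustration. Checking a proposed answer takes a handful of steps, while the brute-force search may try every one of the 2^n subsets.

```python
from itertools import combinations


def verify(numbers, target, candidate):
    """Check a proposed answer. Cost grows polynomially with the input size."""
    pool = list(numbers)
    for x in candidate:
        if x not in pool:      # candidate uses a number we were not given
            return False
        pool.remove(x)         # each given number may be used only once
    return sum(candidate) == target


def search(numbers, target):
    """Find an answer by brute force. May examine up to 2**n subsets."""
    for size in range(len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == target:
                return list(subset)
    return None                # no subset adds up to the target


numbers = [3, 34, 4, 12, 5, 2]
answer = search(numbers, 9)
print(answer, verify(numbers, 9, answer))
```

With six numbers the search is instant, but every extra number doubles the subsets it may have to try, while `verify` barely notices.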
Where does software testing stand in this classification of P and NP problems? If an algorithm were to test a computer program, would it halt and produce an answer in polynomial time? How would an algorithm approach the problem of testing software?
Here is an attempt to list the problems that make software testing look like an NP problem, in the informal sense of a problem with no known efficient solution.
Each problem listed here shows an aspect of testing that makes it hard to find an efficient, less error-prone and cost-effective solution. These problems are hard because the solutions we see in practice are sub-optimal and need constant refinement.
1. Problem of potentially infinite sets of Inputs
Unlike the programs and algorithms of the 70's, modern business software receives and processes a large set of variables and an equal or larger number of input values sent directly to the program. Also, modern software is not an isolated desktop program running on one computer, but a combination of several stand-alone components running on different computers connected in a network. By virtue of this arrangement, the software under test continually receives multiple implicit inputs that influence the outputs it produces. Then there is the database, the set of data elements managed by the software; the state of this database also influences the outcomes. Finally, there are internal configurations that allow the software to be set up in many different ways.
The task of generating all, or some "important", sets of direct inputs that are fed to the software while it runs, along with the sets of all indirect inputs (database, network, internal product configs), is one of the hard problems.
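A rough sketch of how quickly direct and indirect inputs multiply is given below. The dimensions and their values are hypothetical, made up for an imaginary order-entry screen; real software has far more of both.

```python
from itertools import product
from math import prod

# Hypothetical input dimensions: two direct inputs, one environment
# setting, and two indirect inputs (database state and network).
dimensions = {
    "user_role": ["guest", "member", "admin"],
    "payment": ["card", "wallet", "invoice", "cash"],
    "browser": ["chrome", "firefox", "safari"],
    "db_state": ["empty", "small", "large"],
    "network": ["fast", "slow", "offline"],
}


def count_combinations(dims):
    """Number of distinct input sets if every value is paired with every other."""
    return prod(len(values) for values in dims.values())


def all_combinations(dims):
    """Lazily generate each input set as a dict of dimension -> value."""
    names = list(dims)
    for values in product(*dims.values()):
        yield dict(zip(names, values))


print(count_combinations(dimensions))        # 3 * 4 * 3 * 3 * 3 = 324
print(next(all_combinations(dimensions)))    # the first of the 324 input sets
```

Five small dimensions already give 324 input sets. One more dimension with ten values makes it 3240, and a single free-text field makes the set effectively infinite, which is why we end up picking "important" subsets instead.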
2. Problem of operating the software under test (and its dependencies) through a set of inputs
The largest chunk of testing time is spent operating the software once we have configured the software under test (SUT) and its dependencies. A single, simple thread of this "operation" is part of a larger unit called a "test case" or "test", which additionally involves making observations and inferences about the outcomes. Given an infinitely large number of inputs (direct and indirect), there are as many ways of operating the SUT. This is a hard problem. How can we run these "tests" with finite time and resources? Who would run these tests? A human tester?
Then we have questions about how these tests should be specified, in what language and in how much detail. We have attempted both natural language (manual test cases/scripts) and software language (a JUnit class). As for how to run these tests, we have tried using the "interfaces" of the SUT. The most popular interface, the GUI, created an industry of test automation tools and the paradigm of "record and playback". Some geeky programmers used interfaces like web services to execute tests in a non-interactive way. Both approaches have met with success to a degree but leave a lot to be desired.
The task of running tests, operating the software through a large set of inputs and flows, is a hard problem that we need to solve, and solve well.
3. The problem of Observing direct and indirect outcomes/behaviors
While programs of the 70's produced one or more distinct outcomes as the solution to a given problem, in today's world we need to observe software behaviors. It is funny that we apply the term "behavior" to an inanimate object like software.
As with the direct and indirect inputs that the software takes while in operation, an important puzzle of software testing is observing "all possible" outcomes. How do we do that? Again, there is a human way and an automated way. Continuing from the task of running tests, you might argue that making observations on outcomes is an extension of executing tests. This is true by and large. The challenge is to specify what to observe and how. An automated test might say watch this space or this folder, or look for this text message, and so on. But that is only part of the test. For a given test, the SUT shows many different behaviors, and capturing all of them is a hard problem. More than that, how do we know our list contains everything we need to observe?
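As a small illustration of "watch this folder or look for this text message", the sketch below records three kinds of outcome from one run: the returned value, the text printed, and the files left behind. Both `export_report` (a stand-in for the software under test) and `observe` are hypothetical names invented for this example.

```python
import contextlib
import io
import os
import tempfile


def export_report(out_dir, rows):
    """Hypothetical function under test: writes a CSV file and prints a summary."""
    path = os.path.join(out_dir, "report.csv")
    with open(path, "w") as f:
        for row in rows:
            f.write(",".join(str(cell) for cell in row) + "\n")
    print(f"exported {len(rows)} rows")
    return len(rows)


def observe(func, *args):
    """Run func in a scratch folder and record more than its return value."""
    with tempfile.TemporaryDirectory() as work_dir:
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            returned = func(work_dir, *args)
        files = {}
        for name in sorted(os.listdir(work_dir)):
            with open(os.path.join(work_dir, name)) as f:
                files[name] = f.read()
    return {"returned": returned, "printed": captured.getvalue(), "files": files}


observations = observe(export_report, [(1, "a"), (2, "b")])
print(observations)
```

Even this toy leaves things unobserved: time taken, memory used, files touched outside the scratch folder, network calls. The list of what to watch has to come from somewhere, and that is the hard part.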
4. The problem of identifying correct and incorrect behaviors - problem of test oracles
Contrary to what we might believe, it is often not very clear which software outcome is correct and which one is a bug. To help decide, we use a reference or mechanism that can determine the correct behavior: a test oracle. Requirements specifications, written in natural language, give the first reference for what we should expect from the software. Given infinite sets of inputs and corresponding outcomes and behaviors, identifying the correct behavior requires a very large number of oracles.
More often than not, humans can and do act as live oracles: using their own experience and some given references, they identify correct behaviors. At times, data and captured behaviors from previous versions of the application (assumed to be correct) are used as the test oracle.
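Using a previous version as the oracle can be sketched as below. The two pricing functions are invented for illustration, and the new one carries a deliberate off-by-one bug so that there is something to find. Amounts are in whole cents to keep the comparison exact.

```python
def price_v1(quantity, unit_cents):
    """Previous release, ASSUMED correct: 10% off for 10 or more items."""
    total = quantity * unit_cents
    return total * 90 // 100 if quantity >= 10 else total


def price_v2(quantity, unit_cents):
    """New release under test, with a deliberate off-by-one (> instead of >=)."""
    total = quantity * unit_cents
    return total * 90 // 100 if quantity > 10 else total


def compare_with_oracle(new, oracle, inputs):
    """Return (args, expected, actual) for every input where the two disagree."""
    mismatches = []
    for args in inputs:
        expected = oracle(*args)
        actual = new(*args)
        if actual != expected:
            mismatches.append((args, expected, actual))
    return mismatches


inputs = [(quantity, 100) for quantity in range(1, 21)]
for args, expected, actual in compare_with_oracle(price_v2, price_v1, inputs):
    print(f"inputs {args}: oracle says {expected}, new version says {actual}")
```

The comparison flags the quantity of exactly 10, where the discount boundary moved. Note the limits of this oracle: it only reports a difference, not which side is right, and it is silent on any bug the old version already had.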
5. Biggest of all - repeating all of the above many times, when software changes
Software is soft, and when it is changed, many things change that were not expected to change. This is referred to as regression. Over its life, software needs to be changed and updated several times, with new features and capabilities added. When such a change happens, it is not enough to test and validate the changed areas or features; often we need to confirm that the changes did not break other working parts of the software. This means a continued effort of testing the software (almost) completely every time there is a change. To make matters worse, you need to do this so-called "regression testing" even when software external to the SUT changes. This is the biggest problem we need to solve in testing: the burden of continuously testing the entire application and its dependencies.
6. Problem of defining and quantifying the value of Testing
Testing has no direct value for the customer or end user, who is interested in what features the product offers and how. The customer assumes that the delivered features work as expected. The value of testing to the product's performance in the hands of the customer is folded into the larger work of the team, mainly the development team. The indirect nature of testing's contribution to the overall product makes it hard for testing to assert itself and claim its due share in the success or failure of the product.
Our field is about half a century old now. How would we approach these problems of testing software if we were to start all over today?
To be continued .... in part 2
- Problem of quantifying how much testing needs to be done and how much has been done
- Problem of estimating the testing required for a given scope
- Problem of skill/mindset
- Problem of expectations from testing