Monday, June 24, 2019

There is no such thing as a defect/bug in the Machine Learning/AI domain

One question that comes up again and again in the testing world today is about the role of testing in the domain of Machine Learning and Artificial Intelligence applications. To be precise, many in the testing community are curious and somewhat confused about what they need to do differently (if at all) and what skills they need to acquire additionally. This post is an initial attempt to share my thoughts in this direction.

What is an ML Application ?
(Machine Learning is considered a branch of Artificial Intelligence, hence I omit using AI alongside ML.)
The term "Machine Learning" is not new; it was coined by Arthur Samuel in 1959. The definition given by Arthur was the "ability of computers to learn without being explicitly programmed". In reality, computers do not learn, but software programs learn - a small difference, if you choose to care. How do programs gain such a human-like ability to learn? Can any and every program be made to "learn" like this? What in today's computing technology has enabled this possibility to be realized? Answers to these questions take the post beyond the topic of ML, testing and defects/bugs. In short, I would say the ability of computers to store and process large volumes of data at transaction-processing speed has enabled machine learning as Arthur Samuel might have envisaged.

What is a Machine Learning application then? A program that uses a set of algorithms to process sets of specially selected and curated data about a problem the program intends to solve. Under the hood, the algorithms "fit" the data to some selected mathematical "function" called a "model", such that the program's logic is data driven, not hard coded. When I say hard coded in ML parlance - you will not find explicit chunks of if-else or select-case or do-while depicting rules of logic. The "model", through "fitting", generates the logic that data presented to it shall comply with.
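A tiny sketch can make this contrast concrete. The data below (transaction amounts and fraud labels) is purely hypothetical, and the "learning" here is a deliberately naive threshold search - not any real fraud-detection method - but it shows the idea of a rule derived from data rather than written as an if-else:

```python
# A hard-coded rule would look like:
#     if amount > 10000: flag as fraud
# A "learned" rule instead derives the threshold from labelled examples.

def learn_threshold(amounts, labels):
    """Pick the cutoff that best separates fraud (1) from genuine (0)."""
    best_t, best_correct = 0, -1
    for t in sorted(set(amounts)):
        correct = sum((a > t) == bool(y) for a, y in zip(amounts, labels))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

# Hypothetical training data: amounts and whether each was fraud (1) or not (0)
amounts = [120, 80, 15000, 40, 22000, 300, 18000]
labels  = [0,   0,  1,     0,  1,     0,   1]

threshold = learn_threshold(amounts, labels)
print(threshold)  # → 300: the rule is now data driven, not hard coded
```

Change the data and the rule changes with it - that is what "the logic is data driven" means in practice.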

What kind of problems can ML programs solve? Largely two categories: prediction and suggestion. A machine learning program can classify a bunch of financial transactions (say credit card) as (potentially) fraudulent or genuine, recognize faces in a picture, or auto-complete what you are typing in a search box on a web page.

What does it mean for a program to learn ?
In simple language - learning for a program is to discover the parameters of the mathematical function the program uses to establish a relation between input and output. Let us take an example of classification that aims to predict whether an image contains text or not. In this case the image and its properties (what each pixel tells about the whole picture) are the inputs, and the output is a binary decision: whether the image contains text or not (1 or 0). For a human eye it is easy to make the decision, whereas for a computer the problem needs to be presented as (for example) a mathematical function like y = f(x). This function will have parameters that the program needs to compute. For this purpose the program needs to be presented with loads of data (input images and the decision whether text is there or not). By processing this data the program is expected to identify the relation between "y" and "x" as a mathematical function like y = mx + c (here m and c are the parameters of the function).
This process of arriving at the parameters of the function by working through data is called "learning". Once the program learns the relationship, it can predict "y" - the decision whether the image contains text or not - for any new image that the program has not "seen" before.
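For the simple y = mx + c case, "learning" the parameters can be done in a few lines. This is a minimal sketch using the closed-form least-squares formula on made-up points; real ML libraries do something far more elaborate, but the principle - recover m and c from data - is the same:

```python
# "Learn" the parameters m and c of y = m*x + c from data points,
# using the closed-form least-squares solution.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    c = mean_y - m * mean_x
    return m, c

# Points generated from y = 2x + 1; "learning" should recover m = 2, c = 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
m, c = fit_line(xs, ys)
print(m, c)  # → 2.0 1.0
```

Once m and c are known, the program can predict y for an x it has never "seen" - which is exactly the point made above.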

Needless to say, the computer (program) does not "see" the image like a human eye - it (the program) sees the image as a matrix of numbers that indicate pixel colour or intensity. There are easy Python modules/programs that can convert an image into a matrix of numbers that a learning program can consume.
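Conceptually, it looks like this. Below is a made-up 3x3 grayscale "image" written out by hand (libraries such as Pillow or NumPy would produce such a matrix from a real image file):

```python
# A 3x3 grayscale "image": each number is a pixel intensity
# (0 = black, 255 = white). This is what the program "sees".
image = [
    [  0, 255,   0],
    [255, 255, 255],
    [  0, 255,   0],
]

# A learning program typically consumes this as a flat feature vector.
features = [pixel for row in image for pixel in row]
print(features)  # → [0, 255, 0, 255, 255, 255, 0, 255, 0]
```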

Also important to note: all the data that the program has "seen" or processed during "learning" does not stay with the program. What is left in the program is just the "essence" of the data that leads to establishing the relationship y = f(x), in the form of the parameters of the function. The data that the program uses to "learn" the relationship is called "Training Data" - how innovative !!!

Coming back to the main topic of the post - what does a bug mean in this context? When a program incorrectly calls an image as containing text when the image does not contain text - do we call that behavior an application bug? An ML programmer would probably call it "the program is learning" or "the program needs to see more data to increase its accuracy of prediction". In this way, every failure is an opportunity for the program to learn; like we say a lawyer or doctor is "practicing", an ML program probably never "performs" but is always in the process of "learning" !!!
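Note that this "accuracy of prediction" is itself just a number computed over test examples - a wrong prediction moves a metric rather than raising a defect. A minimal sketch, with hypothetical labels:

```python
# Accuracy: the fraction of predictions that match the expected labels.
# A misprediction lowers this number; it does not "fail" anything.
def accuracy(predicted, actual):
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

predicted = [1, 0, 1, 1, 0]   # what the model said (hypothetical)
actual    = [1, 0, 0, 1, 0]   # what was really true
print(accuracy(predicted, actual))  # → 0.8
```

So where a tester files a bug, an ML practitioner sees accuracy dropping from 1.0 to 0.8.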

What do you say? If the program does the learning (I dislike the term "machine learning" as it is not the machine that learns - it is the program. Try saying "program learning" or "software learning" !!! it's funny) - what do testers need to learn? What is left for testers to learn if programs become intelligent?

Wednesday, May 15, 2019

Industrialisation of Testing, Heuristics and Mindfulness

Over the last two weekends, The Test Tribe (a popular testing community) hosted two sessions on Facebook - one from T Ashok on Smart QA and the other from James Bach on "Testing Heuristics". Both sessions were well received and, interestingly, I could see some connection between the ideas that were part of these two sessions.

Industrialisation of Testing - Until now, I thought of industrialisation in testing as bringing the "factory" metaphor into what we do as testers - an intellectual search for problems in the products we test. T Ashok, in his session, took a different position. He says industrialisation in testing is about doing less by exploiting work done by fellow testers in the form of tools, test ideas, methods etc. He drew a parallel with how the software development community, through its open source revolution, makes it possible to build applications by writing less and less code. He stressed creating an open source revolution in testing, so that testers can share their ideas and we can use, reuse and grow a testing repository. That would be true industrialisation. There has been such work happening in our community - what we need is a platform and active participation/contribution.

Mindfulness - Ashok in his session urged testers towards mindfulness - acting with awareness of how we work and why we do what we do. The very nature of the mind is such that it wants to wander, and then programs in the subconscious mind take over - running what we do without our conscious engagement. Testers, through their habits, go about their day's business without being consciously aware of the decisions and choices they make. Through mindfulness, testers can break the autopilot mode and carefully watch every step - this will enhance their skill and productivity and reduce the errors they make in their work. Rarely have I seen such advice given to testers - indeed a point to note.

Heuristics - James Bach in his session on heuristics went into detail to explain how all testing, software development and engineering is rooted in heuristics - fallible methods for solving problems. Those who follow the context-driven testing community are well aware of this term. James explained how heuristics need human judgement, not mere rule-following, as heuristics can fail. James said that in our daily life we use many heuristics without being aware of them. He urges, from his own training and experience, being aware of a heuristic and naming it when you use one.

Here is where I am reminded of the mindfulness that Ashok suggested. By being mindful, we can recognize the heuristics we use; when we recognize them, we can name them; when we name them, we can share them with fellow testers. That leads to a community movement which manifests as testing industrialisation. It's exciting to see these two testing gurus' ideas connected in unimaginable ways.