How to use computer vision for your test automation
If you need to cover test cases that are impossible to automate with traditional frameworks, you need to expand your toolkit. Typical examples include gaming console automation, smoke testing an application under active development whose UI changes constantly during a sprint, verifying ad banners, and testing apps that rely on the Android or iPhone keyboards.
For example, one of our customers needed an Xbox app automated. For various reasons we couldn't use the existing tech stack, and we also couldn't build a new one, since we didn't have access to the elements tree, the hierarchy that exposes the app's controls in a folder-like structure.
After doing extensive research, we concluded that the only way to solve this problem was to use computer vision technology, which detects the controls and elements on pages.
Here's how my team implemented this approach.
Defining the problem
For automation testing, my organization used Carina, an open-source test automation framework that wraps Selenium actions, makes them stable, and provides reports for the automation team. The framework also integrates well with test management systems, bug-tracking systems, and other tools, but its key feature remains that handling of Selenium actions.
Our team needed an alternative to building an elements tree, because we didn't have access to the client's. We wondered: What if we used an approach that comes from manual QA? That is, what if we looked through a page to detect and recognize its elements, just as manual testers do?
So the main goal became to find technologies that could support this approach. We decided to use neural networks and computer vision: neural networks can detect and classify objects in the images that computer vision captures.
Computer vision and neural networks
Our team didn't include any neural network specialists, so the first step was to find an existing solution for building one. After some research, we found Darkflow, an open-source framework for real-time object detection and classification.
This tool uses TensorFlow, one of the most popular open-source machine-learning frameworks, which has detailed documentation and a vast support and contributor community. TensorFlow also lets you export trained graphs that can be used anywhere.
For its part, Darkflow has fairly clear documentation on GitHub, which makes it easy to train your own network with it.
We trained the network on page screenshots and detected elements by their coordinates. To locate an element, we needed the coordinates of two points: its top-left and bottom-right corners. From those we could easily calculate the center of any control, as well as any divergence from its expected position.
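To make that concrete, here is a minimal Python sketch of the calculation; the function names and the example coordinates are ours, purely for illustration:

def center_of(top_left, bottom_right):
    # Center of a detected control, given its top-left and bottom-right corners.
    x = (top_left[0] + bottom_right[0]) // 2
    y = (top_left[1] + bottom_right[1]) // 2
    return x, y

def divergence(actual, expected):
    # Distance between where the control was detected and where we expected it.
    dx, dy = actual[0] - expected[0], actual[1] - expected[1]
    return (dx ** 2 + dy ** 2) ** 0.5

# A button detected at (120, 40)-(280, 90) has its center at (200, 65),
# so a tap or click lands there; the divergence from (200, 66) is 1 pixel.
center = center_of((120, 40), (280, 90))
offset = divergence(center, (200, 66))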
A screenshot from our testing process served as an example of a training image. Here's how to train your own model:
- Create a directory with images for training, and copy all of your screenshots there ($MODEL_NAME/img).
- Create a directory with the XML annotation files for training (Pascal VOC format, one file per screenshot), and copy them there ($MODEL_NAME/ann).
- In the $DARKFLOW_HOME directory, create a labels file such as labels-$MODEL_NAME.txt, and list all of the controls you marked up earlier (text_field, link, header, etc.).
- You also need to create the model configuration file manually, since it defines the network and is used when saving the trained graphs: $DARKFLOW_HOME/cfg/$MODEL_NAME.cfg
- To start training, just run the command:
nohup $DARKFLOW_HOME/flow --train --labels $DARKFLOW_HOME/labels-$MODEL_NAME.txt --annotation $MODEL_NAME/ann --dataset $MODEL_NAME/img --model $DARKFLOW_HOME/cfg/$MODEL_NAME.cfg --load $DARKFLOW_HOME/bin/tiny-yolo-voc.weights --trainer adam --gpu 0.9 --lr 1e-5 --keep 10 --backup $DARKFLOW_HOME/ckpt/$MODEL_NAME/ --batch 16 --save 500 --epoch 2000 --verbalise > ../logs/create_model_$MODEL_NAME.log &
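Once training completes, the saved checkpoints can be loaded through Darkflow's Python API and run against fresh screenshots. The sketch below is only an outline under our assumptions: the file names (my_model.cfg, labels-my_model.txt, page.png), the backup path, and the 0.4 threshold are placeholders you would replace with your own values.

import cv2
from darkflow.net.build import TFNet

options = {
    "model": "cfg/my_model.cfg",        # the same .cfg used for training
    "load": -1,                          # load the most recent checkpoint
    "backup": "ckpt/my_model/",          # the --backup directory from the training run
    "labels": "labels-my_model.txt",     # the control types you labeled
    "threshold": 0.4,                    # drop detections below this confidence
}
tfnet = TFNet(options)

# Detect controls on a screenshot captured by the automation framework.
screenshot = cv2.imread("page.png")
detections = tfnet.return_predict(screenshot)

# Each detection has a label, a confidence score, and the two corner points,
# which is everything needed to compute a click point for the control.
for d in detections:
    x = (d["topleft"]["x"] + d["bottomright"]["x"]) // 2
    y = (d["topleft"]["y"] + d["bottomright"]["y"]) // 2
    print(d["label"], round(d["confidence"], 2), "-> click at", (x, y))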
Try it out yourself
Another plus is that you can use computer vision and neural networks for everyday tasks as well, and you can integrate this approach with real automation cases from production. But that's a discussion for another time.