I just ran this case, and it passed. And now it’s failing—WTF?!

Let’s assume you have a robust regression set, and some cases fail and then pass when rerun, but nothing has changed in the environment or the case. Unfortunately, you might have flaky test cases in the mix.

A flaky test sometimes works and sometimes doesn’t, even though the code hasn’t changed. Flakiness can be a big problem for Test Managers and cause immense pain for Test Automation Engineers.

There is no single best practice for such an unknown; it can feel like searching for a needle in a haystack. In these situations, I like to apply heuristics that help me understand where I am and what is going on, so I can work out what to do next. Here, I use OODA Loops.

 

OODA is a heuristic: Observe, Orient, Decide, and Act. OODA loops give me a way to learn more about my SUT (System Under Test), a way to deal with uncertainty, and a way to get things done.

Flaky tests often appear random, but there are many things we can do beyond the re-run-until-green approach.

Here is an example of applying OODA to a test set.

  • Observe: What’s the problem?
  • Orient: What requirements do I have? What is the case testing for?
  • Decide: Choose a fix. What issue does it solve, and what issue could it create? Implement the fix.
  • Act: Add the case back to the test set and execute.

If the error persists, iterate and go back to step 1.

Now, let’s look at some general causes of flaky tests and possible solutions.

Waiting / Time-out Issues

A timing issue can occur when the time required to complete an action is very close to the programmed timeout value. For instance, the timeout may be set to timeout="30", but the action sometimes takes longer; when it does, the test ends in an error.

A possible solution is to use control statements: AST introduced the <signal /> control statement, which allows you to set and get globally accessible key/value pairs. A test can then wait until a certain key has been set, which makes it much easier to build dependent test scenarios.

Example:

The test case waits for the signal with the key “TC1_account_nr” to be set and then reads the value of that key.

<wait signal="TC1_account_nr" timeout="30" raise="true"/>
<variable name="l_val" />
<signal key="TC1_account_nr" select="l_val" />

Note: These conditions need to be closely monitored, as they can also be an indicator that the test environment needs some tuning.
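Outside of AST, the same idea can be sketched in any general-purpose language: instead of waiting a fixed period and hoping it is enough, poll for the expected state until a generous deadline. Here is a minimal Python sketch of that pattern; the account_is_booked check is a hypothetical placeholder and not part of AST.

import time

def wait_for(condition, timeout=60.0, poll_interval=0.5):
    # Poll the condition until it returns True or the deadline passes.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_interval)
    raise TimeoutError(f"condition not met within {timeout:.0f} seconds")

# Usage (hypothetical check): wait up to 60 s instead of sleeping a fixed 30 s.
# wait_for(lambda: account_is_booked("TC1"), timeout=60)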

Setup/Breakdown

I often see that test cases are not set up or cleaned up properly, and that affects the next test. When I’m trying to figure out how to fix these problems, I like to do the following (there is a small sketch after this list):

  1. Find out what the previous test is doing that could make the later test fail. Most often, it is data. Sometimes, I need to enable logging capabilities. This gives me a better understanding of what is going on inside the system. Note: AST has some pretty cool logging capabilities. I have written about these before, but if you need a refresher, let me know.
  2. Understand if there is a better way to order the tests. Can I break the set down into smaller batches? Divide and reduce.
  3. Pay attention to variables and how they are used in the test case.
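To illustrate the cleanup point, here is a minimal pytest-style sketch (plain Python, not AST) of a test that creates its own data and removes it again, so the next test never depends on leftovers. The in-memory ACCOUNTS dictionary stands in for whatever shared state your tests touch.

import pytest

# Stand-in for shared state that one test could leak into the next
# (in a real suite this would be the database or the SUT itself).
ACCOUNTS = {}

def create_account(name):
    ACCOUNTS[name] = {"balance": 0}
    return name

def delete_account(name):
    ACCOUNTS.pop(name, None)

@pytest.fixture
def account():
    # Each test gets its own account and removes it again afterwards.
    name = create_account("TC_setup_demo")
    yield name
    delete_account(name)  # clean up so the next test starts from a known state

def test_new_account_has_zero_balance(account):
    assert ACCOUNTS[account]["balance"] == 0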

Network

Dependence on a network can lead to flaky behaviour for a simple reason: sometimes the network is up, sometimes it is not, or part of it is not. In this case, try to ensure that the network is working by adding some preconditions to your test case.
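As a sketch of such a precondition (again in plain Python rather than AST), a test can check that the host it depends on is reachable and skip with a clear reason if it is not, instead of failing intermittently. The host name and port below are placeholders.

import socket
import pytest

def host_reachable(host, port, timeout=3.0):
    # Return True if a TCP connection to host:port succeeds within the timeout.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Precondition: skip with a clear reason instead of failing flakily mid-test.
@pytest.mark.skipif(not host_reachable("test-env.example.com", 443),
                    reason="test environment not reachable")
def test_something_that_needs_the_network():
    ...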

Table

Finally, many cases simply pick the first item from a list. Various workflows may need to run before the data is written into the table, and in a test environment this can take more time. Because of this, select a specific item instead of grabbing the Nth item in a list.
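The same idea in a small Python sketch: instead of taking whatever happens to be first, look up the specific record the test created by its key (and, if the writing workflow may still be running, combine this with a short wait as in the timeout section above). The order data below is purely illustrative.

# Illustrative data; in a real test this would come from the SUT's table or API.
orders = [
    {"order_id": "ORD-0001", "status": "open"},
    {"order_id": "ORD-4711", "status": "open"},  # the record this test created
]

# Flaky: grab whatever is on top; depends on what other tests or workflows wrote.
first = orders[0]

# Stable: select the specific record by its key.
mine = next(o for o in orders if o["order_id"] == "ORD-4711")
print(mine["status"])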


Summary

Flaky cases can happen in an Avaloq environment. You can use OODA Loops to help you learn, understand flaky cases better, and build an iterative strategy for success.

  1. Begin small.
  2. Run the regression set frequently, with the correct amount of logging.
  3. Identify flaky cases (Observe).
  4. Sort out the flaky tests (Orient).
  5. Fix them (Decide).
  6. Add the fixed tests back into the regression set slowly (Act).

 

If you believe you have this issue, then send us an email at info@ast-suite.com. Perhaps with some training and some information from your side, we can help you solve the flaky case phenomenon.