Flaky Test Detection

Flakiness and Flaky Score

QMetry Test Management for Jira introduces an AI-powered approach to assessing the flakiness of test cases by calculating a "Flaky Score" derived from their execution history. This feature helps testers identify test cases whose future execution status is non-deterministic.

A flaky test case refers to one that exhibits non-deterministic behavior when executed repeatedly in the same code and environment, resulting in intermittent successes and failures. The crucial first step towards gaining control over flaky tests is identifying them.

With QMetry Intelligence, the process of determining test case flakiness is automated, saving testers and developers time and effort.

QA Managers can define and configure settings according to their specific testing processes for calculating the flaky score, ensuring its relevance to their testing methodologies.

For example, the following table shows the results of test cases executed multiple times.

Test Case Name | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Test 6 | Test 7 | Test 8 | Flaky or Non-flaky?
Test Case A    | Pass   | Pass   | Pass   | Pass   | Pass   | Pass   | Pass   | Pass   | Non-flaky
Test Case B    | Fail   | Fail   | Fail   | Fail   | Fail   | Fail   | Fail   | Fail   | Non-flaky
Test Case C    | Pass   | Pass   | Fail   | Fail   | Pass   | Fail   | Pass   | Fail   | Flaky

Use Cases:

  • QA Managers and testers can view the risk probability associated with the test cases under test.

  • The flaky score on the execution screen shows the tester the risk probability while executing the test case.

Note

The Flaky Score is calculated only for the executed test cases and is determined based on the latest X number of executions. It ranges between 0 (Not Flaky) and 1 (Flaky).
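QMetry's exact scoring formula is internal, but the idea of deriving a 0-to-1 score from the latest X executions can be sketched as follows. This illustrative version scores a test case by the fraction of consecutive runs whose status flipped; the function name, the `window` parameter, and the flip-rate rule are all assumptions, not QMetry's actual algorithm.

```python
def flaky_score(results, window=10):
    """Illustrative flaky score: the fraction of consecutive executions
    (within the latest `window` runs) whose status flipped. An all-Pass
    or all-Fail history scores 0 (not flaky); frequent flips push the
    score toward 1 (flaky). This is a sketch, not QMetry's formula."""
    recent = results[-window:]          # latest X executions only
    if len(recent) < 2:
        return 0.0                      # too little history to judge
    flips = sum(1 for a, b in zip(recent, recent[1:]) if a != b)
    return flips / (len(recent) - 1)

# Sequences from the example table above:
a = ["Pass"] * 8                                                       # Test Case A
b = ["Fail"] * 8                                                       # Test Case B
c = ["Pass", "Pass", "Fail", "Fail", "Pass", "Fail", "Pass", "Fail"]   # Test Case C
```

Under this rule, Test Cases A and B score 0.0 (non-flaky), while Test Case C flips on 5 of its 7 transitions and scores roughly 0.71.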

Required Permissions:

Users should have the following QMetry Intelligence permissions to configure and generate Flaky Score:

  • Modify Flaky and Success Rate Configuration

  • Calculate Flaky and Success Rate Score

Note

  • Only the final execution status assigned to the test case will be considered.

  • The Flaky Score is calculated only for the test cycles of the same project.

Flaky Score Settings

Perform the following steps:

  1. Go to Apps, select QMetry, and select QMetry Intelligence.

  2. Select the Flaky Score Settings option and select the Configuration tab.

  3. Provide the following details for flaky score rate configuration:

    Display Flaky Score Settings:

    • This setting lets users toggle the display of the flaky score. When enabled, it allows configuration of the Flaky Score computation. QA managers should configure and refine the criteria for calculating the Flaky Score based on their QA reporting methodologies.

      Users can then show/hide the Flaky Score column on the test case list view, test cycle > test cases tab, test execution screen, and Test Case/Acceptance Criteria section on the Jira story page. The Flaky Score is also displayed on the test case detail page.

      Flaky Score Settings

    Flaky Score Configuration:

    • Test Case Executions: Enter the number of latest test executions per test case to consider while calculating the flaky score. By default, 10 test executions are considered. The value must be between 1 and 1000.

    • Test Case Versions: Select the executions for either the “Latest” or “All” test case versions to consider while calculating the flaky score. The “Latest” test case version is selected by default.

      • Latest: The test executions of only the latest test case versions will be considered while calculating the flaky score.

      • All: The latest test executions of all test case versions will be considered while calculating the flaky score.

    • Environment: Select environment(s) to consider only the test executions run against those environments. “All” environments are selected by default.

    • Build: Select build(s) to consider only the test executions pertaining to those builds. “All” builds are selected by default.

    • Fix Version: Select Fix Version(s) to consider the test executions for test cycles with the selected Fix Version(s). “All” Fix Versions are selected by default.

    • Sprint: Select sprint(s) to consider the test executions for test cycles covered in those sprints. “All” sprints are selected by default.

    • Execution Timeframe (days): Enter the number of days to define the timeframe of test executions to consider. By default, 90 days is selected, meaning test executions carried out in the last 90 days are considered while calculating the flaky score. The value must be between 1 and 365 days.

Set Test Case Status

  • As Passed: Select the status(es) to define the test case status as “Passed”. You can select multiple statuses from the list.

  • As Failed: Select the status(es) to define the test case status as “Failed”. You can select multiple statuses from the list.
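The filters above can be pictured as a pipeline that runs before scoring. The sketch below is an assumption about how such filtering might work, not QMetry's implementation: the function name, record shape, and default status sets are all hypothetical.

```python
from datetime import datetime, timedelta

def executions_for_scoring(executions, latest_n=10, timeframe_days=90,
                           passed=frozenset({"Passed"}),
                           failed=frozenset({"Failed", "Blocked"})):
    """Hypothetical pre-scoring filter: keep runs inside the Execution
    Timeframe, map raw statuses to Pass/Fail per the "Set Test Case
    Status" configuration (ignoring unmapped statuses), then keep only
    the latest N executions."""
    cutoff = datetime.now() - timedelta(days=timeframe_days)
    mapped = []
    for executed_on, status in sorted(executions, key=lambda e: e[0]):
        if executed_on < cutoff:
            continue                    # outside Execution Timeframe
        if status in passed:
            mapped.append("Pass")
        elif status in failed:
            mapped.append("Fail")       # statuses mapped to neither are ignored
    return mapped[-latest_n:]           # latest Test Case Executions
```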

Click Save to save the Flaky Score Settings.

To reset the settings, click the Reset to Default button.

To generate the flaky score, click the Generate button.

Flaky Score Config Tab

You can view the progress of the background process in the Notifications section.

Note

The Flaky Score can be calculated once every 24 hours for each project. Once you generate the Flaky Score, the Generate button remains disabled for the next 24 hours.

When you hover over the Generate button, it shows which user last generated the Flaky Score, along with the timestamp.

Once you generate the Flaky Score on the Flaky Score Settings screen, you can view the Flaky Score on the test case list view, test cycle > test cases tab, test execution screen, and Test Case/Acceptance Criteria section on the story page. The Flaky Score is also displayed on the test case detail page.

Display of Flaky Score

The lower the value (closer to 0), the less flaky the test case; its behavior is deterministic.

The higher the value (closer to 1), the flakier the test case; its behavior is non-deterministic.

The following table interprets the intensity of the flaky score in accordance with its derived calculation and color code.

Flaky Score        | Intensity of Flakiness
Flaky Score High   | High
Flaky Score Medium | Medium
Flaky Score Low    | Low

View Flaky Score on Test Case List View

As per the Flaky Score Settings defined in the Configuration, the Flaky Score is calculated and displayed on the test case list view. It allows QA managers and testers to view the risk probability and the complete execution history of the test case, along with affected user stories.

Go to the Test Case module and make the Flaky Score column visible for the list view.

  • Flaky Score: It is calculated based on the frequency of pass or fail results.

You can see the Flaky Score column with corresponding statistics. You can sort on the column to view test cases with higher flaky scores.

Flaky Score TC List View

Click the Flaky Score to view the traceability report for the test case, which shows the requirements, test cycles, defects, and execution results the test case is associated with. The report helps you analyze the Flaky Score further.

If the Flaky Score Settings are changed, the existing Flaky Score will get reset.

The info icon beside the Flaky Score column displays the details of when the score was last generated and by whom.

View Flaky Score on Test Case Detail Page

The generated Flaky Score is displayed beside the Test Case Key at the top of the screen.

Flaky Score Test Case Detail Page

View Flaky Score on Test Cycle Detail Page

Perform the following steps to view the Flaky Score on the Test Cases tab:

  1. Go to the Test Cycle module.

  2. Navigate to the test cycle details page and click the Test Cases tab.

  3. Click the Arrange Column menu and make the Flaky Score column visible on the screen.

    The Test Cases tab displays the Flaky Score statistics.

    Flaky Score Test Cycle Detail Page

View Flaky Score on Test Execution Screen

As per the Flaky Score Settings defined in the Configuration, the Flaky Score is calculated and displayed accordingly. The flaky score shows the tester the probability of risk while executing the test case and provides a way to compare pre- and post-execution results.

Go to the Test Execution screen.

Grid View

You can view the Flaky Score statistics adjacent to the test case key at the top.

Flaky Score on Grid View

List View

Make the Flaky Score column visible on the screen.

Flaky Score on List View

Click the Flaky Score to drill down to view the test executions and defects associated with the test case.

View Flaky Score in Story

The flaky score of the linked test cases is displayed in the Story. The flaky score shows the tester the probability of risk while executing the test case and allows testers to compare the pre- and post-execution results of a test case.

Go to the Story > Test Case/Acceptance Criteria and make sure you make the Flaky Score column visible.

You can see the Flaky Score column with relevant statistics.

Flaky Score in Story

How to Reduce Flakiness?

Flaky tests can occur for various reasons. Testers should collaborate with their development teams to identify the root cause. Here are ten common causes of flaky tests:

1. Async wait: Some tests are written in a way that requires waiting for something else to complete. Many flaky tests use sleep statements for this purpose. However, sleep statements are imprecise, and the test may fail if the waiting time exceeds expectations due to variations in processing time.
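Instead of a fixed sleep, a test can poll for the condition it is waiting on. Below is a minimal sketch of such an explicit wait; the function name and parameters are illustrative, and real frameworks offer equivalents (for example, Selenium's WebDriverWait).

```python
import time

def wait_until(condition, timeout=10.0, interval=0.1):
    """Poll `condition` until it returns True or the timeout expires.
    Unlike a fixed sleep, this succeeds as soon as the condition holds
    (fast) and only fails after the full timeout (robust to slow runs)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False
```

A test would call `wait_until(lambda: page_is_loaded(), timeout=5)` rather than `time.sleep(5)`, removing the guess about how long the operation takes.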

2. Concurrency issues: Flaky tests can result from concurrency issues such as data races, atomicity violations, or deadlocks. These tests often make incorrect assumptions about the ordering of operations performed by different threads. To address this, synchronization blocks can be added or the test can be modified to accommodate a wider range of behaviors.
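The synchronization-block fix can be shown with a classic data race: an unlocked read-modify-write that two threads can interleave. This standalone sketch (the `Counter` class is illustrative, not from QMetry) makes the update atomic with a lock so the total is the same on every run.

```python
import threading

class Counter:
    """Without the lock, `value += 1` is a read-modify-write that two
    threads can interleave, so repeated runs see different totals.
    The lock makes each increment atomic and the result deterministic."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:                 # synchronization block
            self.value += 1

def run_threads(counter, threads=8, per_thread=1000):
    # Hammer the counter from several threads, then return the total.
    def work():
        for _ in range(per_thread):
            counter.increment()
    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value
```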

3. Test order dependency: Certain tests may pass or fail depending on the order in which preceding tests were executed. A good test should be independent and able to run in any order with other tests. It should be properly isolated and set up its own expected state.

4. Timing issues: Flaky tests can arise from timing inconsistencies when the test code relies on specific event timings. For example, if a test checks for a particular webpage element after a specific delay, network issues or differences in CPU performance between test runs can lead to intermittent failures.

5. Element locator issues: Many automation tools use XPath to locate webpage elements, but XPath can be unstable because it is sensitive to changes in the page's DOM. Modifications to an element's properties or the addition of similar elements can render the initial XPath invalid, resulting in false positives or negatives. Self-healing AI techniques can address challenging testing scenarios involving dynamic elements, iFrames, and pop-ups: they use multiple locator strategies to find an element and switch to a backup strategy if the primary one fails.
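The fallback idea can be sketched framework-agnostically. Here the `page` object and the locator callables are illustrative stand-ins for a real driver API such as Selenium's; the point is the ordered list of strategies with a backup when the primary fails.

```python
def find_element(page, locators):
    """Try an ordered list of (description, locate) strategies and fall
    back to the next when one returns nothing. Raises only when every
    strategy fails, so a broken primary XPath does not fail the test."""
    for describe, locate in locators:
        element = locate(page)
        if element is not None:
            return element, describe      # also report which strategy worked
    raise LookupError("no locator strategy matched")
```

A test might register `[("by id", ...), ("by css", ...), ("by xpath", ...)]` so a DOM change that breaks the XPath is absorbed by an earlier, more stable strategy.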

6. Test code issues: Poorly written or ambiguous test code can contribute to flaky tests. If the test code lacks clarity regarding the expected application behavior, the test may inconsistently fail or pass. Additionally, complex test code or code relying on external dependencies may be more prone to failure.

7. Test data issues: Tests that depend on inconsistent test data can become flaky. Corrupted test data or different test runs using the same data can lead to inconsistent results. Tests utilizing random data generators without considering the full range of possible results can also introduce flakiness. It is advisable to control the seed for random data generation and carefully consider all possible values.
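Controlling the seed turns random test data into reproducible test data: a failure seen once can be replayed with the same inputs. The field names below are illustrative; the pattern is a dedicated, seeded `random.Random` instance rather than the shared global generator.

```python
import random

def make_test_user(seed=42):
    """Generate test data from a fixed seed so every run produces the
    same values -- a failing input can then be reproduced instead of
    appearing and vanishing between executions."""
    rng = random.Random(seed)             # local, seeded generator
    return {
        "name": "user-" + "".join(rng.choices("abcdefgh", k=6)),
        "age": rng.randint(18, 99),
    }
```

Varying the seed per test case (and logging it on failure) still gives input diversity while keeping every run replayable.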

8. Test configuration issues: Inconsistent test configurations between runs can cause flaky tests. Incorrect test parameters or improper test settings setup can result in test failures.

9. Environment issues: Flaky tests can be attributed to the execution environment. Network connectivity problems, improper handling of I/O operations, hardware differences between test runs, or variations in test environments can introduce non-determinism, leading to flaky tests.

10. Resource leaks: Tests can become flaky if the application or tests do not adequately acquire and release resources, such as memory or database connections. To avoid such issues, it is recommended to use resource pools and ensure that resources are properly returned to the pool when no longer needed.
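Deterministic acquire-and-release is easy to express with a context manager, which guarantees the resource is returned even when the test body raises. The sketch below uses an in-memory SQLite connection as the resource; the schema is purely illustrative.

```python
import sqlite3
from contextlib import closing

def count_rows(db_path=":memory:"):
    """Open a database connection, use it, and release it deterministically.
    `closing(...)` guarantees the connection is closed even if the body
    raises, so no handle leaks into later tests."""
    with closing(sqlite3.connect(db_path)) as conn:   # always released
        conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

The same pattern applies to connection pools: acquire from the pool in a `with` block (or `try/finally`) so the resource is returned on every code path.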
