How to Detect and Prevent Flaky Tests?

Incredibuild Team

Flaky is great when you are talking about pastries, not your tests.

If you’ve ever had a test that passes one moment and fails the next, even though nothing changed, you’ve likely run into a flaky test. These unpredictable tests can derail CI/CD pipelines and slow down releases. 

But flaky tests don’t have to be a permanent problem. You can get rid of the flakiness for good. Here is a closer look at what flaky tests are, why they happen, and the weapons you can use against them.

What Are Flaky Tests?

Flaky tests are automated tests that sometimes fail and sometimes pass, even when the code in question remains exactly the same. These inconsistent results make it hard to know whether something is actually broken.

These failures are especially dangerous in automated pipelines, where false alarms can block deployments or train teams to ignore legitimate issues.

Why Are Tests Flaky?

Flaky tests don’t just happen randomly. Most of the time, there’s a pattern or a certain cause behind the behavior.

Here are some of the most common reasons for their occurrence:

  • Timing Issues: Tests that rely on specific delays or wait for something to load can fail if the timing isn’t just right.
  • Concurrency Problems: When multiple tests run at the same time, they might interfere with each other. This is especially common if they share files or memory.
  • Environment Differences: A test might behave one way locally and another way in a CI environment due to differences in OS or hardware.
  • Unreliable Third-Party Dependencies: If your test calls an external API or service, any instability in that service can cause it to fail.
  • Poorly Written Tests: Tests that depend on specific implementation details or fragile selectors are more likely to become flaky.
  • Order Dependency: Some tests pass only when certain other tests run before them. When the order changes, they fail (see the sketch after this list).
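
To make the last two causes concrete, here is a minimal, hypothetical pytest sketch: the second test passes only if the first has already run, so any reordering or parallel execution makes it fail.

cart = []  # module-level state shared by both tests

def test_add_item():
    cart.append("book")
    assert len(cart) == 1

def test_cart_not_empty():
    # Passes only if test_add_item ran first; flaky under reordering or
    # parallel execution. The fix: give each test its own cart.
    assert len(cart) == 1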

Understanding the cause is the first step toward solving the problem. Once you identify patterns, you can start tackling the specific issues behind each flaky test.

The Importance of Flaky Test Detection

Why should you care about flaky tests? Because the longer they stick around, the more damage they do.

Flaky tests erode trust in your test suite. Developers may start to ignore failures, assuming the test is just acting up again, and that opens the door for real bugs to slip through.

They also slow down your development process. Time spent rerunning tests and debugging false alarms is time taken away from building your product.

Overall, flaky tests can:

  • Delay releases
  • Increase engineering costs
  • Reduce team confidence
  • Hide actual bugs

In short, identifying flaky tests early is integral to CI/CD workflow optimization. 

How Do You Detect Flaky Tests?

Detecting flaky tests isn’t always easy, but there are a few proven methods that can help.

1. Repeated Test Execution

One of the simplest ways to spot a flaky test is to run it multiple times. If the test passes sometimes and fails other times without any code changes, you’ve found a flake.
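
A small harness makes this systematic. The sketch below assumes pytest is installed and that the test ID shown exists in your project:

import subprocess

RUNS = 50
failures = 0
for _ in range(RUNS):
    # Run a single test in quiet mode; a non-zero exit code means it failed.
    result = subprocess.run(
        ["pytest", "-q", "tests/test_login.py::test_login"],
        capture_output=True,
    )
    if result.returncode != 0:
        failures += 1

# Any failure count between 0 and RUNS, with no code changes, is a flake.
print(f"{failures}/{RUNS} runs failed")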

2. Analyze Historical Test Data

Use your CI/CD tool to check past test runs. Look for patterns: Do certain tests fail more often on specific branches or at certain times of day? Tools like Jenkins, GitHub Actions, or CircleCI can help you track and compare results.
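
Most CI tools can archive JUnit-style XML reports, which makes this analysis scriptable. Here is a minimal sketch; the results/run-*.xml path is an assumption about how your reports are stored:

import glob
import xml.etree.ElementTree as ET
from collections import defaultdict

stats = defaultdict(lambda: {"runs": 0, "failures": 0})
for report in glob.glob("results/run-*.xml"):
    for case in ET.parse(report).iter("testcase"):
        name = f'{case.get("classname")}.{case.get("name")}'
        stats[name]["runs"] += 1
        if case.find("failure") is not None or case.find("error") is not None:
            stats[name]["failures"] += 1

# Tests that fail sometimes, but not always, are flake candidates.
for name, s in sorted(stats.items()):
    rate = s["failures"] / s["runs"]
    if 0 < rate < 1:
        print(f"{name}: failed {s['failures']}/{s['runs']} runs ({rate:.0%})")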

3. Identify Parallel Execution Failures

Some tests only fail when run in parallel. Try running the same test suite sequentially and then in parallel to see if there is a difference.
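
A quick way to compare the two modes is a sketch like this one, assuming pytest plus the pytest-xdist plugin (which provides the -n flag):

import subprocess

# Same suite, two modes: sequential, then four parallel workers.
sequential = subprocess.run(["pytest", "-q", "tests/"]).returncode
parallel = subprocess.run(["pytest", "-q", "-n", "4", "tests/"]).returncode

if sequential == 0 and parallel != 0:
    # Passing alone but failing in parallel points at shared state:
    # files, databases, ports, or global variables.
    print("Suite is parallel-flaky: look for shared resources.")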

4. Monitor Execution Time

Tests that show large variations in execution time may be flaky. Sudden spikes in duration often hint at underlying timing or dependency issues.
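
If you record durations per run, a simple variance check can flag suspects. The numbers below are hypothetical stand-ins for data pulled from CI logs or JUnit XML "time" attributes:

import statistics

durations = {
    "test_login": [1.2, 1.3, 9.8, 1.1, 1.2],
    "test_logout": [0.4, 0.5, 0.4, 0.5, 0.4],
}

for name, times in durations.items():
    mean = statistics.mean(times)
    stdev = statistics.stdev(times)
    # Arbitrary threshold: flag tests whose spread is half their mean.
    if stdev > 0.5 * mean:
        print(f"{name}: unstable timing (mean {mean:.1f}s, stdev {stdev:.1f}s)")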

5. Use Detection Tools

Several tools exist to help with flaky test detection:

  • Test Retry Plugins (e.g., Jest’s jest.retryTimes, pytest-rerunfailures; see the sketch after this list)
  • CI Dashboards with flake tracking features
  • Custom Scripts that flag inconsistent test outcomes
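
For instance, with pytest-rerunfailures installed, a marker reruns a failing test and records each rerun in the report, so intermittent failures become visible instead of silently retried. The test body here is hypothetical:

import pytest

# Requires the pytest-rerunfailures plugin.
@pytest.mark.flaky(reruns=3, reruns_delay=2)
def test_checkout():
    ...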

Once you’ve flagged potential flaky tests, isolate them for further inspection and begin working toward a fix.

How Do You Fix Flaky Tests?

Fixing a flaky test usually means digging into the test itself and exploring the system around it.

Start by reproducing the flake. This may require running the test many times or modifying the environment to mimic the conditions where it fails.

Then, depending on the cause, consider these solutions:

  • Add proper waits or synchronization: Instead of using hard-coded delays, wait for specific conditions like element visibility or event completion.
  • Mock or stub external dependencies: This removes variables like API response time or network connectivity (see the sketch after this list).
  • Isolate shared resources: If your test is using a shared file, database, or variable, isolate it so each test gets its own copy.
  • Refactor complex test logic: Simplify and clarify what the test is doing. Often, flaky behavior hides in overly complicated test code.
  • Make tests idempotent: Ensure running the test multiple times doesn’t change the outcome or the environment.
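
To make the mocking point concrete, here is a self-contained sketch. The exchange-rate functions are hypothetical stand-ins for real code that hits a live API:

import random
from unittest.mock import patch

def get_exchange_rate(currency):
    # Stands in for a real network call; the randomness makes it unstable.
    return 1.0 + random.random()

def price_in_eur(usd):
    return round(usd * get_exchange_rate("EUR"), 2)

# Patching the unstable dependency pins its output, so the assertion
# is deterministic on every run.
@patch(f"{__name__}.get_exchange_rate", return_value=1.1)
def test_price_in_eur(mock_rate):
    assert price_in_eur(100) == 110.0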

Fixing flaky tests is similar to addressing build failures caused by inconsistent environments. You need to eliminate hidden variables and ensure stability across test runs.

It may take some trial and error, but investing the time to fix flaky tests now has a high return. 

How to Prevent Flaky Tests

The best strategy for flaky tests? Don’t let them happen in the first place.

Here are some proactive steps your team can take:

  • Follow Test Automation Best Practices: Write focused tests that each check one thing. Avoid relying on UI if it’s not necessary.
  • Use Mocks for Unstable Services: Don’t make real API calls unless you are testing integrations specifically.
  • Keep Tests Independent: No test should rely on the result of another test (see the sketch after this list).
  • Run Tests in a Clean Environment: Use containers or isolated environments to reduce system noise.
  • Analyze Test Failures Regularly: Build alerts or dashboards into your CI system to catch issues early.
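
On the independence point, pytest’s built-in tmp_path fixture gives every test a fresh, empty directory, so no test can see files left behind by another. A minimal sketch:

def test_writes_report(tmp_path):
    # tmp_path is a unique directory created for this test alone.
    report = tmp_path / "report.txt"
    report.write_text("ok")
    assert report.read_text() == "ok"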

Fewer flaky tests mean fewer delays and a faster feedback loop. Less confusion for your dev team means better results. 

Flaky Tests: Great for Croissants, Bad for Pipelines

Flaky tests aren’t just annoying. They’re costly. They slow down releases and break trust in your test suite. But the good news is they can be detected and eliminated.

Don’t let flaky tests chip away at your confidence. Take action today to improve test stability and optimize your CI/CD workflow while improving developer productivity.

FAQs about Flaky Tests

What is the difference between brittle and flaky tests?

Brittle tests break when the underlying code changes, even if the change shouldn’t affect the outcome. They are sensitive to implementation details, like the exact structure of a webpage or the naming of a variable.

Flaky tests, on the other hand, fail randomly even when the code hasn’t changed. They might pass one minute and fail the next (without any clear reason).

While brittle tests are fragile by nature, flaky tests are inconsistent. Each problem is worth your attention, but requires a different solution. 

What is an example of a flaky test?

Here’s a classic example in Python:

from selenium.webdriver.common.by import By

def test_login():
    # Assumes `driver` is a WebDriver instance created elsewhere (e.g. a fixture).
    driver.get("https://example.com/login")
    driver.find_element(By.ID, "username").send_keys("user")
    driver.find_element(By.ID, "password").send_keys("pass")
    driver.find_element(By.ID, "login-button").click()
    assert "Welcome" in driver.page_source

This test might fail if the page loads slowly. Adding proper waits can make this test more stable.
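
Here is one way to stabilize it using Selenium’s explicit waits, with the same hypothetical URL and element IDs as above:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def test_login():
    driver = webdriver.Chrome()
    try:
        driver.get("https://example.com/login")
        wait = WebDriverWait(driver, 10)
        # Wait for the form to appear instead of assuming it has loaded.
        wait.until(EC.visibility_of_element_located((By.ID, "username"))).send_keys("user")
        driver.find_element(By.ID, "password").send_keys("pass")
        driver.find_element(By.ID, "login-button").click()
        # Wait for the post-login page rather than asserting immediately.
        wait.until(EC.text_to_be_present_in_element((By.TAG_NAME, "body"), "Welcome"))
    finally:
        driver.quit()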

How to measure flaky tests?

Measuring flaky tests helps track improvement and prioritize what to fix.

  • Flake Rate: Count how often a test fails, divided by how many times it runs. A 20% flake rate means the test fails 1 out of every 5 runs (see the sketch after this list).
  • Stability Score: Some teams use custom scoring systems based on test duration, historical pass rate, and number of reruns.
  • CI Analytics Tools: Many platforms like Jenkins or GitLab can give you flaky test reports.
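
Computing the flake rate is straightforward; a short sketch using the numbers from the example above:

def flake_rate(failures: int, runs: int) -> float:
    # Fraction of runs in which the test failed.
    return failures / runs if runs else 0.0

# 12 failures across 60 runs -> 0.20, i.e. it fails 1 in every 5 runs.
print(f"{flake_rate(12, 60):.0%}")  # 20%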

Tracking these metrics helps you catch issues early, prioritize the worst offenders, and measure whether your fixes are actually working.