Fix flaky CI tests: make flaky reruns draw fresh random values #494

Merged
basnijholt merged 1 commit into main from fix-flaky-test-reruns on Jun 10, 2026

@basnijholt

Problem

CI fails intermittently on randomized tests, including on PRs that touch no code (the pre-commit-ci autoupdate PRs) and on main itself. Failures found in recent run history:

| Test | Failing runs |
| --- | --- |
| test_balancing_learner[AverageLearner-gaussian] | 25698219293, 24054088102, 23168580248 |
| test_balancing_learner[Learner2D-ring_of_fire] | 24021387134 (on main), 24609450338 |
| test_learner_performance_is_invariant_under_scaling | 27297286567, 23768978596, 23768968649 |
| test_learner1d.py::test_tell_many | 24010073833 |

Root cause

Most of these tests already have @flaky.flaky(max_runs=N), but the retries never help: pytest-randomly reseeds the global RNG at the start of every test call phase — including reruns triggered by flaky — so every retry replays the exact same failure.

Reproduced locally with the seed from run 25698219293:

$ pytest "adaptive/tests/test_learners.py::test_balancing_learner[AverageLearner-gaussian-learner_kwargs5]" --randomly-seed=4082421528
...
AssertionError: [192, 2, 186, 180]   # identical values to CI, on all 10 flaky retries
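
The replay effect can be illustrated without pytest at all. The sketch below is a simplified stand-in, not pytest-randomly's actual code: `reseed_then_call` and `randomized_check` are hypothetical names, and only the standard-library `random` module is reseeded. It shows that reseeding with the same session seed before every attempt makes each retry draw identical values.

```python
import random

SESSION_SEED = 4082421528  # the seed quoted in the reproduction above


def reseed_then_call(test_func, session_seed):
    """Hypothetical stand-in for a plugin that reseeds before every call."""
    random.seed(session_seed)
    return test_func()


def randomized_check():
    """Hypothetical randomized test body: draws four integers."""
    return [random.randint(0, 200) for _ in range(4)]


# Ten "retries", each preceded by the same reseed.
attempts = [reseed_then_call(randomized_check, SESSION_SEED) for _ in range(10)]

# Every attempt sees exactly the same draws, so a failing draw fails ten times.
all_identical = all(a == attempts[0] for a in attempts)
print(all_identical)
```

If the first attempt draws a failing configuration, no number of retries under this scheme can recover.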

Fix

Add fresh_seed_each_run, a decorator that mixes the rerun attempt number into the RNG seed at the start of the test call. This cannot be done in a fixture: pytest-randomly reseeds in pytest_runtest_call, after fixture setup. The first attempt keeps pytest-randomly's seed untouched, and the derived rerun seeds are a pure function of the session seed, so everything stays reproducible via --randomly-seed.

  • Apply it to all @flaky.flaky tests (test_balancing_learner, test_avg_std_and_npoints, test_tell_many).
  • Add @flaky.flaky(max_runs=5) to test_learner_performance_is_invariant_under_scaling, which had no retries at all (whether floating-point tie-breaking makes the scaled and control learners diverge depends on the seed).
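
A minimal sketch of the idea follows. It is not the decorator from this PR: the `base_seed` argument, the closure-based attempt counter, and the seed-mixing formula are all assumptions made for illustration, and only the standard-library `random` module is reseeded. The `run_attempts` helper is a hypothetical stand-in for the per-call reseed described above.

```python
import functools
import random


def fresh_seed_each_run(base_seed):
    """Hypothetical sketch: reseed `random` on every call after the first."""

    def decorator(test_func):
        attempt = 0  # how many times the wrapped test has been called so far

        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            nonlocal attempt
            if attempt > 0:
                # The derived seed depends only on (base_seed, attempt), so a
                # rerun of the session with the same base seed is reproducible.
                random.seed(base_seed * 1_000_003 + attempt)
            attempt += 1
            return test_func(*args, **kwargs)

        return wrapper

    return decorator


def run_attempts(test_func, session_seed, n):
    """Hypothetical stand-in: reseed with the session seed before each call."""
    results = []
    for _ in range(n):
        random.seed(session_seed)
        results.append(test_func())
    return results


def draw():
    return random.random()


plain = run_attempts(draw, 1234, 3)
wrapped = run_attempts(fresh_seed_each_run(1234)(draw), 1234, 3)
print(len(set(plain)), len(set(wrapped)))
```

Because the reseed happens inside the wrapper, it runs after the per-call reseed and overrides it on reruns, while the first attempt still sees the unmodified session seed. A single closure counter would be shared across parametrized cases, so a real implementation would need to track attempts per test item.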

Verification

With the fix, the reproduced CI failure recovers on the first retry:

$ pytest "...test_balancing_learner[AverageLearner-gaussian-learner_kwargs5]" --randomly-seed=4082421528
test_balancing_learner[AverageLearner-gaussian-learner_kwargs5] failed (9 runs remaining out of 10).
test_balancing_learner[AverageLearner-gaussian-learner_kwargs5] passed 1 out of the required 1 times. Success!
1 passed

  • Full suite passes: 282 passed, 16 skipped, 64 xfailed, 25 xpassed (Python 3.13, macOS).
  • Scaling test passes on Python 3.14 with the seed from run 27297286567, with xfail/xpass params unaffected by the wrapper.
  • A 40-seed sweep of the balancing test showed no failed attempts, consistent with the low base failure rate (CI only hits it because every push runs a 12-job matrix with 2 pytest sessions each).
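
The matrix-size argument can be made concrete with a short calculation. The per-session failure probability used here (1%) is purely hypothetical, since the PR does not state a measured rate, and the sessions are assumed to fail independently.

```python
JOBS = 12              # matrix size stated above
SESSIONS_PER_JOB = 2   # pytest sessions per job, as stated above
P_SINGLE = 0.01        # hypothetical per-session failure probability


def p_any_failure(p_single, n_sessions):
    """Probability that at least one of n independent sessions fails."""
    return 1.0 - (1.0 - p_single) ** n_sessions


n_sessions = JOBS * SESSIONS_PER_JOB             # 24 sessions per push
p_push = p_any_failure(P_SINGLE, n_sessions)     # chance a push goes red
p_clean_sweep = (1.0 - P_SINGLE) ** 40           # chance a 40-seed sweep is clean

print(f"{p_push:.3f} {p_clean_sweep:.3f}")
```

Under these assumptions, roughly one push in five would show a spurious failure, while a 40-seed local sweep would still come back clean about two times in three.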

pytest-randomly reseeds the global RNG at the start of every test call
phase, including reruns triggered by the flaky plugin. A randomized test
that fails for the session seed therefore fails identically on all
retries, making @flaky.flaky useless and causing spurious CI failures.

Add a fresh_seed_each_run decorator that mixes the rerun attempt number
into the seed inside the test call (a fixture cannot do this, since
pytest-randomly reseeds after fixture setup), apply it to all flaky
tests, and add retries to the scaling-invariance test which had none.
@basnijholt basnijholt enabled auto-merge (squash) June 10, 2026 20:06
@basnijholt basnijholt disabled auto-merge June 10, 2026 20:06
@basnijholt basnijholt merged commit 6684a23 into main Jun 10, 2026
17 checks passed
@basnijholt basnijholt deleted the fix-flaky-test-reruns branch June 10, 2026 20:06