Fix flaky CI tests: make flaky reruns draw fresh random values by basnijholt · Pull Request #494 · python-adaptive/adaptive

basnijholt · 2026-06-10T20:02:32Z

Problem

CI fails intermittently on randomized tests, including on PRs that touch no code (the pre-commit-ci autoupdate PRs) and on main itself. Failures found in recent run history:

| Test | Failing runs |
| --- | --- |
| `test_balancing_learner[AverageLearner-gaussian]` | 25698219293, 24054088102, 23168580248 |
| `test_balancing_learner[Learner2D-ring_of_fire]` | 24021387134 (on `main`), 24609450338 |
| `test_learner_performance_is_invariant_under_scaling` | 27297286567, 23768978596, 23768968649 |
| `test_learner1d.py::test_tell_many` | 24010073833 |

Root cause

Most of these tests already have @flaky.flaky(max_runs=N), but the retries never help: pytest-randomly reseeds the global RNG at the start of every test call phase — including reruns triggered by flaky — so every retry replays the exact same failure.

Reproduced locally with the seed from run 25698219293:

$ pytest "adaptive/tests/test_learners.py::test_balancing_learner[AverageLearner-gaussian-learner_kwargs5]" --randomly-seed=4082421528
...
AssertionError: [192, 2, 186, 180]   # identical values to CI, on all 10 flaky retries
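
The replay effect described above can be illustrated without pytest. The sketch below is a simplified stand-in, not pytest-randomly's own code: `run_with_retries` is a hypothetical helper that reseeds the global RNG before every attempt, the way the plugin does before each test call, and records what each attempt draws.

```python
import random


def run_with_retries(session_seed, max_runs):
    """Record the first value drawn in each attempt when every attempt is reseeded.

    Hypothetical stand-in for a flaky rerun loop under pytest-randomly.
    """
    draws = []
    for _ in range(max_runs):
        # Stand-in for the plugin reseeding at the start of the test call.
        random.seed(session_seed)
        draws.append(random.randint(0, 1000))
    return draws


draws = run_with_retries(session_seed=4082421528, max_runs=10)
# Every attempt sees the same value, so a failing draw fails on all retries.
print(len(set(draws)))  # prints 1
```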

Fix

Add fresh_seed_each_run, a decorator that mixes the rerun attempt number into the RNG seed at the start of the test call. This cannot be done in a fixture: pytest-randomly reseeds in pytest_runtest_call, after fixture setup. The first attempt keeps pytest-randomly's seed untouched, and the derived rerun seeds are a pure function of the session seed, so everything stays reproducible via --randomly-seed.

Apply it to all @flaky.flaky tests (test_balancing_learner, test_avg_std_and_npoints, test_tell_many).
Add @flaky.flaky(max_runs=5) to test_learner_performance_is_invariant_under_scaling, which had no retries at all (whether float tie-breaking makes the scaled and control learners diverge depends on the seed).
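
As a rough illustration of the idea, and not the code merged in this PR, a decorator along these lines could look as follows. Everything except the name `fresh_seed_each_run` is an assumption: the sketch only touches the standard-library `random` module (the real tests may also need the NumPy RNG reseeded), keeps its attempt counter in a closure (so it assumes a single, non-parametrized test), and uses a hypothetical `simulate_reruns` helper in place of pytest-randomly and flaky.

```python
import functools
import random


def fresh_seed_each_run(test_func):
    """Reseed the global RNG on reruns so each retry draws new values.

    Illustrative sketch only: the attempt counter lives in the closure, so it
    assumes one decorated test without parametrization.
    """
    attempts = 0

    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        nonlocal attempts
        attempt = attempts
        attempts += 1
        if attempt > 0:
            # The RNG was just reset to the session seed, so this draw is a
            # pure function of that seed.
            base = random.getrandbits(32)
            # Mix in the attempt number to get a distinct, reproducible seed.
            random.seed(base * 1_000_003 + attempt)
        return test_func(*args, **kwargs)

    return wrapper


def simulate_reruns(session_seed, max_runs):
    """Hypothetical rerun loop: reseed, as pytest-randomly would, then call."""
    draws = []

    @fresh_seed_each_run
    def fake_test():
        draws.append(random.random())

    for _ in range(max_runs):
        random.seed(session_seed)
        fake_test()
    return draws


draws = simulate_reruns(session_seed=4082421528, max_runs=5)
```

In this sketch the first attempt draws exactly what the session seed alone would produce, later attempts draw different values, and repeating the simulation with the same session seed gives the same sequence. That matches the reproducibility property described above.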

Verification

With the fix, the reproduced CI failure recovers on the first retry:

$ pytest "...test_balancing_learner[AverageLearner-gaussian-learner_kwargs5]" --randomly-seed=4082421528
test_balancing_learner[AverageLearner-gaussian-learner_kwargs5] failed (9 runs remaining out of 10).
test_balancing_learner[AverageLearner-gaussian-learner_kwargs5] passed 1 out of the required 1 times. Success!
1 passed

Full suite passes: 282 passed, 16 skipped, 64 xfailed, 25 xpassed (Python 3.13, macOS).
Scaling test passes on Python 3.14 with the seed from run 27297286567, with xfail/xpass params unaffected by the wrapper.
A 40-seed sweep of the balancing test showed no failed attempts, consistent with the low base failure rate (CI only hits it because every push runs a 12-job matrix with 2 pytest sessions each).
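
The parenthetical above can be made concrete with a back-of-the-envelope calculation. The per-run failure probability below is an assumed, purely illustrative value, not a measurement from this PR. The sketch also treats every CI session and every sweep seed as an independent trial with that same probability.

```python
def p_any_failure(p_single, sessions):
    """Probability that at least one of `sessions` independent runs fails."""
    return 1.0 - (1.0 - p_single) ** sessions


# From the description: 12 matrix jobs, 2 pytest sessions each.
sessions_per_push = 12 * 2

# ASSUMPTION: an illustrative 1% per-run failure rate, not a measured value.
p_single = 0.01

# Chance that a push sees at least one failing session.
p_push = p_any_failure(p_single, sessions_per_push)
# Chance that a 40-seed sweep sees no failure at all.
p_clean_sweep = (1.0 - p_single) ** 40
print(f"{p_push:.3f} {p_clean_sweep:.3f}")
```

Under that assumption, a clean 40-seed sweep is the more likely outcome, while a noticeable fraction of pushes would still hit at least one failure.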


pytest-randomly reseeds the global RNG at the start of every test call phase, including reruns triggered by the flaky plugin. A randomized test that fails for the session seed therefore fails identically on all retries, making @flaky.flaky useless and causing spurious CI failures. Add a fresh_seed_each_run decorator that mixes the rerun attempt number into the seed inside the test call (a fixture cannot do this, since pytest-randomly reseeds after fixture setup), apply it to all flaky tests, and add retries to the scaling-invariance test which had none.

basnijholt enabled auto-merge (squash) June 10, 2026 20:06

basnijholt disabled auto-merge June 10, 2026 20:06

basnijholt merged commit 6684a23 into main Jun 10, 2026
17 checks passed

basnijholt deleted the fix-flaky-test-reruns branch June 10, 2026 20:06

basnijholt merged 1 commit into main from fix-flaky-test-reruns
