Fix flaky CI tests: make flaky reruns draw fresh random values #494
Merged
pytest-randomly reseeds the global RNG at the start of every test call phase, including reruns triggered by the flaky plugin. A randomized test that fails for the session seed therefore fails identically on all retries, making @flaky.flaky useless and causing spurious CI failures. Add a fresh_seed_each_run decorator that mixes the rerun attempt number into the seed inside the test call (a fixture cannot do this, since pytest-randomly reseeds after fixture setup), apply it to all flaky tests, and add retries to the scaling-invariance test which had none.
Problem
CI fails intermittently on randomized tests, including on PRs that touch no code (the pre-commit-ci autoupdate PRs) and on main itself. Failures found in recent run history:
- test_balancing_learner[AverageLearner-gaussian]
- test_balancing_learner[Learner2D-ring_of_fire]: … main), 24609450338
- test_learner_performance_is_invariant_under_scaling
- test_learner1d.py::test_tell_many
Root cause
Most of these tests already have @flaky.flaky(max_runs=N), but the retries never help: pytest-randomly reseeds the global RNG at the start of every test call phase, including reruns triggered by flaky, so every retry replays the exact same failure.
Reproduced locally with the seed from run 25698219293.
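As a standalone illustration of the root cause (separate from the CI reproduction), the sketch below simulates what happens when the global RNG is reset to the same seed before every attempt. It does not use pytest, pytest-randomly, or flaky; `SESSION_SEED`, `reseed`, `randomized_check`, and `run_with_retries` are hypothetical names chosen for this example only.

```python
import random

SESSION_SEED = 1234  # hypothetical stand-in for the --randomly-seed value


def reseed(seed):
    """Stand-in for a plugin that resets the global RNG before each test call."""
    random.seed(seed)


def randomized_check():
    """Stand-in for a randomized test body: returns the value it would test with."""
    return random.random()


def run_with_retries(max_runs, seed=SESSION_SEED):
    """Simulate reruns where every attempt starts from the same seed."""
    draws = []
    for _ in range(max_runs):
        reseed(seed)  # happens again on every rerun
        draws.append(randomized_check())
    return draws


draws = run_with_retries(3)
# Every attempt sees the same random value, so a failing input fails every time.
print(len(set(draws)))
```

If the drawn value happens to trigger a failure, each retry draws that same value again, which is why raising max_runs alone cannot help.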
Fix
Add fresh_seed_each_run, a decorator that mixes the rerun attempt number into the RNG seed at the start of the test call. This cannot be done in a fixture: pytest-randomly reseeds in pytest_runtest_call, after fixture setup. The first attempt keeps pytest-randomly's seed untouched, and the derived rerun seeds are a pure function of the session seed, so everything stays reproducible via --randomly-seed.
- Apply it to all @flaky.flaky tests (test_balancing_learner, test_avg_std_and_npoints, test_tell_many).
- Add @flaky.flaky(max_runs=5) to test_learner_performance_is_invariant_under_scaling, which had no retries at all (its float tie-breaking divergence between the scaled and control learners is seed-dependent).
Verification
With the fix, the reproduced CI failure recovers on the first retry:
282 passed, 16 skipped, 64 xfailed, 25 xpassed (Python 3.13, macOS).
xfail/xpass params unaffected by the wrapper.