Tags: pipefunc/pipefunc
ENH: Store polars DataFrames as Parquet and support `pl.LazyFrame` inputs (#966)

* ENH: Store polars DataFrames as Parquet and support pl.LazyFrame inputs

  Implements the remaining part of #879: DataFrame outputs are serialized as Parquet files on disk (with cloudpickle fallback), and parameters annotated as pl.LazyFrame receive lazy frames - a true pl.scan_parquet when the upstream output is stored as Parquet, otherwise .lazy().

* PERF: Cache LazyFrame-annotated parameter names on PipeFunc

  Avoids iterating all parameter annotations per map element in _convert_lazyframe_kwargs; the no-polars-params fast path is now a single cached attribute lookup (~46 ns/call).

* PERF/MAINT: Single-open format sniffing in load() and move LazyFrame conversion onto PipeFunc

  - load() previously opened every file twice (magic-byte sniff + read); now peeks and rewinds a single handle. This path is hot for all users.
  - _convert_lazyframe_kwargs is now a PipeFunc method, living next to the _lazyframe_parameters cache it uses, so _pipeline/_base.py no longer imports a private helper from map/_run.py.
  - Use bytes.startswith for the magic check in _load_all.

* TST: Cover polars-not-imported branches

  The all-deps CI session imports polars at collection time, so the is_imported('polars') early returns never executed there (codecov/patch flagged them). Simulate via monkeypatch.delitem(sys.modules, 'polars').
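The single-open format sniffing described in this commit can be illustrated with a minimal sketch. It is not pipefunc's `load()`: the helper name `sniff_and_load` is hypothetical, stdlib `pickle` stands in for cloudpickle, and the Parquet branch only returns the raw bytes where a real loader would hand the handle to a Parquet reader. The one fact relied on is that Parquet files start with the magic bytes `PAR1`.

```python
import io
import pickle

PARQUET_MAGIC = b"PAR1"  # Parquet files begin with these four bytes


def sniff_and_load(fh):
    """Hypothetical helper: detect the format using one open handle."""
    head = fh.read(len(PARQUET_MAGIC))  # peek at the leading bytes
    fh.seek(0)  # rewind so the actual reader sees the whole file
    if head.startswith(PARQUET_MAGIC):
        # Assumption: a real loader would pass `fh` to a Parquet reader here.
        return "parquet", fh.read()
    # Fallback: treat the file as a pickle (stand-in for cloudpickle).
    return "pickle", pickle.load(fh)


kind, value = sniff_and_load(io.BytesIO(pickle.dumps({"a": 1})))
```

Because the handle is rewound rather than reopened, each file is opened once regardless of which branch is taken.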
BUG: Mutating returned results leaks into cached copy reused by Pipeline.run (#905)

* BUG: Mutating returned results leaks into cached copy reused by Pipeline.run

* BUG: Store and return deep copies in in-memory caches

  Fixes mutations of returned (or later-mutated) results leaking into cache entries, making repeated `pipeline.run` calls non-idempotent.

  - `SimpleCache`, `LRUCache(shared=False)`, and `HybridCache(shared=False)` now deep-copy values on `put` and `get` (shared caches already isolate values via (de)serialization). `DiskCache`'s in-memory LRU layer inherits the fix.
  - Opt out with `copy=False` for large never-mutated values.
  - Falls back to the original object with a warning if a value cannot be deep-copied.
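The copy-on-`put`/copy-on-`get` behaviour can be sketched roughly as follows. `CopyingCache` is a hypothetical stand-in, not pipefunc's `SimpleCache`, and the flag is named `copy_values` here (rather than `copy`) only to avoid shadowing the `copy` module in this sketch.

```python
import copy
import warnings


class CopyingCache:
    """Hypothetical in-memory cache that isolates stored values."""

    def __init__(self, copy_values=True):
        self._data = {}
        self._copy_values = copy_values

    def _maybe_copy(self, value):
        if not self._copy_values:
            return value  # opt-out for large, never-mutated values
        try:
            return copy.deepcopy(value)
        except Exception:
            # Fall back to the original object, but tell the user.
            warnings.warn("Could not deep-copy value; using original object", stacklevel=3)
            return value

    def put(self, key, value):
        self._data[key] = self._maybe_copy(value)

    def get(self, key):
        return self._maybe_copy(self._data[key])


cache = CopyingCache()
result = [1, 2, 3]
cache.put("k", result)
result.append(4)        # mutate after storing
first = cache.get("k")
first.append(99)        # mutate a returned value
second = cache.get("k")  # the cached entry is unaffected by either mutation
```

Copying on both sides matters: copying only on `get` would still let a later mutation of `result` reach the stored entry.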
MAINT: suppress google-crc32c pure-python fallback warning (#942)

* feat: add persisted run status CLI
* style: simplify run status imports
* feat: persist live run status heartbeat
* fix: preserve bytes in heartbeat status
* test: cover run status real scenarios
* test: simplify and speed up run status coverage
* fix: harden persisted run status inspection
* refactor: type storage folder on base class
* docs: add persisted run inspection guide
* docs: clarify run status heartbeat requirements
* fix: clean up run status review findings

  - Remove unused `slow_scalar_pipeline` test fixture
  - Remove `{try-notebook}` directive from run-status docs (no Python cells)
  - Document implicit headless tracker creation in `_create_progress_tracker`

* fix: suppress google-crc32c pure-python fallback warning

  The warning fires during `import zarr` → numcodecs → google_crc32c and appears on every pipefunc import when the C extension is not compiled. Suppress it in `map/__init__.py` before importing zarr.

* no del
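Installing a warning filter before a noisy import could look roughly like the sketch below. The function `noisy_import`, the message text, and the regex are placeholders for illustration; the real google_crc32c message and the filter pipefunc actually installs are not reproduced here.

```python
import warnings


def noisy_import():
    """Stand-in for an import that warns at import time (hypothetical)."""
    warnings.warn("using a pure python implementation", RuntimeWarning, stacklevel=2)
    return "module"


# Record warnings only so this sketch can show that nothing escapes.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The filter must be registered *before* the import runs. The pattern is
    # matched against the start of the message, hence the leading ".*".
    warnings.filterwarnings(
        "ignore",
        message=".*pure python implementation.*",
        category=RuntimeWarning,
    )
    module = noisy_import()
```

Filtering on both message and category keeps the suppression narrow, so unrelated `RuntimeWarning`s from the same import chain would still surface.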
Add concise __repr__ methods to ErrorSnapshot and PropagatedErrorSnapshot (#927)

The default dataclass __repr__ shows all fields, making output very verbose when these objects appear in arrays or other containers. Add custom __repr__ methods that show only essential information:

- ErrorSnapshot: function name, exception type and message
- PropagatedErrorSnapshot: function name and reason

The detailed __str__ output remains unchanged for when users explicitly print() these objects.
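The split between a short `__repr__` and a detailed `__str__` can be sketched as below. `SnapshotSketch` and its fields are hypothetical and do not mirror the actual `ErrorSnapshot` definition; the sketch only relies on the documented dataclass behaviour that an explicitly defined `__repr__` is not overwritten by the generated one.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotSketch:
    """Hypothetical stand-in; fields are illustrative only."""

    function_name: str
    exception: Exception
    kwargs: dict = field(default_factory=dict)

    def __repr__(self):
        # Concise form used inside containers: name, exception type, message.
        exc = self.exception
        return f"SnapshotSketch('{self.function_name}', {type(exc).__name__}: {exc})"

    def __str__(self):
        # Detailed form used by print().
        return (
            f"Function: {self.function_name}\n"
            f"Exception: {self.exception!r}\n"
            f"kwargs: {self.kwargs}"
        )


snap = SnapshotSketch("f", ValueError("bad input"), {"x": 1})
short = repr([snap, snap])  # containers call repr() on their elements
detailed = str(snap)
```

Containers such as lists always use `repr()` for their elements, which is why shortening `__repr__` alone is enough to tidy array output while `print(snap)` keeps the full detail.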