pola-rs/polars: Python Polars 1.38.0

20260 citations · Other · Green Open Access

Authors

Ritchie Vink · Office of Polar Programs

Stijn de Gooijer

Alexander Beedie

nameexhaustion

Gijs Burghoorn · Radboud University Nijmegen

Orson R. L. Peters · Office of Polar Programs

Marco Gorelli · Quansight (United States)

reswqa

Marshall

Abstract

⚠️ Deprecations

- Deprecate retries=n in favor of storage_options={"max_retries": n} (#26155)

🚀 Performance improvements

- Enable zero-copy object_store put upload for IPC sink (#26288)
- Resolve file schemas and metadata concurrently (#26325)
- Run elementwise CSEE for the streaming engine (#26278)
- Disable morsel splitting for fast-count on streaming engine (#26245)
- Implement streaming decompression for scan_ndjson and scan_lines (#26200)
- Improve string slicing performance (#26206)
- Refactor scan_delta to use python dataset interface (#26190)
- Add dedicated kernel for group-by arg_max/arg_min (#26093)
- Add streaming merge-join (#25964)
- Generalize Bitmap::new_zeroed opt for Buffer::zeroed (#26142)
- Reduce fs stat calls in path expansion (#26173)
- Lower streaming group_by n_unique to unique().len() (#26109)

✨ Enhancements

- Avoid OOM for scan_ndjson and scan_lines if input is compressed and negative slice (#26396)
- Support anonymous agg in-mem (#26376)
- Add unstable arrow_schema parameter to sink_parquet (#26323)
- Improve error message formatting for structs (#26349)
- Remove parquet field overwrites (#26236)
- Enable zero-copy object_store put upload for IPC sink (#26288)
- Improved disambiguation for qualified wildcard columns in SQL projections (#26301)
- Expose upload_concurrency through env var (#26263)
- Allow quantile to compute multiple quantiles at once (#25516)
- Allow empty LazyFrame in LazyFrame.group_by(...).map_groups (#26275)
- Use delta file statistics for batch predicate pushdown (#26242)
- Add streaming UnorderedUnion (#26240)
- Implement compression support for sink_ndjson (#26212)
- Add unstable record batch statistics flags to {sink/scan}_ipc (#26254)
- Support CSE for python UDFs on the same address (#26253)
- Cloud retry/backoff configuration via storage_options (#26204)
- Use same sort order for expanded paths across local / cloud / directory / glob (#26191)
- Add streaming merge-join (#25964)
- Serialize optimization flags for cloud plan (#26168)
- Add compression support to write_csv and sink_csv (#26111)
- Add scan_lines (#26112)
- Support regex in str.split (#26060)
- Add unstable IPC Statistics read/write to scan_ipc/sink_ipc (#26079)
- Add unstable height parameter to DataFrame/LazyFrame (#26014)
- Remove old partition sink API (#26100)
- Expose ArrowStreamExportable on python collect batches iterator (#26074)
- Add nulls support for all rolling_by operations (#26081)

🐞 Bug fixes

- Correct off-by-one in RLE row counting for nullable dictionary-encoded columns (#26411)
- Support very large integers in env var limits (#26399)
- Fix PlPath panic from incorrect slicing of UTF8 boundaries (#26389)
- Fix Float dtype for spearman correlation (#26392)
- Fix optimizer panic in right joins with type coercion (#26365)
- Don't serialize retry config from local environment vars (#26289)
- Fix PartitionBy with scalar key expressions and diff() (#26370)
- Add {Float16, Float32} -> Float32 lossless upcast (#26373)
- Fix panic using with_columns and collect_all (#26366)
- Add multi-page support for writing dictionary-encoded Parquet columns (#26360)
- Ensure slice advancement when skipping non-inlinable values in is_in with inlinable needles (#26361)
- Pin xlsx2csv version temporarily (#26352)
- Bugs in ViewArray total_bytes_len (#26328)
- Overflow in i128::abs in Decimal fits check (#26341)
- Make Expr.hash on Categorical mapping-independent (#26340)
- Clone shared GroupBy node before mutation in physical plan creation (#26327)
- Fixed "sheet_name" typing for read_ods and read_excel (#26317)
- Improve Polars dtype inference from Python Union typing (#26303)
- Consider the "current location" of an item when computing rolling_rank_by (#26287)
- Reset is_count_star flag between queries in collect_all (#26256)
- Fix incorrect is_between filter on scan_parquet (#26284)
- Make polars compatible with ty (#26270)
- Lower AnonymousStreamingAgg in group-by as aggregate (#26258)
- Avoid overflow in pl.duration scalar arguments case (#26213)
- Broadcast arr.get on single array with multiple indices (#26219)
- Fix panic on CSPE with sorts (#26231)
- Eager DataFrame.slice with negative offset and length=None (#26215)
- Use correct schema side for streaming merge join lowering (#26218)
- Overflow panic in scan_csv with multiple files and skip_rows + n_rows larger than total row count (#26128)
- Respect allow_object flag after cache (#26196)
- Raise error on non-elementwise PartitionBy keys (#26194)
- Allow ordered categorical dictionary in scan_parquet (#26180)
- Allow excess bytes on IPC bitmap compressed length (#26176)
- Address a macOS-specific compile issue (#26172)
- Fix deadlock on hash_rows() of 0-width DataFrame (#26154)
- Fix NameError filtering pyarrow dataset (#26166)
- Fix concat_arr panic when using categoricals/enums (#26146)
- Fix NDJSON/scan_lines negative slice splitting with extremely long lines (#26132)
- Incorrect group_by min/max fast path (#26139)
- Remove a source of non-determinism from lowering (#26137)
- Error when with_row_index or unpivot create duplicate columns on a LazyFrame (#26107)
- Panics on shift with head (#26099)

📖 Documentation

- Fix Expr.get referencing incorrect dtype for index parameter (#26364)
- Fix Expr.quantile formatting (#26351)
- Drop sphinx-llms-txt extension (#26285)
- Remove deprecated cublet_id (#26260)
- Update for new release (#26255)
- Update MCP server section with new URL (#26241)
- Fix unmatched paren and punctuation in pandas migration guide (#26251)
- Add observatory database_path to docs (#26201)
- Note plugins in Python user-defined functions (#26138)

📦 Build system

- Address remaining Python 3.14 issues with make requirements-all (#26195)
- Address a macOS-specific compile issue (#26172)

🛠️ Other improvements

- Ensure local doctests skip from_torch if module not installed (#26405)
- Change linked timezones in test suite to canonical timezones (#26310)
- Implement various deprecations (#26314)
- Rename Operator::Divide to RustDivide (#26339)
- Properly disable the Pyodide tests (#26382)
- Remove unused field (#26367)
- Fix runtime nesting (#26359)
- Remove xlsx2csv dependency pin (#26355)
- Use outer runtime if exists in to_alp (#26353)
- Make CategoricalMapping::new pub(crate) to avoid misuse (#26308)
- Clarify IPC buffer read limit/length parameter (#26334)
- Add dtype test coverage for delta predicate filter (#26291)
- Add AI policy (#26286)
- Unpin "pandas<3" in dev dependencies (#26249)
- Remove all non CSV fast-count paths (#26233)
- Pin pandas to 2.x for now (#26221)
- Remove unnecessary xfail (#26199)
- Ensure optimization flag modification happens local (#26185)
- Simplify IcebergDataset (#26165)
- Reorganize unit tests into logical subdirectories (#26149)
- Lint leftover fixme (#26122)
- Improve backtrace for POLARS_PANIC_ON_ERR (#26125)
- Fix Python docs build (#26117)
- Disable unused-ignore mypy lint (#26110)
- Ignore mypy warning (#26105)
- Raise error on file://hostname/path (#26061)
- Disable debug info for docs workflow (#26086)
- Update docs for next polars cloud release (#26091)
- Support Python 3.14 in dev environment (#26073)

Thank you to all our contributors for making this release possible! @Atarust, @EndPositive, @Kevin-Patyk, @LeeviLindgren, @MarcoGorelli, @Matt711, @MrAttoAttoAtto, @Voultapher, @WaffleLapkin, @agossard, @alex-gregory-ds, @alexander-beedie, @azimafroozeh, @bayoumi17m, @c-peters, @carnarez, @dependabot[bot], @dsprenkels, @hallmason17, @hamdanal, @ion-elgreco, @kdn36, @lun3x, @mcrumiller, @nameexhaustion, @orlp, @qxzcode, @r-brink, @ritchie46, @sweb and dependabot[bot]
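
The deprecation of retries=n in favor of storage_options={"max_retries": n} (#26155) mostly affects code that forwards keyword arguments to cloud-capable readers. The sketch below shows one possible way to migrate such keyword dictionaries. It is plain Python and does not call Polars. The helper name migrate_retries_kwarg is hypothetical and not part of the library, the aws_region entry is only an illustrative storage option, and the rule that an explicit max_retries takes precedence is an assumption made for this example, not documented Polars behaviour.

```python
from typing import Any


def migrate_retries_kwarg(kwargs: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `kwargs` with `retries=n` moved into storage_options.

    Hypothetical helper for illustration only; it is not part of Polars.
    """
    new_kwargs = dict(kwargs)
    if "retries" not in new_kwargs:
        # Nothing to migrate.
        return new_kwargs

    retries = new_kwargs.pop("retries")
    # Copy the nested dict so the caller's storage_options is not mutated.
    storage_options = dict(new_kwargs.get("storage_options") or {})
    # Assumption: an explicit max_retries already present takes precedence.
    storage_options.setdefault("max_retries", retries)
    new_kwargs["storage_options"] = storage_options
    return new_kwargs


# Keyword arguments written against the deprecated style.
old_call = {"retries": 5, "storage_options": {"aws_region": "eu-west-1"}}
new_call = migrate_retries_kwarg(old_call)
print(new_call)
```

The migrated dictionary can then be unpacked into a reader call that accepts storage_options, for example pl.scan_parquet(path, **new_call), where path is a placeholder for the caller's own source.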

Topics & Keywords

Publication Details

Published in: Zenodo (CERN European Organization for Nuclear Research)

DOI: 10.5281/zenodo.18481931
