Releases: ibis-project/ibis
10.0.0
10.0.0 (2025-02-06)
⚠ BREAKING CHANGES
- api: change `as_interval` `unit` argument to be positional-only
- api: change `as_timestamp` `unit` argument to be positional-only
- api: standardize unnest and pivot_longer signatures
- api: remove deprecated `Table.relabel` method
- api: standardize `StringValue` method signatures
- api: standardize `NumericValue` methods
- api: make `GeoSpatialValue.contains` positional-only
- api: make `Table.describe` `quantile` argument keyword-only
- api: remove deprecated `Table.relabel` method
- api: make `Table.drop_null`/`Table.fill_null`/`Table.window_by`/`Table.alias` argument positional-only
- api: make `Table.sample` `fraction` argument positional-only
- api: make `Table.aggregate` `metrics` argument positional-only
- api: make `Table` set operation methods positional-only
- api: make `Table.cast` and `Table.try_cast` methods positional-only
- api: make `nth` positional-only
- api: make `isin`/`notin`/`cases`/`identical_to` positional-only
- api: make null-related methods and `null` function positional-only
- api: make `Value.cast` and `Value.try_cast` positional-only
- internals: make `Value.name` positional-only
- internals: make `Expr.pipe` positional-only
- internals: make `Expr.equals` positional-only
- api: align signatures of `to_json` methods
- api: align signatures of `to_delta` methods
- api: align signatures of `to_csv`/`to_csv_dir` methods
- api: align signatures of `to_parquet`/`to_parquet_dir` methods
- api: align `.sql` method signatures across polars and sql as well as the `Table` method
- api: top-level `connect` method now takes its first argument as positional-only
- duckdb: align signatures of `read_sqlite`/`read_mysql`/`read_postgres` methods in the duckdb backend
- api: align signatures of `read_delta` method; sources are positional-only, everything else is required-keyword
- api: canonicalize `has_operation` backend method; single argument is positional-only
- api: canonicalize `read_kafka` and `to_kafka` methods of the PySpark backend
- api: canonicalize `drop_table_or_view` method of the impala backend
- api: canonicalize `to_geo` signature of the DuckDB backend
- api: canonicalize `read_geo` signature of the DuckDB backend
- api: align signatures of `list_catalogs`; `like` argument is now keyword-only
- bigquery: canonicalize `set_database` signature
- api: make `list_databases` arguments all required-keyword
- risingwave: canonicalize signatures of risingwave-specific `create_*` methods
- polars: canonicalize signature of `read_pandas` method
- api: align signatures of `drop_table` method; `name` is positional-only; the rest are keyword-only
- api: align signatures of `create_catalog` and `drop_catalog` methods; `name` is positional-only; the rest are keyword-only
- api: `compile` method is now the same across backends
- api: align signatures of `create_table` method; `name` is positional-only; `obj` is positional-or-keyword; the rest are keyword-only
- api: align signatures of `create_view` method; `name` is positional-only; `obj` is positional-or-keyword; the rest are keyword-only
- api: align signatures of `drop_view` method; `name` is positional-only; the rest are keyword-only
- api: align signatures of `truncate_table` method; `name` is positional-only; the rest are keyword-only
- api: align signatures of `insert` method; `name` is positional-only; `obj` is positional-or-keyword; the rest are keyword-only
- api: align signatures of `read_json` method; sources are positional-only, everything else is required-keyword
- api: align signatures of `read_csv` method; sources are positional-only, everything else is required-keyword
- api: align signatures of `read_parquet` method; sources are positional-only, everything else is required-keyword
- api: align signatures of `to_torch` method
- api: align signatures of `to_polars` method
- api: align signatures of `Backend.list_tables` method; all arguments are now keyword-only
- api: align signatures of `Backend.table` method; `name` is positional-only; everything else is required-keyword
- api: align signatures of `create_database` and `drop_database`; `name` is positional-only; everything else is required-keyword
- api: standardize `MapValue` method signatures
- api: standardize `ArrayValue` method signatures
- api: `type` argument of `struct` function is now required-keyword
- api: standardize `TemporalValue` APIs
- api: `where` argument of aggregate functions is now required-keyword
- api: `hashbytes` and `hexdigest` are now positional-only
- api: standardize `how` argument to `join` methods as keyword-only and standardize remaining arguments
- api: `ibis.coalesce`/`ibis.greatest`/`ibis.least` are now positional-only
- api: `Expr.ifelse` is now positional-only
- api: top-level set operation functions are now positional-only
- api: `set_backend` and `get_backend` functions are now positional-only
- api: `ntile` function and method is now positional-only
- api: `ibis.preceding`/`ibis.following` are now positional-only
- api: `expr` argument of `ibis.asc`/`ibis.desc` is now positional-only; `nulls_first` is keyword-only
- api: `data` argument of `ibis.memtable` is now positional-only; the rest are keyword-only
- api: `pairs` argument of `ibis.schema` is now positional-only; the rest are keyword-only
- api: `ibis.param` is now positional-only
- api: `n` argument in `Table.limit` and `Table.head` is now required-positional
- api: `offset` argument in `Table.limit` is now required-keyword
- api: temporal window expression APIs now require all arguments as keywords
- api: `to_pyarrow` and `to_pyarrow_batches` require `expr` as positional-only and keyword for everything else
- api: `to_pandas_batches` requires `expr` as positional-only
- api: `execute` and `to_pandas` methods now require `expr` as positional-only
- api: `distance` is now a required keyword argument for the `d_within` api
- duckdb: The duckdb backend's `read_csv` method accepts only DuckDB types for the values components of the `columns` and `types` arguments. You may need to adjust existing code. For example, the string `"float64"` should be replaced with the string `"double"`.
- duckdb: The `read_in_memory` method is removed from the duckdb backend. Use `ibis.memtable` instead.
- api: The `how` parameter of the `Value.arbitrary` method is removed. Call `Value.first` or `Value.last` explicitly.
- api: The `StringValue.initcap` method is removed. Use `StringValue.capitalize` instead.
- api: `IntegerValue.label` is redundant with the `IntegerValue.cases` method, use that instead. Replace `expr.label(labels)` with `expr.cases(*enumerate(labels))`
- register: The deprecated `register` method has been removed. Please use the file-specific `read_*` methods instead. For in-memory objects, pass them to `ibis.memtable` or `create_table`.
- duckdb: Special handling of the `temp_directory` argument passed to Ibis is removed in favor of passing the argument through directly to `duckdb.connect`. Interior nodes of directory trees must be created, e.g., using `Path.mkdir(exist_ok=True, parents=True)`, `mkdir -p`, etc.
- config: `option_context` is removed. Use `contextlib.contextmanager` to create your own version of this functionality if necessary.
- duckdb: The DuckDB lower bound has been bumped to a version that has storage backwards compatibility. You may need to migrate your DuckDB database files.
- api: `has_name` has always returned `True` since 9.0. It is safe to remove any calls to `has_name`.
- backends: `execute` now returns non-numpy objects for scalar values.
- api: `ibis.negate` is removed. Use the `negate` method on a specific column, instead.
- api: All `ibis.geo_*` functions are removed. Equivalent methods are available on all geo columns.
- api: `where` is removed. Use `ibis.ifelse` instead.
- value: `Value.greatest` and `Value.least` are removed. Use `ibis.greatest` and `ibis.least`, instead.
- joins: Passing a `pyarrow.Table` or a `pandas.DataFrame` as the right-hand-side of a join is no longer supported. To join against in-memory data, you can pass the in-memory object to `ibis.memtable` or `con.create_table` and use the resulting table object instead.
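Most of the entries above tighten call signatures rather than change behavior. As a rough illustration of what "positional-only", "positional-or-keyword", and "keyword-only" mean in plain Python, the sketch below uses a hypothetical stand-in function. Its name, parameters, and return value are assumptions chosen to mirror the shape "`name` is positional-only; `obj` is positional-or-keyword; the rest are keyword-only"; it is not the Ibis implementation.

```python
# Hypothetical stand-in, NOT the Ibis API: parameters before "/" are
# positional-only, parameters after "*" are keyword-only.
def create_table(name, /, obj=None, *, schema=None, overwrite=False):
    return {"name": name, "obj": obj, "schema": schema, "overwrite": overwrite}


def raises_type_error(call):
    """Return True if calling `call()` raises TypeError, else False."""
    try:
        call()
    except TypeError:
        return True
    return False


# Accepted: name positionally, obj by keyword, overwrite by keyword.
ok = create_table("t", obj=[1, 2], overwrite=True)

# Rejected: a positional-only parameter passed by keyword.
name_as_keyword_rejected = raises_type_error(lambda: create_table(name="t"))

# Rejected: keyword-only parameters passed positionally.
positional_overwrite_rejected = raises_type_error(
    lambda: create_table("t", None, None, True)
)

print(ok, name_as_keyword_rejected, positional_overwrite_rejected)
```

Code that already passes the first argument positionally and everything else by keyword should be unaffected by this kind of change; calls that rely on a different style are the ones to review.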
Issues closed
- api: Removed hierarchical usage of schema. Ibis uses the following naming conventions:
  - schema: a mapping of column names to datatypes
  - database: a collection of tables
  - catalog: a collection of databases
- mysql: Ibis now uses the `MySQLdb` driver. You may need to install MySQL client libraries to build the extension.
- padding: String padding operations now follow Python semantics and leave strings greater than the padding length untouched.
- pandas: The `pandas` backend is removed. Note that pandas DataFrames are STILL VALID INPUTS AND OUTPUTS and will remain so for the foreseeable future. Please use one of the other local backends like DuckDB, Polars, or DataFusion to perform operations directly on pandas DataFrames.
- dask: The `dask` backend is removed. Please use one of the other backends that Ibis supports.
- api: remove deprecated `where` methodism (886b2d1)
- api: remove top-level `negate` function (c8c37dd)
- api: remove top-level geo functions ([6b18...
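The padding entry above says padding now follows Python semantics. Assuming this refers to the behavior of Python's built-in `str.ljust` and `str.rjust` (the entry does not name specific functions, so this is an assumption), the sketch below shows that a string already longer than the requested width comes back unchanged rather than truncated.

```python
short = "ab"
long_text = "hello world"

# Shorter than the target width: fill characters are added.
right_padded = short.ljust(5, "-")
left_padded = short.rjust(5, "-")

# Longer than the target width: the string is returned untouched.
unchanged = long_text.ljust(5, "-")

print(right_padded, left_padded, unchanged)
```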
9.5.0
9.5.0 (2024-09-11)
Features
- api: add `name` argument to `topk` (1652076)
- api: add `name` argument to `value_counts` (24be184)
- api: add `to_sqlglot` method to `Schema` objects (#10063) (9488115)
- mssql: add lpad and rpad ops (#10060) (77af14b)
- mssql: add startswith and endswith ops (17a628c)
Bug Fixes
- backends: pass kwargs to _from_url() in every case (#10003) (9ca92f0)
- bigquery: handle column name mismatches and `_TABLE_SUFFIX` everywhere (5ade49e)
- clickhouse: fix lstrip, rstrip, and strip (d2539c4)
- datafusion: raise when attempting to create temp table (#10072) (1cf5439)
- deps: update dependency fsspec to <2024.9.1 (#10036) (ea71719)
- deps: update dependency sqlglot to >=23.4,<25.20 (#10010) (ba07da7)
- deps: update dependency sqlglot to >=23.4,<25.21 (#10050) (422d361)
- docs: update invalid read_parquet link (2ae9ef4)
- duckdb: allow setting `auto_detect` to `False` by fixing translation of columns argument (#10065) (883d2d3)
- duckdb: free memtables based on operation lifetime (#10042) (a121ab3)
- duckdb: support version 1.1.0 (#10037) (3a37626)
- flink: fix strip (01117a5)
- impala: allow specifying `temp=False` in `create_table` (e29712c)
- impala: fix lstrip, rstrip, strip (413df3b)
- mssql: ensure that dot-sql can be executed when column names are not provided (#10028) (1936437), closes #10025
- mssql: fix strip, lstrip, rstrip (f53feab)
- oracle: fix lstrip, rstrip, and strip (3f5a304)
- pandas: don't silently ignore result column name mismatches (48be246)
- polars: support polars `Enum` type (#10017) (869829f)
- sqlite: list temporary tables by default (#10058) (dfa55b6)
- sql: properly parenthesize binary ops containing named expressions (5c2eadc)
Documentation
- accursed: add cursed knowledge page (#10031) (85e1dcc)
- duckdb: fix broken link to parquet writing (#10026) (d22f8eb)
- jupyterlite: disable insecure extensions (#10052) (3d8280b)
Refactors
- backends: clean up resources produced by `memtable` (#10055) (019cae5)
- backends: split memtable existence check out (#10053) (77448bf)
- datafusion: avoid reinitializing memtables on every execute call (#10057) (43e5f12)
- dependencies: make `fsspec` a test-only dependency (37e4439)
- formats: plumb through `data_mapper` and `schema` in both pandas and pyarrow formats (cbeb967)
- mssql: simplify lpad and rpad ops (#10085) (ef5d58d), closes /github.com/ibis-project/ibis/pull/10060#discussion_r1752665235
- polars: handle memtables like every other backend (#10056) (2b0dbb9)
Performance
- backends: speed up most memtable existence checks (#10067) (a205ab7)
- ir: don't recreate nodes in `replace` if their children haven't changed (ac79604)
- sql: avoid parenthesizing chains of commutative operators (f86515c)
Deprecations
9.4.0
9.4.0 (2024-09-03)
Features
- api: add `approx_quantiles` for computing approximate quantiles (dcdb7a7)
- api: add `DateValue.epoch` api for computing days since epoch (#9856) (8b0fb66)
- api: make the `null` function deferrable (0613ef1)
- api: support `SchemaLike` in `Backend.create_table()` (#9885) (949fbea)
- api: support deferred objects in `literal` (#9904) (0a07906)
- clickhouse: partition kwargs for compile and execution in `to_pyarrow` and `to_pandas` (2dd2c3f)
- clickhouse: support ms/us/ns truncate units (9881edb)
- decompile: make the decompiler run on TPCH query 1 (#9779) (0268044)
- exasol: implement `approx_nunique`, `std`, `var` (d9c3daa)
- exasol: implement `approx_nunique`, `std`, `var` (63c20c0)
- exasol: implement `cov`/`corr` (24f41b2)
- exasol: implement `median` and `approx_median` (3cfc344)
- exasol: implement `quantile` (ecbef94)
- exasol: implement `Table.nunique` (a24200c)
- exasol: implement `Table.nunique` (7ead7c7)
- flink: array sort (ca85ae2)
- flink: support `ArrayValue.collect` (eb857e6)
- impala: add `tbl_properties` to `create_table` (#9839) (e3d02bd)
- mssql: support connecting with a url (#9894) (8bb12e1), closes #9856
- oracle: implement `mode` aggregation (#9914) (9ee910d)
- output-formats: add support for to_parquet_dir (#9781) (80dfbe2)
- polars: array sort (9a2563b)
- polars: implement approx_nunique (3f3738d)
- pyspark: support `quantile` (26d8516)
- selectors: support naming deferreds in across (de1595c)
- snowflake: implement interval arithmetic (#9794) (41e10ca), closes #9783
- sql: enable cross-database joins (#9849) (c3ff6ae)
- sql: fuse `distinct` with other select nodes when possible (c31412b)
- sqlite: support most date/timestamp interval arithmetic (75f594d)
- sql: load parsed but unsupported types as unknown (#9868) (a76acfc)
- sql: support inserts with default constraints (#9844) (86a3c06)
- timestamps: add support for timestamp/date +/- intervals for additional backends (#9799) (79cef68)
- trino: support years and months in datetime arithmetic (1133973)
- trino: wrap `auth` strings with `BasicAuthentication` (#9960) (e0f54c9)
Bug Fixes
- bigquery: disallow column names longer than 300 characters (#9916) (ea97794), closes #8931
- clickhouse: workaround `EXCEPT` and `INTERSECT` generation in sqlglot; add tpcds query 87 (#9959) (910b8f5)
- datafusion: fix creation of SessionContext in datafusion 40.1.0 (eec5328)
- datafusion: handle `NULL`s in array `flatten` (ecc199f)
- deps: update dependency datafusion to v40 (4aa402a)
- deps: update dependency sqlglot to >=23.4,<25.11 (#9805) (84bfeb5)
- deps: update dependency sqlglot to >=23.4,<25.12 (#9834) (69a10d9)
- deps: update dependency sqlglot to >=23.4,<25.13 (#9851) (6780a6b)
- deps: update dependency sqlglot to >=23.4,<25.15 (#9864) (d182e9e)
- deps: update dependency sqlglot to >=23.4,<25.16 (#9875) (0a6765b)
- deps: update dependency sqlglot to >=23.4,<25.17 (#9907) (9e52edb)
- deps: update dependency sqlglot to >=23.4,<25.18 (#9935) (ee5116d)
- deps: update dependency sqlglot to >=23.4,<25.19 (#9962) (4c136d8)
- dot-sql: ensure that CTEs can be used in `.sql` (b63e0fd)
- duckdb: fix create_table() in databases with spaces in the name (#9817) (9da3c9f)
- exasol: properly handle returning BIGINT values (e20bdad)
- ir: convert analytic functions to window functions in filters (31295dd)
- mssql: remove sort key to keep order (#9848) (3780a13)
- mssql: support `.cache()` for caching tables (1de2f45)
- oracle: avoid double cursor closing by removing unnecessary close...
9.3.0
9.3.0 (2024-08-07)
Features
- api: support `ignore_null` in `collect` (71271dd)
- api: support `ignore_null` in `first`/`last` (8d4f97f)
- api: support `order_by` in order-sensitive aggregates (collect/group_concat/first/last) (#9729) (a18cb5d)
- api: support quarterly truncation (#9715) (75b31c2), closes #9714
- array: implement min, max, any, all, sum, mean (#9704) (793efbc)
- bigquery: support timestamp bucket (fd61f2c)
- datafusion: `pivot_longer` (2330b0c)
- datafusion: enable array flatten, group concat, and timestamp now (4d110a0)
- datafusion: struct literals (a63cee9)
- datafusion: unnest (a706f54)
- duckdb: add support for passing a subset of column types to `read_csv` (#9776) (c1dcf67)
- duckdb: support arbitrary url prefixes (#9691) (11af489)
- mssql: support case-sensitive collations (#9700) (9382a0e)
- oracle: support group_concat operator (47d97ea)
- pyspark: add support for pyarrow and python UDFs (#9753) (02a1d48)
- snowflake: add `userinfo` URL parsing (524a2fa)
- ux: allow window functions in predicates and compile to `QUALIFY` where possible (#9787) (0370bcb)
Bug Fixes
- algolia: add parent class docstring to algolia index (#9739) (3bc9799)
- bigquery: repr geospatial values in interactive mode (#9712) (bd8c93f)
- case: fix dshape, error on noncomparable and empty cases (#9559) (ff2d019)
- compiler-internals: define unsupported operations after simple operations (#9755) (d9b6264)
- deps: update dependency atpublic to v5 (#9697) (a4f3940)
- deps: update dependency sqlglot to >=23.4,<25.10 (#9774) (7144257)
- deps: update dependency sqlglot to >=23.4,<25.8 (#9696) (d4a2ea2)
- deps: update dependency sqlglot to >=23.4,<25.9 (#9719) (b1d8b2e)
- drop: ignore order for `DropColumns` equality (#9677) (ae1e112)
- druid: get basic timestamp functionality working (#9692) (6cd3eee)
- duckdb: avoid literals casts that might defeat optimization (e4ff1bd)
- duckdb: ensure that array remove doesn't remove `NULL`s (f0c3be4)
- duckdb: use `register` directly instead of calling `read_in_memory` (597817f)
- internals: ensure that CTEs are emitted in topological order (#9726) (acd7d82)
- polars: fix polars `std`/`var` to properly handle `sample`/`population` (f83d84f)
- polars: remove bogus minus-one-week truncation (ac519b2)
- postgres: handle enums by delegating to the parent class (#9769) (3f01075), closes #9295
- snowflake: bring back `where` filter support in `group_concat`; fix `array_agg` ordering (#9758) (6e7e4de)
- sql: only return tables in `current_database` (#9748) (c7f5717)
- types: fix histogram bin allocation (#9711) (6634864), closes #9687
Documentation
- algolia: add custom attributes to backend and core methods (#9730) (d9473cf)
- browser-repl: fix jupyterlite build (#9762) (f403aa1)
- fix spelling in pivot_longer explanation (#9780) (3201d8b)
- fix typo in `drop` method docstring (#9727) (4cf0014)
- presentations: update overview slides (#9685) (d3a2c0c)
- replace all double graves with single graves (#9679) (dd26d60)
Refactors
- dependencies: pandas and numpy are now optional for non-backend installs (#9564) (cff210a)
- duckdb: use replace to generate less sql (#9713) (f89aa32)
- internals: remove unnecessary dynamism in `drop` method (#9682) (5ac84c5)
- pandas: remove unreachable code in pandas backend (#9786) (dc6bfe2)
- polars: delete some dead versioning code (b23c5a3)
- polars: remove casting where possible; handle conversion on output (#9673) (8717629)
- polars: remove extra backwards co...
9.2.0
9.2.0 (2024-07-22)
Features
- api: accept more input types in `ibis.range` (#9659) (310ad30)
- api: add `nulls_first=False` argument to `order_by` (#9385) (ce9011e)
- api: add `TableUnnest` operation to support cross-join unnest semantics as well as `offset` (#9423) (3352a84)
- api: add positional joins (#9533) (85ea9da)
- api: allow grouping by scalar values (#9451) (14f1821)
- api: support deferred or string column names in `cov`/`corr` methods (#9657) (4d135b3)
- api: support selectors in window function `order_by` and `group_by` (#9649) (0ad47de)
- backends: support creation from a DB-API con (#9603) (fc4d1e3)
- bigquery: implement CountDistinctStar (#9470) (273e4bc)
- caching: tie lifetime of cached tables to python refs (#9477) (f51546e)
- datafusion: datafusion enhancements (#9544) (f11ca43)
- dtypes: fall back to `dt.unknown` for unknown types (#9567) (6e0b5f5)
- dtypes: fall back to `dt.unknown` for unknown types (#9576) (56a10d2)
- duckdb: use `delta_scan` instead of reading pyarrow datasets (#9566) (0ff595e)
- flink: create views from more mem data types (#9622) (b83fc2b)
- geospatial: use geoarrow extension types when returning geometry columns as pyarrow (#9549) (cba7367)
- polars: add more accurate type mapping for timestamps (#8954) (3eafac4)
- polars: support version 1.0 and later (#9516) (62a1864)
- postgres: support basic jsonb type and existing operations (#9630) (7179cc6)
- pyarrow: support `__arrow_c_schema__` on `ibis.Schema` objects (#9665) (00a776e)
- pyspark: implement new experimental read/write directory methods (#9272) (adade5e)
Bug Fixes
- api: add support for using deferreds in the `argmin`/`argmax` `key` argument (#9652) (3f05cbc)
- bigquery: escape table names with spaces for bigquery backend (#9589) (ca21dbb)
- bigquery: support microseconds in time literals (#9610) (c876abc), closes #9609
- clickhouse: generate redundant aliases to workaround clickhouse naming behavior (#9525) (b44dac2), closes #9508
- clickhouse: support `Date32` database type (#9509) (efa6fb7)
- datatypes: proper handling of srid in geospatial datatypes (#9519) (a3ceb59)
- deps: update dependency datafusion to v39 (#9506) (21ef0a6)
- deps: update dependency fsspec to <2024.6.2 (#9463) (8e225ec)
- deps: update dependency geopandas to v1 (#9437) (fa1037b)
- deps: update dependency numpy to v2 (#9395) (3cb39a5)
- deps: update dependency pyarrow to v17 (#9614) (16998df)
- deps: update dependency sqlglot to >=23.4,<25.3 (#9401) (bdc1b3f)
- deps: update dependency sqlglot to >=23.4,<25.4 (#9427) (8e015b6)
- deps: update dependency sqlglot to >=23.4,<25.5 (#9472) (f6f80da)
- deps: update dependency sqlglot to >=23.4,<25.6 (#9523) (6a748c4)
- deps: update dependency sqlglot to >=23.4,<25.7 (#9628) (f5207ff)
- druid: handle typed nulls where possible (#9452) (33ec754)
- fix and improve shape inference in many ops (7a0b21e)
- ir: avoid deduplicating filters based solely on their name (#9476) (b35582e), closes #9474
- ir: repr iterables when constructing name of operations (#9480) (f5a541c)
- join: skip substitution of non-field references in join chains (#9595) (61ef0ed)
- mssql: always pass port to `pyodbc` in host string (#9656) (2e3fd9a)
- mssql: avoid calling `.commit()` unless a DDL operation is being performed (#9658) (69c5bf0), closes #9654
- mssql: fix temporary table creation and implement `cache` (#9434) ([196d8a...
9.1.0
9.1.0 (2024-06-13)
Features
- all: enable passing in-memory data to create_table (#9251) (fa15c7d), closes #6593 #8863
- api: add `Table.value_counts` for easy group by count on multiple fields (aba913d)
- api: isoyear method (#9034) (4707c44)
- api: support `type` arg to ibis.null() (8db686e)
- api: support wider range of types in `where` arg to column reductions (582165f)
- api: support wider range of types in `where` arg to table reductions (7aba385)
- bigquery: implement a few URL ops (#9210) (3d0f9bc)
- bigquery: support filtering by `_TABLE_SUFFIX` when using a wildcard table name (#9375) (62a25c4), closes #9371
- datafusion: use pyarrow for type conversion (#9299) (5bef96a)
- drop Python 3.9 and test on Python 3.10/3.12 (#9213) (c06285e)
- duckdb: add catalog support to create_table (#9147) (07331b5)
- duckdb: allow to use named in-memory db (#9241) (67460aa), closes #9240
- duckdb: support and test 1.0 (#9297) (395c8b5)
- pandas,dask: implement ops.StructColumn (#9302) (ea81d85)
- polars: accept list of CSVs to read_csv (#9232) (7a272e3), closes #9230
- polars: implement `create_view`/`drop_view`/`drop_table` (#9263) (c4324f5)
- postgres: provide translation for `hash` ops (#9348) (57e2348)
- pyarrow: support Arrow PyCapsule interface on `ibis.Table` objects (1a262b9)
- pyspark: builtin udf support (#9191) (142c105)
- pyspark: provide a mode option to manage both batch and streaming connections (e425ad5)
- pyspark: support reading from and writing to Kafka (#9266) (1c7c6e3)
- selectors: parse Python types in `s.of_type` (#9356) (c0ebdc8)
- snowflake: implement array map and array filter (#9178) (9b42751)
- snowflake: implement support for `asof_join` API (#9180) (49c6ce3)
- snowflake: implement Table.sample (#9071) (307334b)
- ux: improve error message on unequal schemas during set ops (#9115) (5488896)
Bug Fixes
- api: treat `col == None` or `col == ibis.NA` as `col.isnull()` (#9114) (711bf9f)
- bigquery: only register memtable if obj is not None (#9268) (f175d0a)
- bigquery: quote all parts of table names (#9141) (e1338d5)
- bigquery: quote qualified memtable names (#9149) (878d0d5)
- bigquery: strip whitespace from bigquery field names (#9160) (8e5cc3b), closes #9112
- clickhouse: more explicitly disallow null structs (#9305) (fc1d00f)
- convert the uint64's from some backends' hash() to the desired int64 (900ecca)
- datatypes: manually cast the type of `pos` to `int16` for `table.info()` (#9139) (9eb1ed1)
- datatypes: manually cast the type of pos to int16 for `table.describe()` (#9314) (c7fcddf)
- ddl: use column names, not position, for insertion order (#9264) (3506f40)
- deps: remove pydruid sqlalchemy dependency (#9092) (a0df103)
- deps: update dependency datafusion to v37 (#9189) (49ecf8d)
- deps: update dependency datafusion to v38 (#9278) (77aaecd)
- deps: update dependency fsspec to <2024.5.1 (#9201) (15a5257)
- deps: update dependency fsspec to <2024.6.1 (#9304) (d600a0d)
- deps: update dependency sqlglot to >=23.4,<23.14 (#9118) (d8119fb)
- deps: update dependency sqlglot to >=23.4,<23.15 (#9151) (ac2201d)
- deps: update dependency sqlglot to >=23.4,<23.17 (#9209) (82a5f93)
- deps: update dependency sqlglot to >=23.4,<23.18 (#9212) (b92dd7b)
- deps: update dependency sqlglot to >=23.4,<24.2 (#9277) (98cb460)
- deps: update dependency sqlglot to >=23.4,<25.2 ([#9368](htt...
9.0.0
9.0.0 (2024-04-30)
⚠ BREAKING CHANGES
- udf: The `schema` parameter for UDF definition has been removed. A new `catalog` parameter has been added. Ibis uses the word database to refer to a collection of tables, and the word catalog to refer to a collection of databases. You can use a combination of `catalog` and `database` to specify a hierarchical location for the UDF.
- pyspark: Arguments to `create_database`, `drop_database`, and `get_schema` are now keyword-only except for the `name` args. Calls to these functions that have relied on positional argument ordering need to be updated.
- dask: the dask backend no longer supports `cov`/`corr` with `how="pop"`.
- duckdb: Calling the `get` or `contains` method on `NULL` map values now returns `NULL`. Use `coalesce(map.get(...), default)` or `coalesce(map.contains(), False)` to get the previous behavior.
- api: Integer inputs to `select` and `mutate` are now always interpreted as literals. Columns can still be accessed by their integer index using square-bracket syntax.
- api: strings passed to table.mutate() are now interpreted as column references instead of literals, use `ibis.literal(string)` to pass the string as a literal
- ir: `Schema.apply_to()` is removed, use `ibis.formats.pandas.PandasConverter.convert_frame()` instead
- ddl: We are removing the word `schema` in its hierarchical sense. We use `database` to mean a collection of tables. The behavior of all `*_database` methods now applies only to collections of tables and never to collections of `database` (formerly `schema`)
  - `CanListDatabases` abstract methods now all refer to collections of tables.
  - `CanCreateDatabases` abstract methods now all refer to collections of tables.
  - `list_databases` now takes a kwarg `catalog`.
  - `create_database` now takes a kwarg `catalog`.
  - `drop_database` now takes a kwarg `catalog`.
  - `current_database` now refers to the current collection of tables.
  - `CanCreateSchema` is deprecated and `create_schema`, `drop_schema`, `list_schemas`, and `current_schema` are deprecated and redirect to the corresponding method/property ending in `database`.
  - We add a `CanListCatalog` and `CanCreateCatalog` that can list and create collections of `database`, respectively. The new methods are `list_catalogs`, `create_catalog`, `drop_catalog`.
  - There is a new `current_catalog` property.
- api: timecontext feature is removed
- api: The `by` argument from `asof_join` is removed. Calls to `asof_join` that previously used `by` should pass those arguments to `predicates` instead.
- cleanup: Deprecated methods and properties `op`, `output_dtype`, and `output_shape` are removed. `op` is no longer needed, and use `.dtype` and `.shape` respectively for the other two.
- api: expr.topk(...) now includes null counts. The row count of the topk call will not differ, but the number of nulls counted will no longer be zero. To drop the null row use the dropna method.
- api: `ibis.rows_with_max_lookback()` function and `ibis.window(max_lookback)` argument are removed
- strings: Backends that previously used initcap (analogous to str.title) to implement StringValue.capitalize() will produce different results when the input string contains multiple words (a word's definition being backend-specific).
- impala: Impala UDFs no longer require explicit registration. Remove any calls to `Function.register`. If you were passing `database` to `Function.register`, pass that to `scalar_function` or `aggregate_function` as appropriate.
- pandas: the `timecontext` feature is not supported anymore
- api: The `on` parameter of `table.asof_join()` now accepts only a single predicate; use `predicates` to supply additional join predicates.
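The strings entry above describes initcap as analogous to Python's `str.title`. As a plain-Python illustration of why multi-word inputs can change, the sketch below contrasts `str.title` with `str.capitalize`. Treating the new `StringValue.capitalize()` behavior as comparable to `str.capitalize` is an assumption made for illustration; the entry itself notes that word boundaries are backend-specific.

```python
text = "hello wide world"

# initcap-like behavior: the first letter of every word is uppercased.
title_style = text.title()

# capitalize-like behavior: only the first character of the string is uppercased.
capitalize_style = text.capitalize()

# Single-word inputs give the same result either way.
single_word_same = "hello".title() == "hello".capitalize()

print(title_style, "|", capitalize_style, "|", single_word_same)
```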
Features
- add to_date function to StringValue (#9030) (0701978), closes #8908
- api: add `.as_scalar()` method for turning expressions into scalar subqueries (#8350) (8130169)
- api: add `catalog` and `database` kwargs to `ibis.table` (#8801) (7d593c4)
- api: add `describe` method to compute summary stats of table expressions (#8739) (c8d98a1)
- api: add `ibis.today()` for retrieving the current date (#8664) (5e10d17)
- api: add a `to_polars()` method for returning query results as `polars` objects (53454c1)
- api: add a `uuid` function for returning a new uuid (#8438) (965b6d9)
- api: add API for unwrapping JSON values into backend-native values (#8958) (aebb5cf)
- api: add disconnect method (#8341) (32665af), closes #5940
- api: allow *arg syntax with GroupedTable methods (#8923) (489bb89)
- api: count nulls with topk (#8531) (54c2c70)
- api: expose common types in the top-level `ibis` namespace (#9008) (3f3ed27), closes #8717
- api: include bad type in NotImplementedError (#8291) (36da06b)
- api: natively support polars dataframes in `ibis.memtable` (464bebc)
- api: support `Table.order_by(*keys)` (6ade4e9)
- api: support all dtypes in MapGet and MapContains (#8648) (401e0a4)
- api: support converting ibis types & schemas to/from polars types & schemas (73add93)
- api: support Deferreds in Array.map and .filter (#8267) (8289d2c)
- api: support the inner join convenience to not repeat fields known to be equal (#8127) (798088d)
- api: support variadic arguments on `Table.group_by()` (#8546) (665bc4f)
- backends: introducing ibish the infinite scale backend you always wanted (#8785) (1d51243)
- bigquery: support polars memtables (26d103d)
- common: add `Dispatched` base class for convenient visitor pattern implementation (f80c5b3)
- common: add `Node.find_below()` methods to exclude the root node from filtering (#8861) (80d12a2)
- common: add a memory efficient `Node.map()` implementation (e3f2217)
- common: also traverse nodes used as dictionary keys (#9041) (02c6607)
- common: introduce `FrozenOrderedDict` (#9081) (f926995), closes #9063
- datafusion, flink, mssql: add uuid operation (#8545) (2f85a42)
- datafusion: add array and strings functions ([#...
8.0.0
8.0.0 (2024-02-05)
⚠ BREAKING CHANGES
- backends: Columns with Ibis `date` types are now returned as object dtype containing `datetime.date` objects when executing with the pandas backend.
- impala: Direct HDFS integration is removed and support for ingesting pandas DataFrames directly is as well. The Impala backend still works with HDFS, but data in HDFS must be managed outside of ibis.
- api: replace `ibis.show_sql(expr)` calls with `print(ibis.to_sql(expr))` or, if using Jupyter or IPython, `ibis.to_sql(expr)`
- bigquery: `nullifzero` is removed; use `nullif(0)` instead
- bigquery: `zeroifnull` is removed; use `fillna(0)` instead
- bigquery: `list_databases` is removed; use `list_schemas` instead
- bigquery: the bigquery `current_database` method returns the `data_project` instead of the `dataset_id`. Use `current_schema` to retrieve `dataset_id`. To explicitly list tables in a given project and dataset, you can use `f"{con.current_database}.{con.current_schema}"`
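For the `nullifzero` and `zeroifnull` replacements listed above, the following is a plain-Python model of the scalar semantics, using `None` to stand in for `NULL`. The helper functions `nullif` and `fillna` here are hypothetical illustrations written for this sketch, not the Ibis methods of the same name, and the mapping to SQL-style `NULLIF` and null-filling behavior is an assumption.

```python
def nullif(value, null_if_equal):
    """Model of NULLIF: return None when value equals null_if_equal."""
    return None if value == null_if_equal else value


def fillna(value, fill_with):
    """Model of null filling: replace None with fill_with."""
    return fill_with if value is None else value


values = [3, 0, None]

# nullifzero-style behavior expressed as nullif(0): zeros become None.
zeros_to_null = [nullif(v, 0) for v in values]

# zeroifnull-style behavior expressed as fillna(0): None becomes zero.
nulls_to_zero = [fillna(v, 0) for v in values]

print(zeros_to_null, nulls_to_zero)
```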
Features
- api: define
RegexSplitoperation andre_splitAPI (07beaed) - api: support median and quantile on more types (#7810) (49c75a8)
- clickhouse: implement
RegexSplit(e3c507e) - datafusion: implement
ops.RegexSplitusing pyarrow UDF (37b6b7f) - datafusion: set ops (37abea9)
- datatypes: add decimal and basic geospatial support to the sqlglot type parser/generator (59783b9)
- datatypes: make intervals round trip through sqlglot type mapper (d22f97a)
- duckdb-geospatial: add support for flipping coordinates (d47088b)
- duckdb-geospatial: enable use of literals (23ad256)
- duckdb: implement
RegexSplit(229a1f4) - examples: add
zonesgeojson example (#8040) (2d562b7), closes #7958 - flink: add new temporal operators (dfef418)
- flink: add primary key support (da04679)
- flink: export result to pyarrow (9566263)
- flink: implement array operators (#7951) (80e13b4)
- flink: implement struct field, clean up literal, and adjust timecontext test markers (#7997) (2d5e108)
- impala: rudimentary date support (d4bcf7b)
- mssql: add hashbytes and test for binary output hash fns (#8107) (91f60cd), closes #8082 #8082
- mssql: use odbc (f03ad0c)
- polars: implement ops.RegexSplit using pyarrow UDF (a3bed10)
- postgres: implement RegexSplit (c955b6a)
- pyspark: implement RegexSplit (cfe0329)
- risingwave: init impl for Risingwave (#7954) (351747a), closes #8038
- snowflake: implement RegexSplit (2c1a726)
- snowflake: implement insert method (2162e3f)
- trino: implement RegexSplit (9d1295f)
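Several backends in this release gained RegexSplit, exposed through the re_split API. As a rough illustration of what splitting a string on a regular expression means, the sketch below uses Python's standard re.split. It is an analogy only: each backend applies its own regex dialect, so edge-case behaviour may differ from Python's, and the sample string and pattern are made up for the example.

```python
import re

# Split on a comma or semicolon, ignoring any whitespace around the separator.
pattern = r"\s*[,;]\s*"
text = "duckdb, polars;trino ,  snowflake"

parts = re.split(pattern, text)
print(parts)
```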
Bug Fixes
- api: deferred values are not truthy (00b3ece)
- backends: ensure that returned date results are actually proper date values (0626fb2)
- backends: preserve order_by position in window function when subsequent expressions are duplicated (#7943) (89056b9), closes #7940
- common: do not convert callables to resolveable objects (9963705)
- datafusion: work around lack of support for uppercase units in intervals (ebb6cde)
- datatypes: ensure that array construction supports literals and infers their shape from its inputs (#8049) (899dce1), closes #8022
- datatypes: fix bad references in to_numpy() (6fd4550)
- deps: remove filelock from required dependencies (76dded5)
- deps: update dependency black to v24 (425f7b1)
- deps: update dependency datafusion to v34 (601f889)
- deps: update dependency datafusion to v35 (#8224) (a34af25)
- deps: update dependency oracledb to v2 (e7419ca)
- deps: update dependency pyarrow to v15 (ef6a9bd)
- deps: update dependency pyodbc to v5 (32044ea)
- docs: surround executable code blocks with interactive mode on/off (4c660e0)
- duckdb: allow table creation from expr with geospatial datatypes (#7818) (ecac322)
- duckdb: ensure that casting to floating point values produces valid types in generated sql (424b206)
- examples: use anonymous access when reading example data from GCS (8e5c0af)
- impala: generate memtables using UNION ALL to work around sqlglot bug (399a5ef)
- mutate/select: ensure that unsplatted dictionaries work in mutate and select APIs (#8014) (8ed19ea), closes #8013
- mysql: catch PyMySQL OperationalError exception (#7919) (f2c2664), closes #6010 #7918
- pandas: support non-string categorical columns (5de08c7)
- polars: avoid using unnecessary subquery for schema inference (0f43667)
- **p...
7.2.0
7.2.0 (2023-12-18)
Features
- api: add ArrayValue.flatten method and operation (e6e995c)
- api: add ibis.range function for generating sequences (f5a0a5a)
- api: add timestamp range (c567fe0)
- base: add to_pandas method to BaseBackend (3d1cf66)
- clickhouse: implement array flatten support (d15c6e6)
- common: node.replace() now supports mappings for quick lookup-like substitutions (bbc93c7)
- common: add node.find_topmost() method to locate matching nodes without descending further to their children (15acf7d)
- common: allow matching on dictionaries in possibly nested patterns (1d314f7)
- common: expose node.__children__ property to access the flattened list of children of a node (2e91476)
- duckdb: add initial support for geospatial functions (65f496c)
- duckdb: add read_geo function (b19a8ce)
- duckdb: enforce aswkb for projections, coerce to geopandas (33327dc)
- duckdb: implement array flatten support (0a0eecc)
- exasol: add exasol backend (295903d)
- export: allow passing keyword arguments to PyArrow ParquetWriter and CSVWriter (40558fd)
- flink: implement nested schema support (057fabc)
- flink: implement windowed computations (256767f)
- geospatial: add support for GeoTransform on duckdb (ec533c1)
- geospatial: update read_geo to support url (3baf509)
- pandas/dask: implement flatten (c2e8d9d)
- polars: add streaming kwarg to to_pandas (703507f)
- polars: implement array flatten support (19b2aa0)
- pyspark: enable multiple values in .substitute (291a290)
- pyspark: implement array flatten support (5d1fadf)
- snowflake: implement array flatten support (d3c754f)
- snowflake: read_csv with https (72752eb)
- snowflake: support udf arguments for reading from staged files (529a3a2)
- snowflake: use upstream array_sort (9624341)
- sqlalchemy: support expressions in window bounds (5dbb3b1)
- trino: implement array flatten support (0d1faaa)
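The node.find_topmost() entry above describes a traversal that reports matching nodes but does not search beneath them. A minimal sketch of that idea follows, using a hypothetical Node dataclass and a free function rather than the ibis.common classes; the names, the predicate-based signature, and the left-to-right ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical tree node; not the ibis.common Node class."""

    name: str
    children: Tuple["Node", ...] = ()


def find_topmost(root: Node, predicate: Callable[[Node], bool]) -> List[Node]:
    """Collect matching nodes without searching below any match."""
    matches: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if predicate(node):
            # A match ends the search on this branch: its children are skipped.
            matches.append(node)
        else:
            # reversed() preserves left-to-right order with a LIFO stack.
            stack.extend(reversed(node.children))
    return matches


tree = Node(
    "project",
    (
        Node("filter_outer", (Node("filter_inner"),)),
        Node("join", (Node("filter_leaf"),)),
    ),
)

found = [n.name for n in find_topmost(tree, lambda n: n.name.startswith("filter"))]
print(found)
```

Here filter_inner is not reported because it sits below filter_outer, which already matched; a traversal that kept descending would return all three filter nodes.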
Bug Fixes
- api: avoid casting to bool for table.info() nullable column (3b3bd7b)
- bigquery: escape the schema (project ID) for BQ builtin UDFs (8096552)
- bigquery: fully qualified memtable names in compile (a81e432)
- clickhouse: use backwards compatible methods of getting query metadata (975556f)
- datafusion: bring back UDF registration (43084fa)
- datafusion: ensure that non-matching re_search calls return bool values when patterns do not match (088b027)
- datafusion: support computed group by when the aggregation is count distinct (18bdb7e)
- decompile: handle isin (6857751)
- deferred: don't pass expression in fstringified error message (724859d)
- deps: update dependency datafusion to v33 (57047a2)
- deps: update dependency sqlglot to v20 (13bc6e2)
- duckdb: ensure that already quoted identifiers are not erased (45ee391)
- duckdb: ensure that parameter names are unlikely to overlap with column names (d93dbe2)
- duckdb: gate geoalchemy import in duckdb geospatial (8f012c4)
- duckdb: render dates, times, timestamps and none literals correctly (5d8866a)
- duckdb: use functions for temporal literals (b1407f8)
- duckdb: use the UDF's signature instead of arguments' output type for generating a duckdb signature (233dce1)
- flink: add more test (33e1a31)
- flink: add os to the cache key (1b92b33)
- flink: add test cases for recreate table (1413de9)
- flink: customize the list of base identifiers (0b5d343)
- flink: fix recreating table/view issue on flink backend (0c9791f)
- flink: implement TypeMapper and SchemaMapper for Flink backend (f983bfa)
- flink: use lazy import to prevent premature loading of pyflink during gen_matrix (d042402)
- geospatial: pretty print data in interactive mode (afb04ed)
- ir: ensure that join projection columns are all always nullable (f5f35c6)
- ir: handle renaming for scalar operations (6f77f17)
- ir: handle the case of non-overlapping data and add a test (1c9ae1b)
- ir: implicitly convert None literals with dt.Null type to the requested type during value coercion (d51ec4e)
- ir: merge window frames for bound analytic window functions with a subsequent over call (e12ce8d)
- ir: raise if Concrete.copy() receives unexpected arguments (442199a)
- memtable: ensure column names match provided data (faf99df)
- memtables: disallow duplicat...
7.1.0
7.1.0 (2023-11-16)
Features
- api: add bucket method for timestamps (ca0f7bc)
- api: add Table.sample method for sampling rows from a table (3ce2617)
- api: allow selectors in order_by (359fd5e)
- api: move analytic window functions to top-level (8f2ced1)
- api: support deferred in reduction filters (349f475)
- api: support specifying signature in udf definitions (764977e)
- bigquery: add location parameter (d652dbb)
- bigquery: add read_csv, read_json, read_parquet support (ff83110)
- bigquery: support temporary tables using sessions (eab48a9)
- clickhouse: add support for timestamp bucket (10a5916)
- clickhouse: support Table.fillna (5633660)
- common: better inheritance support for Slotted and FrozenSlotted (9165d41)
- common: make Slotted and FrozenSlotted pickleable (13cbce0)
- common: support Self annotations for Annotable (0c60146)
- common: use patterns to filter out nodes during graph traversal (3edd8f7)
- dask: add read_csv and read_parquet (e9260af)
- dask: enable pyarrow conversion (2d36722)
- dask: support Table.sample (09a7626)
- datafusion: add case and if-else statements (851d560)
- datafusion: add corr and covar (edc42be)
- datafusion: add isnull and isnan operations (0076c25)
- datafusion: add some array functions (0b96b68)
- datafusion: add StringLength, FindInSet, ArrayStringJoin (fd03831)
- datafusion: add TimestampFromUNIX and subtract/add operations (2bffa5a)
- datafusion: add TimestampTruncate / fix broken extract time part functions (940ed21)
- datafusion: support dropping schemas (cc6870c)
- duckdb: add attach and detach methods for adding and removing databases to the current duckdb session (162b058)
- duckdb: add ntile support (bf08a2a)
- duckdb: add dict-like for DuckDB settings (ea2d317)
- duckdb: add support for specific timestamp scales (3518b78)
- duckdb: allow users to register fsspec filesystem with DuckDB (6172f07)
- duckdb: expose option to force reinstall extension (98080d0)
- duckdb: implement Table.sample as a TABLESAMPLE query (3a80f3a)
- duckdb: implement partial json collection casting (aae28e9)
- flink: add remaining operators for Flink to pass/skip the common tests (b27adc6)
- flink: add several temporal operators (f758228)
- flink: implement the ops.TryCast operation (752e587)
- formats: map ibis JSON type to pyarrow strings (79b6eac)
- impala/pyspark: implement to_pyarrow (6b33454)
- impala: implement Table.sample (8e78dfc)
- implement window table valued functions (a35a756)
- improve generated column names for methods receiving intervals (c319ed3)
- mssql: add support for timestamp bucket (1ffac11)
- mssql: support cross-db/cross-schema table list (3e0f0fa)
- mysql: support ntile (9a14ba3)
- oracle: add fixes after running pre-commit (6538b70)
- oracle: add fixes after running pre-commit (e3d14b3)
- oracle: add support for loading Oracle RAW and BLOB types (c77eeb2)
- oracle: change parsing of Oracle NUMBER data type (649ab86)
- oracle: remove redundant brackets (2905484)
- pandas: add read_csv and read_parquet (34eeca6)
- pandas: support Table.sample (77215be)
- polars: add support for timestamp bucket (c59518c)
- postgres: add support for timestamp bucket (4d34afc)
- pyspark: support Table.sample (6aa897e)
- snowflake: support ntile (39eed1a)
- snowflake: support cross-db/cross-schema table list (2071897)
- snowflake: support timestamp bucketing (a95ffa9)
- sql: implement Table.sample as a random() filter across several SQL backends (e1870ea)
- trino: implement Table.sample as a TABLESAMPLE query (f3d044c)
- trino: support ntile (2978d1a)
- trino: support temporal operations (8b8e885)
- udf: improve mypy compatibility for udf functions (65b5bb7)
- use to_pyarrow instead of to_pandas in the interactive repr (72aa573)
- ux: fix long links, add repr links in vscode (734bd91)
- ux: implement recursive element conversion for nested types and json ([8ddfa94](https://gi...