Describe the bug
I stumbled a potential issue in DataFusion while testing recursive CTEs in my project:
Here's such a query:
WITH RECURSIVE chain AS (
SELECT 'A' AS level
UNION ALL
SELECT level || '>' FROM chain WHERE length(level) < 5
)
select * from chain
TLDR: the issue is:
- Anchor member
SELECT 'A' AS level has field name level
- the recursive part
SELECT level || '>' ... has field name chain.level || Utf8(\">\")
After a bit of digging and debugging I was able to reproduce the error in the CLI.
I also tested in Postgres for reference (see below).
Exhibit error in CLI with datafusion-cli
The issue is also apparent in the CLI. Using Datafusion 53.1.0.
At first it seems ok in the CLI:
% datafusion-cli
DataFusion CLI v53.1.0
> WITH RECURSIVE chain AS (
SELECT 'A' AS level
UNION ALL
SELECT level || '>' FROM chain WHERE length(level) < 5
)
SELECT * FROM chain
;
+-------+
| level |
+-------+
| A |
| A> |
| A>> |
| A>>> |
| A>>>> |
+-------+
5 row(s) fetched.
Elapsed 0.005 seconds.
This works fine, maybe the pretty printer is lenient and is silent or ignores the different schemas.
However if you try to create the table using CREATE TABLE ... AS ..., the error is apparent:
> create table test as ( WITH RECURSIVE chain AS (
SELECT 'A' AS level
UNION ALL
SELECT level || '>' FROM chain WHERE length(level) < 5
)
SELECT * FROM chain );
Error during planning: Mismatch between schema and batches
It fails.
Debug in code:
Adding some debugging shows the value of the returned record batches (without as level for the recursive case:
[
RecordBatch { schema: Schema { fields: [Field { name: "level", data_type: Utf8 }], metadata: {} }, columns: [StringArray [ "A", ]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "chain.level || Utf8(\">\")", data_type: Utf8 }], metadata: {} }, columns: [StringArray [ "A>", ]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "chain.level || Utf8(\">\")", data_type: Utf8 }], metadata: {} }, columns: [StringArray [ "A>>", ]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "chain.level || Utf8(\">\")", data_type: Utf8 }], metadata: {} }, columns: [StringArray [ "A>>>", ]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "chain.level || Utf8(\">\")", data_type: Utf8 }], metadata: {} }, columns: [StringArray [ "A>>>>", ]], row_count: 1 }
]
As you can see the field names in the schema different in the anchor member and the recursive case.
With the ... as level in the recursive case, the record batches are:
[RecordBatch { schema: Schema { fields: [Field { name: "level", data_type: Utf8 }], metadata: {} }, columns: [StringArray [ "A", ]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "level", data_type: Utf8 }], metadata: {} }, columns: [StringArray [ "A>", ]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "level", data_type: Utf8 }], metadata: {} }, columns: [StringArray [ "A>>", ]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "level", data_type: Utf8 }], metadata: {} }, columns: [StringArray [ "A>>>", ]], row_count: 1 },
RecordBatch { schema: Schema { fields: [Field { name: "level", data_type: Utf8 }], metadata: {} }, columns: [StringArray [ "A>>>>",]], row_count: 1 }]
Compatibility check with Postgres
I tested with Postgres to check compatibility, it works fine (with or without the CREATE TABLE):
% psql
psql (14.22 (Homebrew))
Type "help" for help.
jc=# WITH RECURSIVE chain AS (
SELECT 'A' AS level
UNION ALL
SELECT level || '>' FROM chain WHERE length(level) < 5
)
select * from chain;
level
-------
A
A>
A>>
A>>>
A>>>>
(5 rows)
jc=# create table test as ( WITH RECURSIVE chain AS (
SELECT 'A' AS level
UNION ALL
SELECT level || '>' FROM chain WHERE length(level) < 5
)
SELECT * FROM chain );
SELECT 5
jc=# select * from test;
level
-------
A
A>
A>>
A>>>
A>>>>
(5 rows)
To Reproduce
Run this in Datafusion CLI:
create table test as ( WITH RECURSIVE chain AS (
SELECT 'A' AS level
UNION ALL
SELECT level || '>' FROM chain WHERE length(level) < 5
)
SELECT * FROM chain );
It should fail with Error during planning: Mismatch between schema and batches
Expected behavior
It should behave like:
> create table test as ( WITH RECURSIVE chain AS (
SELECT 'A' AS level
UNION ALL
SELECT level || '>' as level FROM chain WHERE length(level) < 5
)
SELECT * FROM chain );
0 row(s) fetched.
Elapsed 0.003 seconds.
> select * from test;
+-------+
| level |
+-------+
| A |
| A> |
| A>> |
| A>>> |
| A>>>> |
+-------+
5 row(s) fetched.
Elapsed 0.004 seconds.
note the ... as level ... in SELECT level || '>' as level FROM chain ...
Additional context
No response
Describe the bug
I stumbled a potential issue in DataFusion while testing recursive CTEs in my project:
Here's such a query:
TLDR: the issue is:
SELECT 'A' AS levelhas field namelevelSELECT level || '>' ...has field namechain.level || Utf8(\">\")After a bit of digging and debugging I was able to reproduce the error in the CLI.
I also tested in Postgres for reference (see below).
Exhibit error in CLI with datafusion-cli
The issue is also apparent in the CLI. Using Datafusion 53.1.0.
At first it seems ok in the CLI:
This works fine, maybe the pretty printer is lenient and is silent or ignores the different schemas.
However if you try to create the table using
CREATE TABLE ... AS ..., the error is apparent:It fails.
Debug in code:
Adding some debugging shows the value of the returned record batches (without
as levelfor the recursive case:As you can see the field names in the schema different in the anchor member and the recursive case.
With the
... as levelin the recursive case, the record batches are:Compatibility check with Postgres
I tested with Postgres to check compatibility, it works fine (with or without the
CREATE TABLE):To Reproduce
Run this in Datafusion CLI:
It should fail with
Error during planning: Mismatch between schema and batchesExpected behavior
It should behave like:
note the
... as level ...inSELECT level || '>' as level FROM chain ...Additional context
No response