What happens?
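The DuckDB Spark API (duckdb.experimental.spark) does not handle spark.createDataFrame(list of dict) the way PySpark does. PySpark uses the dict keys as column names and the dict values as row data. The DuckDB Spark API instead returns a single varchar column named col0 whose only row is the key c0, and the value 1969-12-21 is lost.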
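A possible workaround, sketched under assumptions: flatten the list of dicts into row tuples plus a list of column names before building the DataFrame. The helper name dicts_to_rows is hypothetical and not part of either library. Whether passing its output to the DuckDB Spark createDataFrame avoids the behavior above has not been verified against DuckDB 1.4.2.

```python
from typing import Any


def dicts_to_rows(records: list[dict[str, Any]]) -> tuple[list[tuple], list[str]]:
    """Hypothetical helper: turn a list of dicts into (rows, column_names).

    Column order follows the first appearance of each key; keys missing
    from a record are filled with None.
    """
    columns: list[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    rows = [tuple(record.get(col) for col in columns) for record in records]
    return rows, columns


data = [{"c0": "1969-12-21"}]
rows, columns = dicts_to_rows(data)
print(rows, columns)  # [('1969-12-21',)] ['c0']

# Assumed usage, untested against the DuckDB Spark API:
# df = duckdb_spark.createDataFrame(rows, columns)
```

This only reshapes the input on the caller's side. It does not change how the DuckDB Spark API treats dict input, so the incompatibility itself still needs a fix upstream.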
To Reproduce
from duckdb.experimental.spark.sql import SparkSession as DuckdbSparkSession
from pyspark.sql import SparkSession
sql_text = "SELECT * FROM t0"
data = [
{"c0": "1969-12-21"}
]
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(data)
df.createOrReplaceTempView("t0")
print("PySpark SQL result:")
pyspark_result = spark.sql(sql_text)
pyspark_result.show()
duckdb_spark = DuckdbSparkSession.builder.getOrCreate()
df = duckdb_spark.createDataFrame(data)
df.createOrReplaceTempView("t0")
print("Duckdb Spark SQL result: ")
duckdb_spark_result = duckdb_spark.sql(sql_text)
duckdb_spark_result.show()

PySpark SQL result:
+----------+
|        c0|
+----------+
|1969-12-21|
+----------+
Duckdb Spark SQL result:
┌─────────┐
│  col0   │
│ varchar │
├─────────┤
│ c0      │
└─────────┘

OS:
x86_64 Ubuntu 24.04 Linux-6.14.0-35-generic-x86_64-with-glibc2.39
DuckDB Version:
1.4.2
DuckDB Client:
Python
Hardware:
No response
Full Name:
asddfl
Affiliation:
xxx
Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?
- Yes, I have
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant data sets for reproducing the issue?
Yes