Merged
@@ -29,6 +29,8 @@ val versionsToSync =
"maven",
"kotlinDatetime",
"log4j",
+"spark3",
+"kotlin-spark",
"spark4",
"kotlin-dl",
)
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/concepts/concepts.md
@@ -44,7 +44,7 @@ This is why it was designed to be hierarchical and allows nesting of columns and
* [**Interoperable**](collectionsInterop.md) — convertable with Kotlin data classes and collections.
This also means conversion to/from other libraries' data structures is usually quite straightforward!
See our [examples](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources)
-for some conversions between DataFrame and [Apache Spark](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/spark), [Multik](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/multik), and [JetBrains Exposed](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/exposed).
+for some conversions between DataFrame and [Apache Spark](https://github.com/Kotlin/dataframe/tree/master/examples/projects/kotlin-spark), [Multik](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/multik), and [JetBrains Exposed](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/exposed).
* **Generic** — can store objects of any type, not only numbers or strings.
* **Typesafe** — the Kotlin DataFrame library provides a mechanism of on-the-fly [**generation of extension properties**](extensionPropertiesApi.md)
that correspond to the columns of a dataframe.
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/dataSources/Integrations.md
@@ -19,7 +19,7 @@ Below is a list of example integrations with other data frameworks.
These examples demonstrate how to bridge Kotlin DataFrame with external libraries or APIs.

- [Kotlin Exposed](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/exposed)
-- [Apache Spark (with/without Kotlin Spark API)](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/spark)
+- [Apache Spark (with/without Kotlin Spark API)](https://github.com/Kotlin/dataframe/tree/master/examples/projects/kotlin-spark)
- [Multik](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/multik)

You can use these examples as templates to create your own integrations
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/guides/Guides-And-Examples.md
@@ -58,7 +58,7 @@ and make working with your data both convenient and type-safe.
* [Using Unsupported Data Sources](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples):
— A guide by examples. While these might one day become proper integrations of DataFrame, for now,
we provide them as examples for how to make such integrations yourself.
-* [Apache Spark Interop (With and Without Kotlin Spark API)](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/spark)
+* [Apache Spark Interop (With and Without Kotlin Spark API)](https://github.com/Kotlin/dataframe/tree/master/examples/projects/kotlin-spark)
* [Multik Interop](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/multik)
* [JetBrains Exposed Interop](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/exposed)
* [Hibernate ORM](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/hibernate)
2 changes: 1 addition & 1 deletion examples/README.md
@@ -21,7 +21,7 @@ They show how to convert to and from Kotlin DataFrame and their respective table
for an example of using Kotlin DataFrame with [Exposed](https://github.com/JetBrains/Exposed).
* **Hibernate**: See the [hibernate folder](./idea-examples/unsupported-data-sources/hibernate)
for an example of using Kotlin DataFrame with [Hibernate](https://hibernate.org/orm/).
-* **Apache Spark**: See the [spark folder](./idea-examples/unsupported-data-sources/spark)
+* **Apache Spark**: See the [spark folder](./projects/kotlin-spark)
for an example of using Kotlin DataFrame with [Spark](https://spark.apache.org/) and with the [Kotlin Spark API](https://github.com/JetBrains/kotlin-spark-api).
* **Multik**: See the [multik folder](./idea-examples/unsupported-data-sources/multik)
for an example of using Kotlin DataFrame with [Multik](https://github.com/Kotlin/multik).
41 changes: 41 additions & 0 deletions examples/projects/dev/kotlin-spark/.editorconfig
@@ -0,0 +1,41 @@
root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
indent_style = space
indent_size = 4
max_line_length = 120

[*.json]
indent_size = 2

[{*.yaml,*.yml}]
indent_size = 2

[*.ipynb]
insert_final_newline = false

[*.{kt,kts}]
ij_kotlin_code_style_defaults = KOTLIN_OFFICIAL

# Disable wildcard imports entirely
ij_kotlin_name_count_to_use_star_import = 2147483647
ij_kotlin_name_count_to_use_star_import_for_members = 2147483647
ij_kotlin_packages_to_use_import_on_demand = unset

ktlint_code_style = ktlint_official
ktlint_experimental = enabled
ktlint_standard_filename = disabled
ktlint_standard_no-empty-first-line-in-class-body = disabled
ktlint_class_signature_rule_force_multiline_when_parameter_count_greater_or_equal_than = 4
ktlint_function_signature_rule_force_multiline_when_parameter_count_greater_or_equal_than = 4
ktlint_standard_chain-method-continuation = disabled
ktlint_ignore_back_ticked_identifier = true
ktlint_standard_multiline-expression-wrapping = disabled
ktlint_standard_when-entry-bracing = disabled
ktlint_standard_expression-operand-wrapping = disabled

[{*/build/**/*,**/*keywords*/**,**/*.Generated.kt,**/*$Extensions.kt,**/BuildConfig.kt}]
ktlint = disabled
18 changes: 18 additions & 0 deletions examples/projects/dev/kotlin-spark/README.md
@@ -0,0 +1,18 @@
# Apache Spark

Showcase of how to use DataFrame with [Apache Spark](https://spark.apache.org/) and
the [Kotlin Spark API](https://github.com/JetBrains/kotlin-spark-api).

Even though Spark is not officially supported as a data source in DataFrame,
this project shows how to convert from and to Spark tables.

This project uses the
[Kotlin DataFrame Compiler Plugin](https://kotlin.github.io/dataframe/compiler-plugin.html).

We recommend using an up-to-date IntelliJ IDEA for the best experience,
as well as the latest Kotlin plugin version.

> [!WARNING]
> Proper functionality in IntelliJ IDEA requires version 2025.2 or newer.

[Download this Example](https://github.com/Kotlin/dataframe/raw/example-projects-archives/kotlin-spark.zip)
@@ -1,28 +1,25 @@
import org.jetbrains.kotlin.gradle.dsl.JvmTarget

plugins {
application
kotlin("jvm")

// uses the 'old' Gradle plugin instead of the compiler plugin for now
id("org.jetbrains.kotlinx.dataframe")
alias(libs.plugins.kotlin.jvm)
alias(libs.plugins.kotlin.dataframe)
alias(libs.plugins.ktlint.gradle)

// only mandatory if `kotlin.dataframe.add.ksp=false` in gradle.properties
id("com.google.devtools.ksp")
application
}

repositories {
mavenLocal() // in case of local dataframe development
mavenCentral()
}

dependencies {
// implementation("org.jetbrains.kotlinx:dataframe:X.Y.Z")
implementation(project(":"))
implementation(libs.dataframe)

// (kotlin) spark support
// (Kotlin) Spark SQL (Spark 3.3.2)
implementation(libs.kotlin.spark)
compileOnly(libs.spark)
compileOnly(libs.spark.sql)

// Logging to keep Spark quiet
implementation(libs.log4j.core)
implementation(libs.log4j.api)
}
@@ -64,6 +61,7 @@ val runSparkUntypedDataset by tasks.registering(JavaExec::class) {
}

kotlin {
+jvmToolchain(11)
compilerOptions {
jvmTarget = JvmTarget.JVM_11
freeCompilerArgs.add("-Xjdk-release=11")
5 changes: 5 additions & 0 deletions examples/projects/dev/kotlin-spark/gradle.properties
@@ -0,0 +1,5 @@
org.gradle.jvmargs=-Xmx1g -Dfile.encoding=UTF-8
kotlin.code.style=official
# Disabling incremental compilation will no longer be necessary
# when https://youtrack.jetbrains.com/issue/KT-66735 is resolved.
kotlin.incremental=false
24 changes: 24 additions & 0 deletions examples/projects/dev/kotlin-spark/gradle/libs.versions.toml
@@ -0,0 +1,24 @@
[versions]
kotlin = "2.3.21"
dataframe = "1.0.0-Beta5"
ktlint-gradle = "14.0.1"
ktlint = "1.8.0"
log4j = "2.25.4"

# check the versions down in the [libraries] section too!
kotlin-spark = "1.2.4"
Review comment (Collaborator, Author): oops, forgot to add these versions to `versionsToSync` in `dfbuild.buildExampleProjects`, doing that now

spark3 = "3.3.2"

[libraries]
dataframe = { module = "org.jetbrains.kotlinx:dataframe", version.ref = "dataframe" }
log4j-core = { group = "org.apache.logging.log4j", name = "log4j-core", version.ref = "log4j" }
log4j-api = { group = "org.apache.logging.log4j", name = "log4j-api", version.ref = "log4j" }
kotlin-spark = { group = "org.jetbrains.kotlinx.spark", name = "kotlin-spark-api_3.3.2_2.13", version.ref = "kotlin-spark" }
spark-sql = { group = "org.apache.spark", name = "spark-sql_2.13", version.ref = "spark3" }

[plugins]
kotlin-jvm = { id = "org.jetbrains.kotlin.jvm", version.ref = "kotlin" }
ktlint-gradle = { id = "org.jlleitschuh.gradle.ktlint", version.ref = "ktlint-gradle" }

# The Kotlin DataFrame Compiler plugin is the same version as the Kotlin plugin.
kotlin-dataframe = { id = "org.jetbrains.kotlin.plugin.dataframe", version.ref = "kotlin" }
Binary file not shown.
@@ -0,0 +1,7 @@
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-9.5.0-bin.zip
networkTimeout=10000
validateDistributionUrl=true
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
18 changes: 18 additions & 0 deletions examples/projects/dev/kotlin-spark/settings.gradle.kts
@@ -0,0 +1,18 @@
pluginManagement {
repositories {
maven("https://packages.jetbrains.team/maven/p/kt/dev/")
mavenCentral()
gradlePluginPortal()
}
}
plugins {
id("org.gradle.toolchains.foojay-resolver-convention") version "1.0.0"
}
rootProject.name = "kotlin-spark"

// region generated-config

// substitutes dependencies provided by the root project
includeBuild("../../../..")

// endregion
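
A note on the generated `includeBuild` line: in a Gradle composite build, substitutions are normally derived automatically from the included build's group and project names. The fragment below is only a rough sketch of what an explicit equivalent could look like. It assumes that the root project `:` is the one providing `org.jetbrains.kotlinx:dataframe`, which is suggested by the version catalog entry and by the previous `implementation(project(":"))` dependency but is not confirmed by this diff.

```kotlin
// settings.gradle.kts (illustrative sketch, not the generated config)
includeBuild("../../../..") {
    dependencySubstitution {
        // Assumption: the included build's root project provides this module
        substitute(module("org.jetbrains.kotlinx:dataframe")).using(project(":"))
    }
}
```

With a substitution like this in place, `implementation(libs.dataframe)` would resolve to the local sources rather than to a published artifact.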
@@ -13,7 +13,7 @@ import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.schema
import org.jetbrains.kotlinx.dataframe.api.std
import org.jetbrains.kotlinx.dataframe.api.toDataFrame
-import org.jetbrains.kotlinx.dataframe.api.toList
+import org.jetbrains.kotlinx.dataframe.api.toListOf
import org.jetbrains.kotlinx.spark.api.withSpark

/**
@@ -60,14 +60,17 @@ fun main() = withSpark {
ageStats.print(columnTypes = true, borders = true)

// and when we want to convert a DataFrame back to Spark, we can do the same trick via a typed List
-val sparkDatasetAgain = dataframe.toList().toDS()
+// Using the compiler plugin, it's important to specify the target data class explicitly!
+// The local compiler-plugin type is not a data class that can be instantiated.
+val sparkDatasetAgain = dataframe.toListOf<Person>().toDS()
sparkDatasetAgain.printSchema()
sparkDatasetAgain.show()
}

@DataSchema
data class Name(val firstName: String, val lastName: String)

+// The @DataSchema annotation is optional for this specific example, but is generally recommended
@DataSchema
data class Person(
val name: Name,
@@ -18,7 +18,7 @@ import org.jetbrains.kotlinx.dataframe.api.print
import org.jetbrains.kotlinx.dataframe.api.schema
import org.jetbrains.kotlinx.dataframe.api.std
import org.jetbrains.kotlinx.dataframe.api.toDataFrame
-import org.jetbrains.kotlinx.dataframe.api.toList
+import org.jetbrains.kotlinx.dataframe.api.toListOf
import java.io.Serializable

/**
@@ -78,7 +78,9 @@ fun main() {
ageStats.print(columnTypes = true, borders = true)

// and when we want to convert a DataFrame back to Spark, we can do the same trick via a typed List
-val sparkDatasetAgain = spark.createDataset(dataframe.toList(), beanEncoderOf())
+// Using the compiler plugin, it's important to specify the target data class explicitly!
+// The local compiler-plugin type is not a data class that can be instantiated.
+val sparkDatasetAgain = spark.createDataset(dataframe.toListOf<Person>(), beanEncoderOf())
sparkDatasetAgain.printSchema()
sparkDatasetAgain.show()
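
The `toList()` to `toListOf<Person>()` change can be illustrated in isolation. The snippet below is a minimal sketch rather than code from this PR: `SimplePerson` and `roundTrip` are hypothetical names, and the class is deliberately flattened (no nested `Name`) to keep the example small. It only shows the idea of naming the target data class explicitly when converting a DataFrame back to a typed list.

```kotlin
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.toDataFrame
import org.jetbrains.kotlinx.dataframe.api.toListOf

// Hypothetical, flattened stand-in for the example's Person class
@DataSchema
data class SimplePerson(val name: String, val age: Int)

// Converts a typed list to a DataFrame and back again.
// The target class is named explicitly instead of relying on the DataFrame's type parameter.
fun roundTrip(people: List<SimplePerson>): List<SimplePerson> {
    val dataframe = people.toDataFrame()
    return dataframe.toListOf<SimplePerson>()
}

fun main() {
    val people = listOf(SimplePerson("Alice", 15), SimplePerson("Bob", 45))
    println(roundTrip(people))
}
```

The resulting `List<SimplePerson>` is an ordinary Kotlin list, so it could then be handed to `toDS()` or `spark.createDataset(...)` as in the hunks above.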

@@ -93,6 +95,7 @@ data class Name
@JvmOverloads
constructor(var firstName: String = "", var lastName: String = "") : Serializable

+// The @DataSchema annotation is optional for this specific example, but is generally recommended
@DataSchema
data class Person
@JvmOverloads
41 changes: 41 additions & 0 deletions examples/projects/kotlin-spark/.editorconfig
@@ -0,0 +1,41 @@
root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
indent_style = space
indent_size = 4
max_line_length = 120

[*.json]
indent_size = 2

[{*.yaml,*.yml}]
indent_size = 2

[*.ipynb]
insert_final_newline = false

[*.{kt,kts}]
ij_kotlin_code_style_defaults = KOTLIN_OFFICIAL

# Disable wildcard imports entirely
ij_kotlin_name_count_to_use_star_import = 2147483647
ij_kotlin_name_count_to_use_star_import_for_members = 2147483647
ij_kotlin_packages_to_use_import_on_demand = unset

ktlint_code_style = ktlint_official
ktlint_experimental = enabled
ktlint_standard_filename = disabled
ktlint_standard_no-empty-first-line-in-class-body = disabled
ktlint_class_signature_rule_force_multiline_when_parameter_count_greater_or_equal_than = 4
ktlint_function_signature_rule_force_multiline_when_parameter_count_greater_or_equal_than = 4
ktlint_standard_chain-method-continuation = disabled
ktlint_ignore_back_ticked_identifier = true
ktlint_standard_multiline-expression-wrapping = disabled
ktlint_standard_when-entry-bracing = disabled
ktlint_standard_expression-operand-wrapping = disabled

[{*/build/**/*,**/*keywords*/**,**/*.Generated.kt,**/*$Extensions.kt,**/BuildConfig.kt}]
ktlint = disabled
18 changes: 18 additions & 0 deletions examples/projects/kotlin-spark/README.md
@@ -0,0 +1,18 @@
# Apache Spark

Showcase of how to use DataFrame with [Apache Spark](https://spark.apache.org/) and
the [Kotlin Spark API](https://github.com/JetBrains/kotlin-spark-api).

Even though Spark is not officially supported as a data source in DataFrame,
this project shows how to convert from and to Spark tables.

This project uses the
[Kotlin DataFrame Compiler Plugin](https://kotlin.github.io/dataframe/compiler-plugin.html).

We recommend using an up-to-date IntelliJ IDEA for the best experience,
as well as the latest Kotlin plugin version.

> [!WARNING]
> Proper functionality in IntelliJ IDEA requires version 2025.2 or newer.

[Download this Example](https://github.com/Kotlin/dataframe/raw/example-projects-archives/kotlin-spark.zip)