Added troubleshooting guide for Data Schemas and Extension Properties #1870
base: master
# Data Schemas and Extension Properties Troubleshooting

Sometimes you can get an exception with a message containing:

```plain text
..exception in generated DataFrame extension property..
```

This means that a runtime error occurred while accessing a [`DataFrame` extension property](extensionPropertiesApi.md)
generated by the [Compiler Plugin](Compiler-Plugin.md) or in [Kotlin Notebook](SetupKotlinNotebook.md).

Such errors occur when extension properties are generated for a data schema that does not match
the actual [`DataFrame`](DataFrame.md), [`DataRow`](DataRow.md), etc.
In most cases, the schema contains columns with an incorrect name or type.

Collaborator: maybe "mismatch" is a better fit here

Collaborator: columns with an incorrect name or type

Collaborator: For example:

For example:

```kotlin
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema

@DataSchema
interface Schema {
    val age: Int
}
```

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// The "age" values are strings, but the schema declares `age: Int`
val df = dataFrameOf("age" to columnOf("17", "32", "26")).cast<Schema>()

// Compiles, but fails at runtime because the values are not `Int`
df.filter { age > 20 }
```
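
To see why such code compiles and then fails, here is a stdlib-only model of the mechanism. It assumes, as a simplification, that a generated extension property boils down to reading the value by column name and casting it to the type declared in the schema. `Row`, `age`, `rows`, and `filterFailure` are hypothetical stand-ins, not DataFrame API.

```kotlin
// Hypothetical stand-in for a DataFrame row: column name -> stored value.
class Row(private val values: Map<String, Any?>) {
    operator fun get(column: String): Any? = values[column]
}

// Hypothetical stand-in for the accessor generated for `val age: Int`.
// The compiler trusts the declared type; at runtime the getter performs a
// checked cast, which throws ClassCastException for a value of another type.
val Row.age: Int
    get() = this["age"] as Int

// Rows whose "age" values are strings, e.g. data that was never parsed.
val rows: List<Row> = listOf(Row(mapOf("age" to "17")), Row(mapOf("age" to "32")))

// Filters with the mismatched accessor and returns the exception it raised.
fun filterFailure(): Throwable? =
    runCatching { rows.filter { it.age > 20 } }.exceptionOrNull()
```

The compiler accepts `it.age > 20` because it only sees the declared `Int`; the mismatch surfaces as a `ClassCastException` once a value is actually read.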

## Possible reasons

### Incompatible manually defined data schema

If you define the initial data schema manually, its column names or types may not match
the actual data in the [dataframe](DataFrame.md).
The compiler cannot detect such a mismatch, so the generated extension properties fail only when they are accessed at runtime.

Collaborator: the initial

Collaborator: I think the concept "the dataframe" is better here

Collaborator: and you're describing just a solution here, not the reason it fails. I'd make a clear distinction for first the reason it fails and then second how to solve it

To avoid this:

* Use [special methods](DataSchemaGenerationMethods.md) to generate the data schema code
  instead of defining the data schema manually.
* Use [`.cast<Schema>()`](cast.md) with `verify = true` to check that `Schema` is compatible with the actual data.
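
Conceptually, a compatibility check compares every column declared in the schema with the actual columns. The sketch below is a hypothetical, simplified model of that idea in plain Kotlin: `findSchemaProblems` and the map-based data are assumptions for illustration, not the library's implementation of `verify = true`. It reports missing columns and values whose runtime class differs from the declared one.

```kotlin
import kotlin.reflect.KClass

// Hypothetical declared schema: column name -> expected value class.
val declaredSchema: Map<String, KClass<*>> = mapOf("age" to Int::class)

// Hypothetical actual data: column name -> column values (strings, not Ints).
val actualColumns: Map<String, List<Any?>> = mapOf("age" to listOf("17", "32", "26"))

// Simplified model of a schema-compatibility check. Returns human-readable
// problems, or an empty list if every declared column exists and all of its
// non-null values are instances of the declared class.
fun findSchemaProblems(
    schema: Map<String, KClass<*>>,
    columns: Map<String, List<Any?>>,
): List<String> = schema.flatMap { (name, expected) ->
    val values = columns[name]
        ?: return@flatMap listOf("missing column '$name'")
    values.filterNotNull()
        // javaObjectType maps Int to java.lang.Integer, so boxed values match
        .filterNot { expected.javaObjectType.isInstance(it) }
        .map { "column '$name': ${it::class.simpleName} is not ${expected.simpleName}" }
        .distinct()
}
```

On the sample data, the check reports one (deduplicated) problem for the `age` column, because all three values are strings rather than `Int`s.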

### Incorrect column types after `DataFrame` creation

Sometimes the runtime schema itself is wrong:
the column types differ from the actual types of the column values.
This can happen when reading a dataframe from files or databases.

Collaborator: *schema itself is wrong

> Such cases are most probably bugs! Please report them on [GitHub Issues](https://github.com/Kotlin/dataframe/issues).
{style="warning"}

Collaborator: here it's better :) first you describe the problem/cause and then the workaround

Possible workarounds:

* Specify the correct type using [`.replace {}`](replace.md) and `ValueColumn.changeType()`.
  This works only if the column is a value column:

  ```kotlin
  import kotlin.reflect.typeOf
  df.replace { wrongTypeCol }.with { it.asValueColumn().changeType(typeOf<ActualType>()) }
  ```

* Use [`.inferType { columns }`](inferType.md) to infer the correct types
  for the selected columns from their actual values.
  **This can take a long time and use a lot of resources on large dataframes!**

Collaborator: may be important to mention the two different ways a column can have a "type". There's the internal KType, used by runtime functions, and there's the compile-time type, visible in the IDE, which is also sometimes used in runtime when you refer to a column in an inline reified function, but usually it's visual only

Collaborator: oh and this workaround only works if you're working with a value column, obviously ;P

Collaborator: korro?
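
The idea behind inferring a column type from its values can be modeled in a few lines of plain Kotlin. This is a hypothetical simplification: `ModelColumn`, `inferClass`, and `withInferredType` are illustrative names, and the real `inferType` also handles nullability and common supertypes. The model does show why inference is costly, since it has to look at every value.

```kotlin
import kotlin.reflect.KClass

// Hypothetical column model: a declared type that can disagree with the values.
class ModelColumn(val name: String, val declaredType: KClass<*>, val values: List<Any?>)

// Scans every value (linear in the column size) and returns the runtime class
// shared by all non-null values, or Any::class if they differ or are all null.
fun inferClass(values: List<Any?>): KClass<*> {
    val classes = values.filterNotNull().map { it::class }.toSet()
    return classes.singleOrNull() ?: Any::class
}

// Returns a copy of the column whose declared type matches its actual values.
fun ModelColumn.withInferredType(): ModelColumn =
    ModelColumn(name, inferClass(values), values)
```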

#### Problems with type affinity in SQLite

Because of [SQLite type affinity](https://sqlite.org/datatype3.html),
the column type reported by JDBC may differ from the actual types of the values in the column.
This problem often occurs when reading data from an SQLite database with columns of custom types.

Collaborator: can you link to JDBC docs?
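
SQLite derives a column's affinity from its declared type name with a short list of ordered, case-insensitive substring rules, described in the "Determination Of Column Affinity" section of the linked SQLite page. The plain-Kotlin sketch below encodes those rules; `Affinity` and `affinityOf` are illustrative names, not part of DataFrame or any JDBC driver. It shows how little a custom type name guarantees: `FLOATING POINT`, for instance, gets INTEGER affinity because `POINT` contains `INT`.

```kotlin
// Column affinities defined by SQLite (https://sqlite.org/datatype3.html).
enum class Affinity { INTEGER, TEXT, BLOB, REAL, NUMERIC }

// Applies SQLite's affinity rules, in order, to a declared column type name.
// Matching is a case-insensitive substring check; the first matching rule wins.
fun affinityOf(declaredType: String?): Affinity {
    val type = declaredType.orEmpty().uppercase()
    return when {
        "INT" in type -> Affinity.INTEGER                                   // rule 1
        "CHAR" in type || "CLOB" in type || "TEXT" in type -> Affinity.TEXT // rule 2
        "BLOB" in type || type.isBlank() -> Affinity.BLOB                   // rule 3
        "REAL" in type || "FLOA" in type || "DOUB" in type -> Affinity.REAL // rule 4
        else -> Affinity.NUMERIC                                            // rule 5
    }
}
```

For example, `LONGVARCHAR` gets TEXT affinity and `LONGINT` gets INTEGER affinity. Per the same SQLite page, a column with INTEGER or NUMERIC affinity still stores text that is not a well-formed number as TEXT, so one column can hold values of different storage classes.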

You can provide types for such columns manually:

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.db.Sqlite
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
import kotlin.reflect.typeOf

// `connection` is an open java.sql.Connection to the SQLite database
val sqliteCustom = Sqlite.withCustomTypes(
    mapOf(
        "LONGVARCHAR" to typeOf<String>(),
        "LONGINT" to typeOf<Long>()
    )
)
val df = DataFrame.readSqlTable(
    connection, "table_name", dbType = sqliteCustom
)
```

Collaborator: korro :)

*in Kotlin Notebook (the name of the product), or *in a (Kotlin) notebook (the concept)