Add Spark TableProvider API Documentation and Databricks Integration Guide + Variant datatype support #5124
Conversation
Force-pushed from 4646d2d to 73297ea
BentsiLeviav left a comment:
Can we take care of the following as well:
- Explain the differences between the Catalog and TableProvider APIs at the head of the document, and state which one we recommend.
- Move the Databricks documentation out to a dedicated page (same as we have today with Glue).
- Add screenshots from the Databricks platform.
- Make sure we add a link to the new page in https://clickhouse.com/docs/integrations.
- Document the option of using the Catalog-based API when Unity Catalog is disabled.
- Explain more about the differences between the APIs and when we recommend one over the other (if you don't use Unity Catalog at all, we recommend the Catalog-based API; if you do use it, the TableProvider API); see the sketch after this list.
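To make the contrast concrete, here is a rough sketch of the two access styles, not taken from this PR's docs: the catalog class name, the `clickhouse` format short name, and the option keys are all assumptions based on the connector's typical configuration.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("clickhouse-api-comparison")
  .getOrCreate()

// Catalog API: register ClickHouse as a Spark SQL catalog once per session,
// then reference tables as <catalog>.<database>.<table> from plain SQL.
// The catalog class name below is an assumption; check the connector's release notes.
spark.conf.set("spark.sql.catalog.clickhouse", "com.clickhouse.spark.ClickHouseCatalog")
spark.conf.set("spark.sql.catalog.clickhouse.host", "127.0.0.1")
spark.conf.set("spark.sql.catalog.clickhouse.protocol", "http")
spark.conf.set("spark.sql.catalog.clickhouse.http_port", "8123")
spark.conf.set("spark.sql.catalog.clickhouse.user", "default")
spark.conf.set("spark.sql.catalog.clickhouse.password", "")
spark.sql("SELECT * FROM clickhouse.default.events").show()

// TableProvider (format-based) API: no catalog registration; connection
// details travel with each read/write, which suits platforms like Databricks
// where Unity Catalog already manages the session catalogs.
val events = spark.read
  .format("clickhouse")            // assumed short name for the provider
  .option("host", "127.0.0.1")
  .option("protocol", "http")
  .option("http_port", "8123")
  .option("user", "default")
  .option("password", "")
  .option("database", "default")
  .option("table", "events")
  .load()
events.show()
```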
…egration
- Add detailed TableProvider API (format-based) documentation
- Include comparison table between Catalog API and TableProvider API
- Add comprehensive Databricks integration examples
- Document automatic table creation with ORDER BY requirement
- Add Unity Catalog integration examples
- Include troubleshooting section for Databricks-specific issues
- Document Spark 4.0 Jackson dependency shading
- Add examples for schema inference, filtering, and write modes
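For the "automatic table creation with ORDER BY requirement" item, a hypothetical write via the TableProvider API might look like the sketch below. The `order_by` option key is an assumption: ClickHouse MergeTree tables require an ORDER BY clause, so the connector needs one supplied when it creates the table.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("clickhouse-write").getOrCreate()
import spark.implicits._

// If default.events does not exist, the connector can create it; the
// table's ORDER BY key is passed through an assumed option below.
val events = Seq(
  (1L, Timestamp.valueOf("2024-01-01 00:00:00"), """{"k":"v"}""")
).toDF("id", "ts", "payload")

events.write
  .format("clickhouse")              // assumed provider short name
  .option("host", "127.0.0.1")
  .option("database", "default")
  .option("table", "events")
  .option("order_by", "id")          // hypothetical key for the table's ORDER BY
  .mode("append")
  .save()
```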
…section
- Fixed missing closing </TabItem> and </Tabs> tags in Databricks Notebook Usage section
- Resolves MDX compilation error that prevented the documentation server from running
- Add anchor IDs to all level 4 headings that were missing them
- Fixes CI markdown linting errors for custom-anchor-headings rule
- Add Catalog vs TableProvider API comparison section with recommendations
- Create dedicated Databricks integration page
- Remove Databricks section from main connector page
- Update sidebar to include Databricks page
- Add ClickHouseSupportedBadge to spark-native-connector page
- Ensure all code examples include Python, Scala, and Java versions
- Address PR review comments for TableProvider API documentation
Force-pushed from 73297ea to 02d4777
- Remove extra blank lines in databricks.md and spark-native-connector.md
- Add explicit heading IDs for 'Using TableProvider API' and 'Using Catalog API' sections
@BentsiLeviav, added a new page for Databricks.

@ShimonSte
BentsiLeviav left a comment:
There are still things that were not addressed; I'd love for you to take care of them:
- Would you mind adding a link to the new Databricks page in https://clickhouse.com/docs/integrations?
- Document the option of using the Catalog-based API when Unity Catalog is disabled.
- You added a comparison between the APIs, but can you add our recommendation? Is one of the APIs more developed? Does Spark recommend one over the other? Is one of them less maintained by the Spark team?

In addition, I added a few small comments to the code itself.
- Add dedicated Databricks integration page with installation instructions
- Add Databricks entry to integrations grid (fallback JSON)
- Add Databricks logo to static assets
- Add Databricks link to data ingestion index page
- Update IntegrationGrid component to support local static logo files
- Add installation screenshots for Databricks UI, Maven, and workspace volume
- Add note about JSON column hints performance improvement in spark-native-connector
- Remove Databricks-specific content from spark-native-connector page
@BentsiLeviav, I've made the requested changes.
…mitation
- Add separate top-level section explaining ClickHouse options configuration for both APIs
- Update Writing Modes section to clarify it applies to both TableProvider and Catalog APIs
- Add note about partition overwrite not being supported, with link to GitHub issue #34
- Add note about JSON column hints performance improvement, with link to GitHub issue #497
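Illustratively, the writing-modes point reduces to Spark's standard SaveMode semantics under either API. A brief sketch, reusing the `events` DataFrame from the earlier example (option names assumed, as elsewhere in this thread):

```scala
// Append inserts new rows into the existing ClickHouse table.
events.write.format("clickhouse")
  .option("database", "default")
  .option("table", "events")
  .mode("append")
  .save()

// Overwrite replaces the table's contents wholesale. Per the note above,
// partition overwrite (replacing only matching partitions) is not
// supported; see GitHub issue #34.
events.write.format("clickhouse")
  .option("database", "default")
  .option("table", "events")
  .mode("overwrite")
  .save()
```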
…tion
- Add note about partition overwrite not being supported in Catalog API
- Links to GitHub issue #34 for tracking
Blargian left a comment:
LGTM. Left some suggestions about using sentence casing rather than title casing in titles.
Co-authored-by: Shaun Struwig <41984034+Blargian@users.noreply.github.com>
- Clarify that the JAR must be uploaded to the workspace before installing
- Update installation steps for better clarity
This PR adds comprehensive documentation for the TableProvider API (format-based access) and Variant DataType support in the ClickHouse Spark connector.
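Purely as a sketch of what Variant support could look like at the call site, and assuming (my assumption, not stated in this excerpt) that the connector surfaces ClickHouse Variant columns as Spark 4.0's VariantType, such a column could then be queried with Spark's built-in variant functions:

```scala
// Read a table whose payload column is a ClickHouse Variant; under the
// VariantType assumption it is queryable with Spark 4.0's variant functions.
val df = spark.read
  .format("clickhouse")              // assumed provider short name
  .option("host", "127.0.0.1")
  .option("database", "default")
  .option("table", "events")
  .load()

// variant_get extracts a typed value at a JSON path from a variant column.
df.selectExpr("variant_get(payload, '$.user.id', 'int') AS user_id").show()
```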
Key Additions