
@ShimonSte (Contributor)

This PR adds comprehensive documentation for the TableProvider API (format-based access) and Variant DataType support in the ClickHouse Spark connector.

Key Additions

  • TableProvider API: Complete guide with examples in Python, Scala, and Java
  • Databricks Integration: Installation instructions, notebook examples, and Unity Catalog integration
  • VariantType Support: Guide for Spark 4.0+ VariantType and ClickHouse JSON/Variant types

@ShimonSte requested review from a team as code owners on December 31, 2025 09:53

@BentsiLeviav (Contributor) left a comment


Can we take care of the following as well:

  • Explain the differences between the Catalog and TableProvider APIs at the top of the document, and state which one we recommend.
  • Move the Databricks documentation out to a dedicated page (as we already do for Glue)
    • Add screenshots from the Databricks platform
    • Make sure we add a link to the new page from https://clickhouse.com/docs/integrations
    • Document the option of using the Catalog-based API when Unity Catalog is disabled
    • Explain the differences between the APIs in more detail, and when we recommend one over the other (if you don't use Unity Catalog at all, we recommend the Catalog-based API; if you do, the TableProvider API)

…egration

- Add detailed TableProvider API (format-based) documentation
- Include comparison table between Catalog API and TableProvider API
- Add comprehensive Databricks integration examples
- Document automatic table creation with ORDER BY requirement
- Add Unity Catalog integration examples
- Include troubleshooting section for Databricks-specific issues
- Document Spark 4.0 Jackson dependency shading
- Add examples for schema inference, filtering, and write modes
…section

- Fixed missing closing </TabItem> and </Tabs> tags in Databricks Notebook Usage section
- Resolves MDX compilation error that prevented the documentation server from running
- Add anchor IDs to all level 4 headings that were missing them
- Fixes CI markdown linting errors for custom-anchor-headings rule
- Add Catalog vs TableProvider API comparison section with recommendations
- Create dedicated Databricks integration page
- Remove Databricks section from main connector page
- Update sidebar to include Databricks page
- Add ClickHouseSupportedBadge to spark-native-connector page
- Ensure all code examples include Python, Scala, and Java versions
- Address PR review comments for TableProvider API documentation
- Remove extra blank lines in databricks.md and spark-native-connector.md
- Add explicit heading IDs for 'Using TableProvider API' and 'Using Catalog API' sections
@ShimonSte (Contributor, Author)

@BentsiLeviav, I added a new page for Databricks.
I haven't added the screenshots for now, as I think they are a bit "clunky" and the instructions are already clear.
Let me know if you still think we should add them.

@ShimonSte requested a review from BentsiLeviav on January 4, 2026 14:32
@BentsiLeviav (Contributor)

@ShimonSte
Yes, let's add them. Screenshots still make users' lives easier, and they keep this page consistent with our documentation for other partner frameworks (Glue, Dataflow, Power BI, Tableau).

@BentsiLeviav (Contributor) left a comment


Some items are still unaddressed, and I'd love for you to take care of them:

  • Would you mind adding a link to the new Databricks page from https://clickhouse.com/docs/integrations?
  • Document the option of using the Catalog-based API when Unity Catalog is disabled
  • You added a comparison between the APIs, but can you add our recommendation? Is one of the APIs more developed? Does Spark recommend one over the other? Is one of them less actively maintained by the Spark team?

In addition, I left a few small inline comments on the code itself.

- Add dedicated Databricks integration page with installation instructions
- Add Databricks entry to integrations grid (fallback JSON)
- Add Databricks logo to static assets
- Add Databricks link to data ingestion index page
- Update IntegrationGrid component to support local static logo files
- Add installation screenshots for Databricks UI, Maven, and workspace volume
- Add note about JSON column hints performance improvement in spark-native-connector
- Remove Databricks-specific content from spark-native-connector page
@ShimonSte (Contributor, Author)

@BentsiLeviav, I've made the requested changes.
Regarding recommendations: I don't have any specific ones at this time. Both approaches are built on the same underlying concept and share most of their code.
The TableProvider API cannot perform certain operations, which we've documented. Otherwise, all supported functionality behaves identically whether you use the TableProvider API or the Catalog implementation.

…mitation

- Add separate top-level section explaining ClickHouse options configuration for both APIs
- Update Writing Modes section to clarify it applies to both TableProvider and Catalog APIs
- Add note about partition overwrite not being supported with link to GitHub issue #34
- Add note about JSON column hints performance improvement with link to GitHub issue #497
…tion

- Add note about partition overwrite not being supported in Catalog API
- Links to GitHub issue #34 for tracking
@Blargian (Member) left a comment


LGTM. I left some suggestions to use sentence case rather than title case in headings.

@Blargian self-assigned this on Jan 7, 2026
ShimonSte and others added 2 commits January 7, 2026 17:01
Co-authored-by: Shaun Struwig <41984034+Blargian@users.noreply.github.com>
- Clarify that JAR must be uploaded to workspace first before installing
- Update installation steps for better clarity
@ShimonSte merged commit 2c5f2dd into main on Jan 8, 2026
15 checks passed

4 participants