docs/integrations/language-clients/python/additional-options.md
---
sidebar_label: 'Additional options'
sidebar_position: 12
keywords: ['clickhouse', 'python', 'options', 'settings']
description: 'Additional options for ClickHouse Connect'
slug: /integrations/language-clients/python/additional-options
---

```python
from clickhouse_connect import common

common.set_setting('autogenerate_session_id', False)
common.get_setting('invalid_setting_action')
'error'
```

:::note
These common settings `autogenerate_session_id`, `autogenerate_query_id`, `product_name`, and `readonly` should _always_ be modified before creating a client with the `clickhouse_connect.get_client` method. Changing these settings after client creation doesn't affect the behavior of existing clients.
:::

The following global settings are currently defined:

| Setting Name | Default | Options | Description |
|-------------------------|---------|-------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| autogenerate_session_id | True | True, False | Autogenerate a new UUID(4) session ID (if not provided) for each client session. If no session ID is provided (either at the client or query level), ClickHouse will generate a random internal ID for each query. |
| autogenerate_query_id | True | True, False | Autogenerate a new UUID(4) query ID for each query if one isn't provided. Useful for tracking queries in `system.query_log`. |
| dict_parameter_format | 'json' | 'json', 'map' | This controls whether parameterized queries convert a Python dictionary to JSON or ClickHouse Map syntax. `json` should be used for inserts into JSON columns, `map` for ClickHouse Map columns. |
| invalid_setting_action | 'error' | 'drop', 'send', 'error' | Action to take when an invalid or readonly setting is provided (either for the client session or query). If `drop`, the setting will be ignored, if `send`, the setting will be sent to ClickHouse, if `error` a client side ProgrammingError will be raised. |
| max_connection_age | 600 | | Maximum seconds that an HTTP Keep Alive connection will be kept open/reused. This prevents bunching of connections against a single ClickHouse node behind a load balancer/proxy. Defaults to 10 minutes. |
| product_name | | | A string that is passed with the query to ClickHouse for tracking the app using ClickHouse Connect. Should be in the form `<product name>/<product version>`. |
| readonly | 0 | 0, 1 | Implied "read_only" ClickHouse settings for versions prior to 19.17. Can be set to match the ClickHouse "read_only" value for settings to allow operation with very old ClickHouse versions. |
| send_os_user | True | True, False | Include the detected operating system user in client information sent to ClickHouse (HTTP User-Agent string). |
| send_integration_tags | True | True, False | Include the used integration libraries/version (e.g. Pandas/SQLAlchemy/etc.) in client information sent to ClickHouse (HTTP User-Agent string). |
| use_protocol_version | True | True, False | Use the client protocol version. This is needed for `DateTime` timezone columns but breaks with the current version of chproxy. |
| max_error_size          | 1024    |                         | Maximum number of characters that will be returned in client error messages. Use 0 for this setting to get the full ClickHouse error message. Defaults to 1024 characters.                                                                                     |
| http_buffer_size | 10MB | | Size (in bytes) of the "in-memory" buffer used for HTTP streaming queries. |
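As an illustrative sketch (the host, credentials, and product name are placeholders), several of these global settings can be applied together before creating a client:

```python
import clickhouse_connect
from clickhouse_connect import common

# Common settings must be changed *before* get_client is called;
# clients that already exist are unaffected by later changes.
common.set_setting('autogenerate_session_id', False)   # no automatic session IDs
common.set_setting('invalid_setting_action', 'drop')   # silently ignore bad settings
common.set_setting('product_name', 'my_app/1.2.0')     # placeholder app identifier

client = clickhouse_connect.get_client(host='localhost', username='default')
print(client.query('SELECT 1').result_rows)
```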

## Compression {#compression}

ClickHouse Connect supports lz4, zstd, brotli, and gzip compression for both query results and inserts. Keep in mind that using compression usually involves a tradeoff between network bandwidth/transfer speed and CPU usage (on both the client and the server).

To receive compressed data, the ClickHouse server `enable_http_compression` must be set to 1, or the user must have permission to change the setting on a "per query" basis.

Compression is controlled by the `compress` parameter when calling the `clickhouse_connect.get_client` factory method. By default, `compress` is set to `True`, which will trigger the default compression settings. For queries executed with the `query`, `query_np`, and `query_df` client methods, ClickHouse Connect will add the `Accept-Encoding` header with the `lz4`, `zstd`, `gzip`, and `deflate` encodings (plus `br` if the optional brotli library is installed). For the majority of requests the ClickHouse server will return with a `zstd` compressed payload. For inserts, by default ClickHouse Connect will compress insert blocks with `lz4` compression, and send the `Content-Encoding: lz4` HTTP header.

The `get_client` `compress` parameter can also be set to a specific compression method, one of `lz4`, `zstd`, `br`, or `gzip`. That method will then be used for both inserts and query results (if supported by the ClickHouse server). The required `zstd` and `lz4` compression libraries are now installed by default with ClickHouse Connect. If `br`/brotli is specified, the brotli library must be installed separately.
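For example (the connection details are placeholders), a specific compression method can be selected when the client is created:

```python
import clickhouse_connect

# compress=True (the default) negotiates compression automatically;
# naming a specific method forces it for both query results and inserts.
client = clickhouse_connect.get_client(
    host='localhost',
    username='default',
    compress='zstd',   # one of 'lz4', 'zstd', 'gzip', or 'br' (brotli library required)
)

df = client.query_df('SELECT number FROM system.numbers LIMIT 10')
```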

To use a SOCKS proxy, you can send a `urllib3` `SOCKSProxyManager` as the `pool_mgr` argument to `get_client`.
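A minimal sketch, assuming the PySocks package is installed and a SOCKS5 proxy is listening on the placeholder address:

```python
import clickhouse_connect
from urllib3.contrib.socks import SOCKSProxyManager  # requires PySocks

# Placeholder proxy address; all client traffic is routed through it
proxy = SOCKSProxyManager('socks5h://localhost:1080')

client = clickhouse_connect.get_client(
    host='clickhouse.example.com',   # placeholder server
    username='default',
    pool_mgr=proxy,
)
```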

## "Old" JSON data type {#old-json-data-type}

The experimental `Object` (or `Object('json')`) data type was removed from ClickHouse Connect in version 0.14.0. It has been superseded by the new `JSON` type described below. Users on older versions of ClickHouse Connect that still supported this type should upgrade to the new JSON type.

## "New" Variant/Dynamic/JSON/QBit datatypes (experimental feature) {#new-variantdynamicjson-datatypes-experimental-feature}

`clickhouse-connect` provides support for the ClickHouse types Variant, Dynamic, JSON, and QBit.

### Usage notes {#usage-notes}
- JSON data can be inserted as either a Python dictionary or a JSON string containing a JSON object `{}`. Other forms of JSON data aren't supported.
- The "new" JSON type is available starting with the ClickHouse 24.8 release
- Due to internal format changes, `clickhouse-connect` is only compatible with Variant types beginning with the ClickHouse 24.7 release
- Returned JSON objects will only return the `max_dynamic_paths` number of elements (which defaults to 1024). This will be fixed in a future release.
- Variant types now support native writes (since 0.13.0). Values are serialized using their native ClickHouse types rather than being stringified. For ambiguous cases, use the `typed_variant()` helper (see [Write Format Options](advanced-inserting.md#write-format-options)).
- Inserts into `Dynamic` columns will always be the String representation of the Python value. This will be fixed in a future release, once https://github.com/ClickHouse/ClickHouse/issues/70395 has been fixed.
- The implementation for the new types hasn't been optimized in C code, so performance may be somewhat slower than for simpler, established data types.
- The `QBit` type is a bit-transposed vector type for efficient vector search. It requires `allow_experimental_qbit_type = 1` and ClickHouse server version 25.10+. QBit columns map to Python `list[float]` and support BFloat16, Float32, and Float64 element types. NumPy is strongly recommended for QBit performance.
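The two accepted JSON insert forms noted above can be illustrated with the standard library alone (no ClickHouse connection needed):

```python
import json

# A value for a JSON column may be a Python dictionary...
row_as_dict = {"user": {"id": 42, "tags": ["python", "clickhouse"]}}

# ...or a JSON string containing a JSON *object* ({}).
row_as_str = json.dumps(row_as_dict)

# Both forms describe the same object; a top-level array or scalar
# (e.g. "[1, 2]" or "7") is not accepted for a JSON column.
assert json.loads(row_as_str) == row_as_dict
assert isinstance(json.loads(row_as_str), dict)
```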