Merged
2 changes: 1 addition & 1 deletion docs/cloud/reference/01_changelog/01_changelog.md
Original file line number Diff line number Diff line change
@@ -1524,7 +1524,7 @@ This release enables dictionaries from local ClickHouse table and HTTP sources,

### General changes {#general-changes-5}

- - Added support for [dictionaries](/sql-reference/dictionaries/index.md) from local ClickHouse table and HTTP sources
+ - Added support for [dictionaries](/sql-reference/statements/create/dictionary) from local ClickHouse table and HTTP sources
- Introduced support for the Mumbai [region](/cloud/reference/supported-regions)

### Console changes {#console-changes-30}
16 changes: 8 additions & 8 deletions docs/dictionary/index.md
@@ -3,7 +3,7 @@
title: 'Dictionary'
keywords: ['dictionary', 'dictionaries']
description: 'A dictionary provides a key-value representation of data for fast lookups.'
- doc_type: 'reference'
+ doc_type: 'guide'
---

import dictionaryUseCases from '@site/static/images/dictionary/dictionary-use-cases.png';
@@ -12,7 +12,7 @@

# Dictionary

- A dictionary in ClickHouse provides an in-memory [key-value](https://en.wikipedia.org/wiki/Key%E2%80%93value_database) representation of data from various [internal and external sources](/sql-reference/dictionaries#dictionary-sources), optimizing for super-low latency lookup queries.
+ A dictionary in ClickHouse provides an in-memory [key-value](https://en.wikipedia.org/wiki/Key%E2%80%93value_database) representation of data from various [internal and external sources](/sql-reference/statements/create/dictionary/sources#dictionary-sources), optimized for super-low latency lookup queries.
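To make the key-value lookup pattern described above concrete, here is a minimal sketch of defining a dictionary over a ClickHouse table and querying it with `dictGet`. The table, dictionary, and attribute names are hypothetical illustrations, not taken from the files in this PR:

```sql
-- Hypothetical source table holding the key-value data
CREATE TABLE country_codes
(
    `code` UInt64,
    `name` String
)
ENGINE = MergeTree
ORDER BY code;

-- In-memory dictionary backed by that table
CREATE DICTIONARY country_dict
(
    `code` UInt64,
    `name` String
)
PRIMARY KEY code
SOURCE(CLICKHOUSE(TABLE 'country_codes'))
LAYOUT(FLAT())
LIFETIME(0);  -- 0 disables automatic refresh

-- Low-latency point lookup by key
SELECT dictGet('country_dict', 'name', 310);
```

The same lookup can also be expressed as a direct `JOIN` against the dictionary, which ClickHouse optimizes into dictionary lookups.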

Dictionaries are useful for:
- Improving the performance of queries, especially when used with `JOIN`s
@@ -86,7 +86,7 @@

#### Applying a dictionary {#applying-a-dictionary}

- To demonstrate these concepts, we use a dictionary for our vote data. Since dictionaries are usually held in memory ([ssd_cache](/sql-reference/dictionaries#ssd_cache) is the exception), you should be cognizant of the size of the data. Confirming our `votes` table size:
+ To demonstrate these concepts, we use a dictionary for our vote data. Since dictionaries are usually held in memory ([ssd_cache](/sql-reference/statements/create/dictionary/layouts/ssd-cache) is the exception), you should be cognizant of the size of the data. Confirming our `votes` table size:

```sql
SELECT table,
@@ -104,7 +104,7 @@

Data will be stored uncompressed in our dictionary, so we need at least 4GB of memory if we were to store all columns (we won't) in a dictionary. The dictionary will be replicated across our cluster, so this amount of memory needs to be reserved *per node*.

- > In the example below the data for our dictionary originates from a ClickHouse table. While this represents the most common source of dictionaries, [a number of sources](/sql-reference/dictionaries#dictionary-sources) are supported including files, http and databases including [Postgres](/sql-reference/dictionaries#postgresql). As we'll show, dictionaries can be automatically refreshed providing an ideal way to ensure small datasets subject to frequent changes are available for direct joins.
+ > In the example below, the data for our dictionary originates from a ClickHouse table. While this is the most common source of dictionaries, [several sources](/sql-reference/statements/create/dictionary/sources#dictionary-sources) are supported, including files, HTTP, and databases such as [Postgres](/sql-reference/statements/create/dictionary/sources/postgresql). As we'll show, dictionaries can be automatically refreshed, providing an ideal way to ensure that small datasets subject to frequent changes are available for direct joins.

Our dictionary requires a primary key on which lookups will be performed. This is conceptually identical to a transactional database primary key and should be unique. Our above query requires a lookup on the join key - `PostId`. The dictionary should in turn be populated with the total of the up and down votes per `PostId` from our `votes` table. Here's the query to obtain this dictionary data:

@@ -316,20 +316,20 @@

### Choosing the Dictionary `LAYOUT` {#choosing-the-dictionary-layout}

- The `LAYOUT` clause controls the internal data structure for the dictionary. A number of options exist and are documented [here](/sql-reference/dictionaries#ways-to-store-dictionaries-in-memory). Some tips on choosing the correct layout can be found [here](https://clickhouse.com/blog/faster-queries-dictionaries-clickhouse#choosing-a-layout).
+ The `LAYOUT` clause controls the internal data structure for the dictionary. Several options exist and are documented [here](/sql-reference/statements/create/dictionary/layouts#ways-to-store-dictionaries-in-memory). Tips on choosing the correct layout can be found [here](https://clickhouse.com/blog/faster-queries-dictionaries-clickhouse#choosing-a-layout).

### Refreshing dictionaries {#refreshing-dictionaries}

We have specified a `LIFETIME` for the dictionary of `MIN 600 MAX 900`. `LIFETIME` is the update interval for the dictionary, with the values here causing a periodic reload at a random interval between 600 and 900 seconds. This random interval is necessary to distribute the load on the dictionary source when updating on a large number of servers. During updates, the old version of a dictionary can still be queried, with only the initial load blocking queries. Note that setting `LIFETIME(0)` prevents dictionaries from updating.
Dictionaries can be forcibly reloaded using the `SYSTEM RELOAD DICTIONARY` command.
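A minimal sketch of the refresh settings described above, using the `votes` table from this page. The dictionary name and the `VoteTypeId` values used to count up/down votes are assumptions for illustration, not taken from the PR:

```sql
-- Reload at a random interval between 600 and 900 seconds
CREATE DICTIONARY votes_dict
(
    `PostId` UInt64,
    `UpVotes` UInt32,
    `DownVotes` UInt32
)
PRIMARY KEY PostId
SOURCE(CLICKHOUSE(QUERY '
    SELECT PostId,
           countIf(VoteTypeId = 2) AS UpVotes,   -- assumed vote-type encoding
           countIf(VoteTypeId = 3) AS DownVotes
    FROM votes
    GROUP BY PostId'))
LAYOUT(HASHED())
LIFETIME(MIN 600 MAX 900);

-- Force an immediate refresh outside the LIFETIME schedule
SYSTEM RELOAD DICTIONARY votes_dict;
```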

- For database sources such as ClickHouse and Postgres, you can set up a query that will update the dictionaries only if they really changed (the response of the query determines this), rather than at a periodic interval. Further details can be found [here](/sql-reference/dictionaries#refreshing-dictionary-data-using-lifetime).
+ For database sources such as ClickHouse and Postgres, you can set up a query that will update the dictionaries only if they really changed (the response of the query determines this), rather than at a periodic interval. Further details can be found [here](/sql-reference/statements/create/dictionary/lifetime#refreshing-dictionary-data-using-lifetime).

### Other dictionary types {#other-dictionary-types}

- ClickHouse also supports [Hierarchical](/sql-reference/dictionaries#hierarchical-dictionaries), [Polygon](/sql-reference/dictionaries#polygon-dictionaries) and [Regular Expression](/sql-reference/dictionaries#regexp-tree-dictionary) dictionaries.
+ ClickHouse also supports [Hierarchical](/sql-reference/statements/create/dictionary/layouts/hierarchical), [Polygon](/sql-reference/statements/create/dictionary/layouts/polygon) and [Regular Expression](/sql-reference/statements/create/dictionary/layouts/regexp-tree) dictionaries.

### More reading {#more-reading}

- [Using Dictionaries to Accelerate Queries](https://clickhouse.com/blog/faster-queries-dictionaries-clickhouse)
- - [Advanced Configuration for Dictionaries](/sql-reference/dictionaries)
+ - [Advanced Configuration for Dictionaries](/sql-reference/statements/create/dictionary)
2 changes: 1 addition & 1 deletion docs/getting-started/example-datasets/cell-towers.md
@@ -166,7 +166,7 @@ SELECT mcc, count() FROM cell_towers GROUP BY mcc ORDER BY count() DESC LIMIT 10

Based on the above query and the [MCC list](https://en.wikipedia.org/wiki/Mobile_country_code), the countries with the most cell towers are: the USA, Germany, and Russia.

- You may want to create a [Dictionary](../../sql-reference/dictionaries/index.md) in ClickHouse to decode these values.
+ You may want to create a [Dictionary](../../sql-reference/statements/create/dictionary/index.md) in ClickHouse to decode these values.

## Use case: incorporate geo data {#use-case}

2 changes: 1 addition & 1 deletion docs/integrations/data-sources/cassandra.md
@@ -9,4 +9,4 @@ doc_type: 'reference'

# Cassandra integration

- You can integrate with Cassandra via a dictionary. Further details [here](/sql-reference/dictionaries#cassandra).
+ You can integrate with Cassandra via a dictionary. Further details [here](/sql-reference/statements/create/dictionary/sources/cassandra).
@@ -505,7 +505,7 @@ Here's an excerpt from the CSV file you're using in table format. The
Setting `LIFETIME` to 0 disables automatic updates to avoid unnecessary
traffic to our S3 bucket. In other cases, you might configure it
differently. For details, see [Refreshing dictionary data using
- LIFETIME](/sql-reference/dictionaries#refreshing-dictionary-data-using-lifetime).
+ LIFETIME](/sql-reference/statements/create/dictionary/lifetime#refreshing-dictionary-data-using-lifetime).
:::

2. Now import it:
@@ -642,13 +642,13 @@ should only retrieve the columns you actually need.
"PostgreSQL Client Applications: psql"
[EXPLAIN]: https://www.postgresql.org/docs/current/sql-explain.html
"SQL Commands: EXPLAIN"
- [dictionary]: /sql-reference/dictionaries/index.md
+ [dictionary]: /sql-reference/statements/create/dictionary
[PGXN]: https://pgxn.org/dist/pg_clickhouse "pg_clickhouse on PGXN"
[GitHub]: https://github.com/ClickHouse/pg_clickhouse/releases
"pg_clickhouse Releases on GitHub"
[pg_clickhouse image]: https://github.com/ClickHouse/pg_clickhouse/pkgs/container/pg_clickhouse
"pg_clickhouse OCI Image on GitHub"
[Postgres image]: https://hub.docker.com/_/postgres
"Postgres OCI Image on Docker Hub"
- [Refreshing dictionary data using LIFETIME]: /sql-reference/dictionaries/index.md#refreshing-dictionary-data-using-lifetime
+ [Refreshing dictionary data using LIFETIME]: /sql-reference/statements/create/dictionary/lifetime#refreshing-dictionary-data-using-lifetime
"ClickHouse Doc: Refreshing dictionary data using LIFETIME"
4 changes: 2 additions & 2 deletions docs/tutorial.md
@@ -365,7 +365,7 @@

## Create a dictionary {#create-a-dictionary}

- A dictionary is a mapping of key-value pairs stored in memory. For details, see [Dictionaries](/sql-reference/dictionaries/index.md)
+ A dictionary is a mapping of key-value pairs stored in memory. For details, see [Dictionaries](/sql-reference/statements/create/dictionary).

Create a dictionary associated with a table in your ClickHouse service.
The table and dictionary are based on a CSV file that contains a row for each neighborhood in New York City.
@@ -398,7 +398,7 @@
```

:::note
- Setting `LIFETIME` to 0 disables automatic updates to avoid unnecessary traffic to our S3 bucket. In other cases, you might configure it differently. For details, see [Refreshing dictionary data using LIFETIME](/sql-reference/dictionaries#refreshing-dictionary-data-using-lifetime).
+ Setting `LIFETIME` to 0 disables automatic updates to avoid unnecessary traffic to our S3 bucket. In other cases, you might configure it differently. For details, see [Refreshing dictionary data using LIFETIME](/sql-reference/statements/create/dictionary/lifetime#refreshing-dictionary-data-using-lifetime).
:::

3. Verify it worked. The following should return 265 rows, or one row for each neighborhood:
10 changes: 5 additions & 5 deletions docs/use-cases/observability/build-your-own/schema-design.md
@@ -578,7 +578,7 @@

## Using dictionaries {#using-dictionaries}

- [Dictionaries](/sql-reference/dictionaries) are a [key feature](https://clickhouse.com/blog/faster-queries-dictionaries-clickhouse) of ClickHouse providing in-memory [key-value](https://en.wikipedia.org/wiki/Key%E2%80%93value_database) representation of data from various internal and external [sources](/sql-reference/dictionaries#dictionary-sources), optimized for super-low latency lookup queries.
+ [Dictionaries](/sql-reference/statements/create/dictionary) are a [key feature](https://clickhouse.com/blog/faster-queries-dictionaries-clickhouse) of ClickHouse providing in-memory [key-value](https://en.wikipedia.org/wiki/Key%E2%80%93value_database) representation of data from various internal and external [sources](/sql-reference/statements/create/dictionary/sources#dictionary-sources), optimized for super-low latency lookup queries.

<Image img={observability_12} alt="Observability and dictionaries" size="md"/>

@@ -715,7 +715,7 @@
FROM geoip_url
```

- In order to perform low-latency IP lookups in ClickHouse, we'll leverage dictionaries to store key -> attributes mapping for our Geo IP data in-memory. ClickHouse provides an `ip_trie` [dictionary structure](/sql-reference/dictionaries#ip_trie) to map our network prefixes (CIDR blocks) to coordinates and country codes. The following query specifies a dictionary using this layout and the above table as the source.
+ To perform low-latency IP lookups in ClickHouse, we'll leverage dictionaries to store a key -> attributes mapping for our Geo IP data in memory. ClickHouse provides an `ip_trie` [dictionary structure](/sql-reference/statements/create/dictionary/layouts/ip-trie) to map our network prefixes (CIDR blocks) to coordinates and country codes. The following query specifies a dictionary using this layout, with the above table as the source.

```sql
CREATE DICTIONARY ip_trie (
@@ -825,10 +825,10 @@

The parsing of [user agent strings](https://en.wikipedia.org/wiki/User_agent) is a classical regular expression problem and a common requirement in log and trace based datasets. ClickHouse provides efficient parsing of user agents using Regular Expression Tree Dictionaries.

- Regular expression tree dictionaries are defined in ClickHouse open-source using the YAMLRegExpTree dictionary source type which provides the path to a YAML file containing the regular expression tree. Should you wish to provide your own regular expression dictionary, the details on the required structure can be found [here](/sql-reference/dictionaries#use-regular-expression-tree-dictionary-in-clickhouse-open-source). Below we focus on user-agent parsing using [uap-core](https://github.com/ua-parser/uap-core) and load our dictionary for the supported CSV format. This approach is compatible with OSS and ClickHouse Cloud.
+ Regular expression tree dictionaries are defined in ClickHouse open-source using the YAMLRegExpTree dictionary source type, which provides the path to a YAML file containing the regular expression tree. Should you wish to provide your own regular expression dictionary, details on the required structure can be found [here](/sql-reference/statements/create/dictionary/layouts/regexp-tree#use-regular-expression-tree-dictionary-in-clickhouse-open-source). Below we focus on user-agent parsing using [uap-core](https://github.com/ua-parser/uap-core) and load our dictionary from the supported CSV format. This approach is compatible with OSS and ClickHouse Cloud.

:::note
- In the examples below, we use snapshots of the latest uap-core regular expressions for user-agent parsing from June 2024. The latest file, which is occasionally updated, can be found [here](https://raw.githubusercontent.com/ua-parser/uap-core/master/regexes.yaml). You can follow the steps [here](/sql-reference/dictionaries#collecting-attribute-values) to load into the CSV file used below.
+ In the examples below, we use snapshots of the latest uap-core regular expressions for user-agent parsing from June 2024. The latest file, which is occasionally updated, can be found [here](https://raw.githubusercontent.com/ua-parser/uap-core/master/regexes.yaml). You can follow the steps [here](/sql-reference/statements/create/dictionary/layouts/regexp-tree#collecting-attribute-values) to load into the CSV file used below.
:::

Create the following Memory tables. These hold our regular expressions for parsing devices, browsers and operating systems.
@@ -1015,7 +1015,7 @@

- [Advanced dictionary topics](/dictionary#advanced-dictionary-topics)
- ["Using Dictionaries to Accelerate Queries"](https://clickhouse.com/blog/faster-queries-dictionaries-clickhouse)
- - [Dictionaries](/sql-reference/dictionaries)
+ - [Dictionaries](/sql-reference/statements/create/dictionary)

## Accelerating queries {#accelerating-queries}

1 change: 1 addition & 0 deletions plugins/floating-pages-exceptions.txt
@@ -24,3 +24,4 @@ interfaces/arrowflight
interfaces/overview
operations/utilities/clickhouse-keeper-http-api.md
operations/settings/detach-non-readonly-queries.md
+ sql-reference/dictionaries/index.md
4 changes: 2 additions & 2 deletions scripts/sed_links.sh
@@ -18,7 +18,7 @@ if [[ "$OSTYPE" == "darwin"* ]]; then
sed -i '' 's|(https://clickhouse.com/docs/sql-reference/statements/select#apply)|(/sql-reference/statements/select)|g' docs/guides/developer/dynamic-column-selection.md
sed -i '' 's|(/sql-reference/statements/select#replace)|(/sql-reference/statements/select)|g' docs/guides/developer/dynamic-column-selection.md
sed -i '' 's|(/sql-reference/statements/select#except)|(/sql-reference/statements/select)|g' docs/guides/developer/dynamic-column-selection.md
- sed -i '' 's|(/cloud/reference/cloud-compatibility.md)|(/whats-new/cloud-compatibility)|g' docs/sql-reference/dictionaries/_snippet_dictionary_in_cloud.md
+ sed -i '' 's|(/cloud/reference/cloud-compatibility.md)|(/whats-new/cloud-compatibility)|g' docs/sql-reference/statements/create/dictionary/_snippet_dictionary_in_cloud.md
sed -i '' 's|(/cloud/security/secure-s3)|(/cloud/data-sources/secure-s3)|g' docs/engines/table-engines/integrations/s3queue.md
sed -i '' 's|(/cloud/security/cloud-access-management/overview#initial-settings)|(/cloud/security/console-roles)|g' docs/sql-reference/statements/grant.md
sed -i '' 's|(/cloud/security/secure-s3#access-your-s3-bucket-with-the-clickhouseaccess-role)|(/cloud/data-sources/secure-s3#access-your-s3-bucket-with-the-clickhouseaccess-role)|g' docs/sql-reference/table-functions/s3.md
@@ -32,7 +32,7 @@ else
sed -i 's|(https://clickhouse.com/docs/sql-reference/statements/select#apply)|(/sql-reference/statements/select)|g' docs/guides/developer/dynamic-column-selection.md
sed -i 's|(/sql-reference/statements/select#replace)|(/sql-reference/statements/select)|g' docs/guides/developer/dynamic-column-selection.md
sed -i 's|(/sql-reference/statements/select#except)|(/sql-reference/statements/select)|g' docs/guides/developer/dynamic-column-selection.md
- sed -i 's|(/cloud/reference/cloud-compatibility.md)|(/whats-new/cloud-compatibility)|g' docs/sql-reference/dictionaries/_snippet_dictionary_in_cloud.md
+ sed -i 's|(/cloud/reference/cloud-compatibility.md)|(/whats-new/cloud-compatibility)|g' docs/sql-reference/statements/create/dictionary/_snippet_dictionary_in_cloud.md
sed -i 's|(/cloud/security/secure-s3)|(/cloud/data-sources/secure-s3)|g' docs/engines/table-engines/integrations/s3queue.md
sed -i 's|(/cloud/security/cloud-access-management/overview#initial-settings)|(/cloud/security/console-roles)|g' docs/sql-reference/statements/grant.md
sed -i 's|(/cloud/security/secure-s3#access-your-s3-bucket-with-the-clickhouseaccess-role)|(/cloud/data-sources/secure-s3#access-your-s3-bucket-with-the-clickhouseaccess-role)|g' docs/sql-reference/table-functions/s3.md
2 changes: 1 addition & 1 deletion sidebars.js
@@ -1241,7 +1241,7 @@ const sidebars = {
label: 'Dictionary',
collapsible: true,
collapsed: true,
- items: ['dictionary/index', 'sql-reference/dictionaries/index'],
+ items: ['dictionary/index'],
},
{
type: 'category',
1 change: 0 additions & 1 deletion src/theme/badges/CloudNotSupportedBadge/index.js
@@ -8,7 +8,6 @@ const Icon = () => {
<path strokeWidth="1.5" d="M6.33366 12.6666L12.3739 12.6667C13.6593 12.6667 14.7073 11.6187 14.7073 10.3334C14.7073 9.04804 13.6593 8.00003 12.3739 8.00003C12.3739 8.00003 12.3337 7.66659 12.0003 7.33325M10.667 5.33322C8.00033 2.33325 4.45395 4.78537 4.14195 6.68203C2.55728 6.7627 1.29395 8.06203 1.29395 9.6667C1.29395 11.3234 2.66699 12.6666 4.00033 12.6666" stroke="#660099" strokeLinecap="round" strokeLinejoin="round"/>
<path strokeWidth="1.5" d="M2.66699 14L12.0003 4.66663" stroke="#660099" strokeLinecap="round" strokeLinejoin="round"/>
</svg>

</div>
)
}