Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
165 changes: 71 additions & 94 deletions documentation/query/sql/sample-by.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,33 +4,29 @@ sidebar_label: SAMPLE BY
description: SAMPLE BY SQL keyword reference documentation.
---

`SAMPLE BY` is used on [time-series data](/blog/what-is-time-series-data/) to summarize large datasets into
aggregates of homogeneous time chunks as part of a
[SELECT statement](/docs/query/sql/select/).
`SAMPLE BY` aggregates time-series data into time-based groups (hourly, daily, etc.) and applies aggregate functions to each group. It is QuestDB's primary tool for downsampling and summarizing large datasets. Supports interpolation for missing intervals via `FILL`.

To use `SAMPLE BY`, a table column needs to be specified as a
[designated timestamp](/docs/concepts/designated-timestamp/).

Users performing `SAMPLE BY` queries on datasets **with missing data** may make
use of the [FILL](#fill-options) keyword to specify a fill behavior.
Requires a [designated timestamp](/docs/concepts/designated-timestamp/) column.

## Syntax

### SAMPLE BY keywords

![Flow chart showing the syntax of the SAMPLE BY keywords](/images/docs/diagrams/sampleBy.svg)

### FROM-TO keywords

![Flow chart showing the syntax of the FROM-TO keywords](/images/docs/diagrams/fromTo.svg)

### FILL keywords

![Flow chart showing the syntax of the FILL keyword](/images/docs/diagrams/fill.svg)
```questdb-sql
SELECT columns, aggregate_functions
FROM table
[WHERE conditions]
SAMPLE BY <n><unit>
[FROM timestamp TO timestamp]
[FILL(fill_option)]
[ALIGN TO CALENDAR [TIME ZONE tz] [WITH OFFSET offset] | ALIGN TO FIRST OBSERVATION]
```

### ALIGN TO keywords
### Key clauses

![Flow chart showing the syntax of the ALIGN TO keywords](/images/docs/diagrams/alignToCalTimeZone.svg)
- **`SAMPLE BY <n><unit>`**: Groups rows into time buckets (e.g., `1h`, `5m`, `1d`)
- **`FROM ... TO`**: Defines explicit time boundaries for filling missing intervals
- **`FILL(option)`**: Specifies how to handle missing time buckets (`NONE`, `NULL`, `PREV`, `LINEAR`, or constant)
- **`ALIGN TO CALENDAR`**: Aligns buckets to calendar boundaries (default). Supports `TIME ZONE` and `WITH OFFSET`
- **`ALIGN TO FIRST OBSERVATION`**: Aligns buckets relative to the first row's timestamp

## Sample units

Expand All @@ -44,6 +40,7 @@ Where the unit for sampled groups may be one of the following:

| unit | description |
| ---- | ----------- |
| `n` | nanosecond |
| `U` | microsecond |
| `T` | millisecond |
| `s` | second |
Expand All @@ -63,74 +60,19 @@ FROM trades
SAMPLE BY 1h;
```

## FROM-TO

:::note

Versions prior to QuestDB 8.1.0 do not have access to this extension.

Please see the new blog for more information.

:::

When using `SAMPLE BY` with `FILL`, you can fill missing rows within the result set with pre-determined values.

However, this method will only fill rows between existing data in the data set and cannot fill rows outside of this range.

To fill outside the bounds of the existing data, you can specify a fill range using a `FROM-TO` clause. The boundary
timestamps are expected in UTC.

Note that `FROM-TO` clause can be used only on non-keyed SAMPLE BY queries, i.e. queries that have no grouping columns
other than the timestamp.

#### Syntax

Specify the shape of the query using `FROM` and `TO`:

```questdb-sql title='Pre-filling trade data' demo
SELECT timestamp as ts, count()
FROM trades
SAMPLE BY 1d FROM '2009-01-01' TO '2009-01-10' FILL(NULL);
```

If no rows exist at the start of the range, QuestDB automatically fills in these rows.

This is distinct from the `WHERE` clause with a simple rule of thumb -
`WHERE` controls what data flows in, `FROM-TO` controls what data flows out.

Use both `FROM` and `TO` in isolation to pre-fill or post-fill data. If `FROM` is not provided, then the lower bound is the start of the dataset, aligned to calendar. The opposite is true omitting `TO`.

#### `WHERE` clause optimisation

If the user does not provide a `WHERE` clause, or the `WHERE` clause does not consider the designated timestamp,
QuestDB will add one for you, matching the `FROM-TO` interval.

This means that the query will run optimally, and avoid touching data not relevant to the result.

Therefore, we compile the prior query into something similar to this:

```questdb-sql title='Pre-filling trade data with WHERE optimisation' demo
SELECT timestamp as ts, count()
FROM trades
WHERE timestamp >= '2009-01-01'
AND timestamp < '2009-01-10'
SAMPLE BY 1d FROM '2009-01-01' TO '2009-01-10' FILL(NULL);
```

#### Limitations

Here are the current limits to this feature.
## Fill options

- This syntax is not compatible with `FILL(PREV)` or `FILL(LINEAR)`.
- This syntax is for `ALIGN TO CALENDAR` only (default alignment).
- Does not consider any specified `OFFSET`.
- This syntax is for non-keyed `SAMPLE BY` i.e. only designated timestamp and aggregate columns.
`FILL` specifies how to handle time intervals with no data. By default, missing intervals are skipped. Use `FILL` to interpolate or substitute values.

## Fill options
| fillOption | Description |
| ---------- | ------------------------------------------------------------------------------------------------------------------------- |
| `NONE` | No fill applied. If there is no data, the time sample will be skipped in the results. A table could be missing intervals. |
| `NULL` | Fills with `NULL` values. |
| `PREV` | Fills using the previous value. |
| `LINEAR` | Fills by linear interpolation of the 2 surrounding points. |
| `x` | Fills with a constant value - where `x` is the desired value, for example `FILL(100.05)`. |

The `FILL` keyword is optional and expects one or more `fillOption` strategies
which will be applied to one or more aggregate columns. The following
restrictions apply:
The following restrictions apply:

- Keywords denoting fill strategies may not be combined. Only one option from
`NONE`, `NULL`, `PREV`, `LINEAR` and constants may be used.
Expand All @@ -146,14 +88,6 @@ SAMPLE BY 1h FILL(LINEAR)
ALIGN TO ...
```

| fillOption | Description |
| ---------- | ------------------------------------------------------------------------------------------------------------------------- |
| `NONE` | No fill applied. If there is no data, the time sample will be skipped in the results. A table could be missing intervals. |
| `NULL` | Fills with `NULL` values. |
| `PREV` | Fills using the previous value. |
| `LINEAR` | Fills by linear interpolation of the 2 surrounding points. |
| `x` | Fills with a constant value - where `x` is the desired value, for example `FILL(100.05)`. |

Consider an example table named `prices` which has no records during the entire
third hour (`2021-01-01T03`):

Expand Down Expand Up @@ -263,6 +197,49 @@ In the above query `min(price)` aggregate will get `FILL(NULL)` strategy
applied, `max(price)` will get `FILL(10)`, and `avg(price)` will get
`FILL(PREV)`.

## FROM-TO

`FILL` only fills intervals between existing data points. To fill outside the bounds of existing data, use `FROM-TO` to specify explicit time boundaries.

```questdb-sql title='Pre-filling trade data' demo
SELECT timestamp as ts, count()
FROM trades
SAMPLE BY 1d FROM '2009-01-01' TO '2009-01-10' FILL(NULL);
```

If no rows exist at the start of the range, QuestDB automatically fills in these rows.

This is distinct from the `WHERE` clause with a simple rule of thumb -
`WHERE` controls what data flows in, `FROM-TO` controls what data flows out.

Use both `FROM` and `TO` in isolation to pre-fill or post-fill data. If `FROM` is not provided, then the lower bound is the start of the dataset, aligned to calendar. The opposite is true omitting `TO`.

The boundary timestamps are expected in UTC. `FROM-TO` can only be used on non-keyed SAMPLE BY queries (queries with no grouping columns other than the timestamp).

### `WHERE` clause optimisation

If the user does not provide a `WHERE` clause, or the `WHERE` clause does not consider the designated timestamp,
QuestDB will add one for you, matching the `FROM-TO` interval.

This means that the query will run optimally, and avoid touching data not relevant to the result.

Therefore, we compile the prior query into something similar to this:

```questdb-sql title='Pre-filling trade data with WHERE optimisation' demo
SELECT timestamp as ts, count()
FROM trades
WHERE timestamp >= '2009-01-01'
AND timestamp < '2009-01-10'
SAMPLE BY 1d FROM '2009-01-01' TO '2009-01-10' FILL(NULL);
```

### Limitations

- Not compatible with `FILL(PREV)` or `FILL(LINEAR)`.
- Only works with `ALIGN TO CALENDAR` (default alignment).
- Does not consider any specified `OFFSET`.
- Only for non-keyed `SAMPLE BY` (designated timestamp and aggregate columns only). See [Fill keyed queries with arbitrary intervals](/docs/cookbook/sql/time-series/fill-keyed-arbitrary-interval/) for a workaround.

## Sample calculation

The default time calculation of sampled groups is an absolute value, in other
Expand Down
Loading