1 change: 1 addition & 0 deletions SUMMARY.md
@@ -327,6 +327,7 @@
* [Defining flat views with view definitions](modules/sql-on-fhir/defining-flat-views-with-view-definitions.md)
* [Migrate to the spec-compliant ViewDefinition format](modules/sql-on-fhir/migrate-to-the-spec-compliant-viewdefinition-format.md)
* [Query data from flat views](modules/sql-on-fhir/query-data-from-flat-views.md)
* [De-identification](modules/sql-on-fhir/de-identification.md)
* [Reference](modules/sql-on-fhir/reference.md)
* [Integration Toolkit](modules/integration-toolkit/README.md)
* [C-CDA / FHIR Converter](modules/integration-toolkit/ccda-converter/README.md)
Binary file added assets/deident-viewdefinition-builder.png
10 changes: 8 additions & 2 deletions docs/modules/sql-on-fhir/README.md
@@ -4,8 +4,8 @@ description: Create flat SQL views from FHIR resources using ViewDefinitions for

# SQL on FHIR

-{% hint style="info" %}
-**SQL on FHIR** engine is currently in **preview**
+{% hint style="warning" %}
+Starting from version **2604**, SQL on FHIR requires **fhir-schema mode** (`fhir.validation.fhir-schema-validation=true`). ViewDefinitions are stored in the FHIR Artifact Registry (FAR), which is only available in fhir-schema mode. Without it, ViewDefinition CRUD, `$run`, `$sql`, and `$materialize` operations will not work.
{% endhint %}

Performing analysis on FHIR data requires extracting data from deeply nested structures of resources, which may be cumbersome in some cases. To address this problem, Aidbox implements the [SQL on FHIR](https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/index.html) specification, which allows users to create flat views of their resources in a simple, straightforward way.
@@ -22,6 +22,12 @@ Once your flat view is defined and materialized, you can query data from it usin

See [Query data from flat views](./query-data-from-flat-views.md).

## De-identification

Starting from version **2604**, ViewDefinition columns can be annotated with de-identification methods to transform sensitive data during SQL generation. Supported methods include redact, cryptoHash, dateshift, encrypt, substitute, perturb, and custom PostgreSQL functions.

See [De-identification](./de-identification.md).

## SQL on FHIR reference

To dive deeper into the nuances of using SQL on FHIR in Aidbox, consult the reference page.
343 changes: 343 additions & 0 deletions docs/modules/sql-on-fhir/de-identification.md
@@ -0,0 +1,343 @@
# De-identification

Starting from version **2604**, Aidbox supports per-column de-identification in ViewDefinitions via a FHIR extension. When a column has a de-identification extension, the SQL compiler wraps the column expression with a PostgreSQL function that transforms the value before it reaches the output.

This works with all ViewDefinition operations: `$run`, `$sql`, and `$materialize`.

{% hint style="info" %}
Requires **fhir-schema mode**.
{% endhint %}

## Extension format

Add the de-identification extension to any column in the `select` array:

```json
{
  "name": "birth_date",
  "path": "birthDate",
  "extension": [
    {
      "url": "http://health-samurai.io/fhir/core/StructureDefinition/de-identification",
      "extension": [
        {"url": "method", "valueCode": "dateshift"},
        {"url": "dateShiftKey", "valueString": "my-secret-key"}
      ]
    }
  ]
}
```

The extension uses sub-extensions for the method and its parameters. The `method` sub-extension is required and specifies which de-identification method to apply.
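
The nesting above can also be generated programmatically. The following is a minimal sketch in Python, not part of Aidbox: the helper name `deident_column` and its calling convention are hypothetical, and only the extension URL and the sub-extension names come from this page.

```python
# Hypothetical helper (not an Aidbox API): builds a ViewDefinition column
# that carries the de-identification extension described above.
DEIDENT_URL = "http://health-samurai.io/fhir/core/StructureDefinition/de-identification"


def deident_column(name, path, method, **params):
    """Return a column dict with a de-identification extension.

    `params` maps a sub-extension url to a (value[x] key, value) pair,
    for example dateShiftKey=("valueString", "my-secret-key").
    """
    # The `method` sub-extension is required and always comes first here.
    sub_extensions = [{"url": "method", "valueCode": method}]
    for url, (value_key, value) in params.items():
        sub_extensions.append({"url": url, value_key: value})
    return {
        "name": name,
        "path": path,
        "extension": [{"url": DEIDENT_URL, "extension": sub_extensions}],
    }


column = deident_column(
    "birth_date",
    "birthDate",
    "dateshift",
    dateShiftKey=("valueString", "my-secret-key"),
)
```

Serializing `column` with `json.dumps` yields the same structure as the JSON example in this section.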

## Methods

### redact

Replaces the value with NULL. No parameters.

```json
{"url": "method", "valueCode": "redact"}
```

### cryptoHash

Replaces the value with its HMAC-SHA256 hash (hex-encoded). Deterministic — same input always produces the same hash. One-way, cannot be reversed.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| cryptoHashKey | string | yes | HMAC secret key |

```json
[
  {"url": "method", "valueCode": "cryptoHash"},
  {"url": "cryptoHashKey", "valueString": "my-hash-key"}
]
```
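
To make the behaviour concrete, here is a minimal Python sketch of a hex-encoded HMAC-SHA256. It illustrates the technique only and is not the PostgreSQL function Aidbox generates; the function name `crypto_hash`, the UTF-8 encoding, and the pass-through of missing values are assumptions.

```python
import hashlib
import hmac


def crypto_hash(value, key):
    """Hex-encoded HMAC-SHA256 of `value` under `key` (illustrative only)."""
    if value is None:
        # Assumption: a missing value stays missing instead of being hashed.
        return None
    return hmac.new(key.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).hexdigest()


first = crypto_hash("patient-1", "my-hash-key")
second = crypto_hash("patient-1", "my-hash-key")
other_key = crypto_hash("patient-1", "another-key")
```

Here `first` and `second` are identical 64-character hex strings, while `other_key` differs, which is why the same key must be reused wherever hashed values need to join across views.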

### dateshift

Shifts date and dateTime values by a deterministic offset derived from the resource id. All dates within the same resource shift by the same number of days, preserving temporal relationships. The offset range is -50 to +50 days.

Year-only values (`"2000"`) and year-month values (`"2000-06"`) cannot be shifted meaningfully and are replaced with NULL.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| dateShiftKey | string | yes | HMAC key used to compute the per-resource offset |

```json
[
  {"url": "method", "valueCode": "dateshift"},
  {"url": "dateShiftKey", "valueString": "my-shift-key"}
]
```
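
The sketch below shows one way a keyed, per-resource offset in the range -50 to +50 days could be derived. It is an assumption-laden illustration in Python: how Aidbox actually maps the HMAC to an offset is not documented here, and the names `shift_offset_days` and `date_shift` are hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta


def shift_offset_days(resource_id, key):
    """Derive a deterministic offset in [-50, 50] from the resource id.

    Assumption: the first 4 bytes of HMAC-SHA256(key, resource_id) are
    reduced modulo 101. The real derivation may differ.
    """
    digest = hmac.new(key.encode("utf-8"), resource_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % 101 - 50


def date_shift(value, resource_id, key):
    """Shift the date part of a date or dateTime string; None for partial dates."""
    if value is None or len(value) < 10:
        # Year-only ("2000") and year-month ("2000-06") values cannot be shifted.
        return None
    shifted = date.fromisoformat(value[:10]) + timedelta(days=shift_offset_days(resource_id, key))
    # Assumption: any time and timezone suffix is carried over unchanged.
    return shifted.isoformat() + value[10:]


admitted = date_shift("2000-06-15", "pt-1", "my-shift-key")
discharged = date_shift("2000-07-15", "pt-1", "my-shift-key")
```

Because both calls use the same resource id and key, `admitted` and `discharged` remain exactly 30 days apart after shifting.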

### birthDateSafeHarbor

Intended **only for `Patient.birthDate`**. Behaves like `dateshift` but returns NULL when the birth date implies the patient is over 89 years old, per HIPAA Safe Harbor rule [45 CFR 164.514(b)(2)(i)(C)](https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-C/part-164/subpart-E/section-164.514).

Applying this method to any other date column is semantically incorrect — the function computes `age(current_date, input)` and treats the input as a birth date. Use plain `dateshift` for non-birth-date fields.

Because the function depends on `current_date`, it is marked [`STABLE`](https://www.postgresql.org/docs/current/xfunc-volatility.html) rather than `IMMUTABLE` — PostgreSQL guarantees it returns the same result within a single transaction, but the result may differ between transactions as the current date changes. This means the age cutoff re-evaluates on every query.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| dateShiftKey | string | yes | HMAC key used to compute the per-resource offset |

```json
[
  {"url": "method", "valueCode": "birthDateSafeHarbor"},
  {"url": "dateShiftKey", "valueString": "my-shift-key"}
]
```
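
The age cutoff can be illustrated with a short Python sketch. It covers only the "over 89" check, not the date shift, and it is not the Aidbox implementation: the function names are hypothetical, and `today` is passed explicitly to mirror the dependence on `current_date`.

```python
from datetime import date


def age_in_years(birth, today):
    """Whole years elapsed between `birth` and `today`."""
    # Subtract one year if this year's birthday has not been reached yet.
    not_yet = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - int(not_yet)


def safe_harbor_allows(birth, today):
    """True when the patient is 89 or younger, so a shifted date may be returned."""
    return age_in_years(birth, today) <= 89


allowed_before = safe_harbor_allows(date(1934, 6, 1), date(2024, 5, 31))
allowed_after = safe_harbor_allows(date(1934, 6, 1), date(2024, 6, 1))
```

The same birth date is allowed on 2024-05-31 (age 89) and suppressed one day later (age 90), which is the re-evaluation behaviour described above.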

### encrypt

Encrypts the value with AES-128-CBC and returns a base64-encoded string. Reversible with the key. Uses a zero initialization vector so that the output is deterministic: the same plaintext always produces the same ciphertext.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| encryptKey | string | yes | Hex-encoded AES-128 key (32 hex characters = 16 bytes) |

```json
[
  {"url": "method", "valueCode": "encrypt"},
  {"url": "encryptKey", "valueString": "0123456789abcdef0123456789abcdef"}
]
```
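
A malformed key is easy to catch before it reaches a ViewDefinition. The following Python sketch checks the documented key format (32 hex characters decoding to 16 bytes); the function name `parse_aes128_key` is hypothetical, and the check is a client-side convenience rather than a description of how Aidbox validates keys.

```python
def parse_aes128_key(hex_key):
    """Decode a hex-encoded AES-128 key and verify that it is 16 bytes long."""
    key = bytes.fromhex(hex_key)  # raises ValueError for non-hex input
    if len(key) != 16:
        raise ValueError(f"AES-128 key must be 16 bytes, got {len(key)}")
    return key


key_bytes = parse_aes128_key("0123456789abcdef0123456789abcdef")
```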

### substitute

Replaces the value with a fixed string.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| replaceWith | string | yes | Replacement value |

```json
[
  {"url": "method", "valueCode": "substitute"},
  {"url": "replaceWith", "valueString": "REDACTED"}
]
```

### perturb

Adds random noise to numeric values. The result is non-deterministic — each query produces different output.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| span | decimal | no | Noise magnitude. Default: 1.0 |
| rangeType | code | no | `fixed` (absolute noise) or `proportional` (relative to value). Default: `fixed` |
| roundTo | integer | no | Decimal places to round to. 0 means integer. Default: 0 |

With `fixed` range type, noise is in the range `±span/2`. With `proportional`, noise is `±(span × value)/2`. Any other `rangeType` value raises a SQL error.

```json
[
  {"url": "method", "valueCode": "perturb"},
  {"url": "span", "valueDecimal": 10},
  {"url": "rangeType", "valueCode": "fixed"},
  {"url": "roundTo", "valueInteger": 0}
]
```
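
The noise ranges can be sketched in Python as follows. This is an illustration under assumptions, not the PostgreSQL function itself: uniformly distributed noise, the function name `perturb`, and the injectable `rng` argument are all choices made for the example.

```python
import random


def perturb(value, span=1.0, range_type="fixed", round_to=0, rng=random):
    """Add random noise to a number, following the ranges described above."""
    if value is None:
        return None
    if range_type == "fixed":
        half_width = span / 2
    elif range_type == "proportional":
        half_width = abs(span * value) / 2
    else:
        # Mirrors the documented behaviour: unknown range types are an error.
        raise ValueError(f"unsupported rangeType: {range_type!r}")
    noisy = value + rng.uniform(-half_width, half_width)  # assumption: uniform noise
    # round_to == 0 produces an integer, otherwise that many decimal places.
    return round(noisy) if round_to == 0 else round(noisy, round_to)


rng = random.Random(0)  # seeded only so the example is repeatable
fixed_sample = perturb(100, span=10, rng=rng)
proportional_sample = perturb(200, span=0.1, range_type="proportional", round_to=2, rng=rng)
```

With these parameters `fixed_sample` is an integer between 95 and 105, and `proportional_sample` lies between 190 and 210.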

### custom_function

Applies a user-provided PostgreSQL function. The function must already exist in the database. Its first argument is the column value cast to text. An optional second argument can be passed via `custom_arg`.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| custom_function | string | yes | PostgreSQL function name. Must match `^[a-zA-Z][a-zA-Z0-9_.]*$` |
| custom_arg | any primitive | no | Optional second argument, passed as a FHIR sub-extension using the appropriate `value[x]` type: `valueString`, `valueInteger`, `valueDecimal`, `valueBoolean`, or `valueCode` |

```json
[
  {"url": "method", "valueCode": "custom_function"},
  {"url": "custom_function", "valueString": "left"},
  {"url": "custom_arg", "valueInteger": 4}
]
```

This example uses the built-in PostgreSQL `left` function to keep only the first 4 characters (e.g. extracting just the year from a date string).

Review comment (Collaborator):
add here "See also: [Writing custom PostgreSQL functions](#writing-custom-postgresql-functions)"

## Example ViewDefinition
Comment thread
andreyorst marked this conversation as resolved.

A complete ViewDefinition that de-identifies Patient data:

```json
{
  "resourceType": "ViewDefinition",
  "id": "deident-patients",
  "name": "deident_patients",
  "status": "active",
  "resource": "Patient",
  "select": [{
    "column": [{
      "name": "id",
      "path": "id",
      "extension": [{
        "url": "http://health-samurai.io/fhir/core/StructureDefinition/de-identification",
        "extension": [{
          "url": "method",
          "valueCode": "cryptoHash"
        }, {
          "url": "cryptoHashKey",
          "valueString": "patient-hash-key"
        }]
      }]
    }, {
      "name": "gender",
      "path": "gender"
    }, {
      "name": "birth_date",
      "path": "birthDate",
      "extension": [{
        "url": "http://health-samurai.io/fhir/core/StructureDefinition/de-identification",
        "extension": [{
          "url": "method",
          "valueCode": "dateshift"
        }, {
          "url": "dateShiftKey",
          "valueString": "date-shift-key"
        }]
      }]
    }]
  }, {
    "forEachOrNull": "name",
    "select": [{
      "column": [{
        "name": "family",
        "path": "family",
        "extension": [{
          "url": "http://health-samurai.io/fhir/core/StructureDefinition/de-identification",
          "extension": [{
            "url": "method",
            "valueCode": "redact"
          }]
        }]
      }]
    }]
  }, {
    "forEachOrNull": "address",
    "select": [{
      "column": [{
        "name": "postal_code",
        "path": "postalCode",
        "extension": [{
          "url": "http://health-samurai.io/fhir/core/StructureDefinition/de-identification",
          "extension": [{
            "url": "method",
            "valueCode": "substitute"
          }, {
            "url": "replaceWith",
            "valueString": "000"
          }]
        }]
      }]
    }]
  }]
}
```

In this example:

- `id` is replaced with a consistent hash
- `gender` passes through unchanged (no extension)
- `birthDate` is shifted by a deterministic offset per patient
- `name.family` is redacted (NULL)
- `address.postalCode` is replaced with "000"

The result from the `$run` operation would look like this:

| `id` | `gender` | `birth_date` | `family` | `postal_code` |
|------------------------------------------------------------------|----------|--------------|----------|---------------|
| a9c063ce560ab35c2156d4bf153457d8c7b0ad6325c1c4112b34eb7147aaa8f9 | female | 1985-03-02 | null | 000 |
| 6e7dfba4a51c359ead0afd9e3ff542c9417505957bf374e510eb37ec020fbc12 | male | 1952-11-12 | null | 000 |
| 27fb6fd29c5657c1a122aa1ae28cdfc5e10b202c6dc7d498cec72609b3a1b447 | female | 1930-05-08 | null | 000 |

## Materialization restriction

A ViewDefinition that contains any de-identification extension can only be materialized as a `table`. Attempting to materialize as `view` or `materialized-view` returns HTTP 422 with an OperationOutcome:

> ViewDefinitions with de-identification extensions can only be materialized as 'table'. Views and materialized views expose cryptographic keys in PostgreSQL system catalogs.

Review comment (Collaborator):
I think we can remove this line because the next sentence already explains the restriction


This restriction exists because PostgreSQL stores the full view definition (including the compiled SQL with embedded keys) in `pg_views.definition` and `pg_matviews.definition`. Any user with `SELECT` on those catalogs would see the `cryptoHashKey`, `dateShiftKey`, or `encryptKey` values in plaintext. Tables materialize the transformed data only, leaving the keys inside the ViewDefinition resource itself (which is access-controlled).

`$run` and `$sql` are unaffected — they return data or SQL strings directly without storing anything in system catalogs.
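
A client can apply the same rule before calling `$materialize`. The Python sketch below walks a ViewDefinition and reports which materialization types the restriction permits. It is a hypothetical client-side pre-check, not Aidbox code; the function names are invented, and only the extension URL and the three type names come from this page.

```python
DEIDENT_URL = "http://health-samurai.io/fhir/core/StructureDefinition/de-identification"


def has_deident_extension(node):
    """Recursively look for the de-identification extension anywhere in `node`."""
    if isinstance(node, dict):
        extensions = node.get("extension")
        if isinstance(extensions, list) and any(
            isinstance(ext, dict) and ext.get("url") == DEIDENT_URL for ext in extensions
        ):
            return True
        return any(has_deident_extension(child) for child in node.values())
    if isinstance(node, list):
        return any(has_deident_extension(child) for child in node)
    return False


def allowed_materializations(view_definition):
    """Materialization types permitted under the restriction described above."""
    if has_deident_extension(view_definition):
        return {"table"}
    return {"table", "view", "materialized-view"}


plain_view = {"resource": "Patient", "select": [{"column": [{"name": "id", "path": "id"}]}]}
deident_view = {
    "resource": "Patient",
    "select": [{
        "forEachOrNull": "name",
        "select": [{"column": [{
            "name": "family",
            "path": "family",
            "extension": [{
                "url": DEIDENT_URL,
                "extension": [{"url": "method", "valueCode": "redact"}],
            }],
        }]}],
    }],
}
```

For `plain_view` all three types are returned, while `deident_view` is limited to `table` even though the extension sits in a nested `select`.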

## Pre-built ViewDefinitions

The IG package `io.health-samurai.de-identification.r4` provides ready-made Safe Harbor ViewDefinitions for common FHIR R4 resource types. Install it via FAR (Aidbox's artifact registry):

Review comment (Collaborator):
How to do it? Can't find it. Can you specify it here?

Image

Review comment (Collaborator):
FAR is not used widely in docs, and in UI, it is not FAR anymore.

"via FAR (Aidbox's artifact registry)" -> "via [Artefact Registry](relative-path/artifact-registry/artifact-registry-overview) ("FHIR packages" in Aidbox UI)"


| Resource | De-identification applied |
|----------|---------------------------|
| Patient | `birthDateSafeHarbor` on `birthDate`, `cryptoHash` on `id`, `redact` on name/address identifiers |
| Encounter, Condition, Observation | `dateshift` on clinical dates, `cryptoHash` on references |
| Claim, ExplanationOfBenefit | `dateshift` on billable periods |
| AllergyIntolerance, DiagnosticReport, MedicationRequest, MedicationDispense, MedicationAdministration, Immunization, Procedure, Specimen, DocumentReference | Same general approach |
| Practitioner, Location | Identifier redaction |

Install the package via FHIR package management and use these ViewDefinitions directly, or copy and customize them. Every cryptographic key parameter in the pre-built ViewDefinitions is blank (`""`), so you must set real keys before using them for actual de-identification.

## Using the UI
Comment thread
andreyorst marked this conversation as resolved.

The ViewDefinition builder in Aidbox UI includes a de-identification picker on each column. Click the shield icon next to a column's path to open the configuration popover.

![De-identification picker in the ViewDefinition Builder](../../../assets/deident-viewdefinition-builder.png)

When a de-identification method is configured, the shield icon turns blue. Hovering shows the current method name.

## Writing custom PostgreSQL functions

Custom functions referenced via `custom_function` must:

- Accept `text` as the first argument (the column value)
- Optionally accept a second argument of any type (passed via `custom_arg`)
- Already exist in the database before the ViewDefinition is executed

Example:

```sql
-- Keeps the first character of the value and masks the rest with '*'.
CREATE OR REPLACE FUNCTION my_mask(value text)
RETURNS text LANGUAGE sql IMMUTABLE PARALLEL SAFE AS $$
  SELECT CASE
    WHEN value IS NULL THEN NULL
    ELSE left(value, 1) || repeat('*', greatest(length(value) - 1, 0))
  END;
$$;
```

Then reference it in a column:

```json
{
  "url": "http://health-samurai.io/fhir/core/StructureDefinition/de-identification",
  "extension": [
    {"url": "method", "valueCode": "custom_function"},
    {"url": "custom_function", "valueString": "my_mask"}
  ]
}
```

## Security considerations
Review comment (Collaborator):
Not much here. We can split "key management" and "encryption limitations" content into hints in "## encrypt" section.


### Key management

Cryptographic keys (`cryptoHashKey`, `dateShiftKey`, `encryptKey`) are stored as plaintext strings inside the ViewDefinition resource. Anyone with read access to the ViewDefinition can see the keys.
Comment thread
spicyfalafel marked this conversation as resolved.

Restrict access to ViewDefinition resources using [AccessPolicy](../../access-control/authorization/README.md) to ensure only authorized users can view or modify de-identification configurations.

### SQL injection prevention
Review comment (Collaborator):
We didn't write about SQL injection prevention in the docs before. It is cool that we do it, but we don't have to point to it. Maybe it's just me, but I think it is redundant here.


The `custom_function` parameter is validated against `^[a-zA-Z][a-zA-Z0-9_.]*$` — only letters, digits, underscores, and dots are allowed. This validation happens both in the Aidbox UI and in the SQL compiler. String arguments passed via `custom_arg` are safely escaped by the SQL generator.
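
The same pattern can be applied client-side before submitting a ViewDefinition. This is a minimal Python sketch using the regular expression quoted above; the helper name `is_valid_function_name` is hypothetical and the check does not replace the validation Aidbox performs.

```python
import re

# Same character rules as the documented pattern; fullmatch() supplies the
# anchoring, so a trailing newline is rejected as well.
FUNCTION_NAME = re.compile(r"[a-zA-Z][a-zA-Z0-9_.]*")


def is_valid_function_name(name):
    """True when `name` consists only of letters, digits, underscores, and dots
    and starts with a letter."""
    return FUNCTION_NAME.fullmatch(name) is not None


accepted = [n for n in ("my_mask", "pg_catalog.left") if is_valid_function_name(n)]
rejected = [n for n in ("1mask", "left); DROP TABLE patient; --", "") if not is_valid_function_name(n)]
```

Both names in `accepted` pass, and every entry in `rejected` fails because it starts with a digit, contains characters outside the allowed set, or is empty.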

### Encryption limitations

The `encrypt` method uses AES-128-CBC with a zero initialization vector. This makes encryption deterministic — the same plaintext always produces the same ciphertext, which is useful for consistent de-identification but leaks frequency information. This is not suitable for general-purpose encryption.

See also:

- [Defining flat views with view definitions](./defining-flat-views-with-view-definitions.md)
- [$run operation](./operation-run.md)
- [$materialize operation](./operation-materialize.md)
2 changes: 1 addition & 1 deletion docs/modules/sql-on-fhir/operation-materialize.md
@@ -8,7 +8,7 @@ description: Materializing SQL-on-FHIR ViewDefinitions as database tables, views
{% endhint %}

{% hint style="warning" %}
-When running Aidbox not in FHIRSchema mode, please be aware that input parameters for the `$materialize` operation are not validated against FHIR specifications.
+Requires **fhir-schema mode**.
{% endhint %}

{% hint style="info" %}
2 changes: 1 addition & 1 deletion docs/modules/sql-on-fhir/operation-run.md
@@ -4,7 +4,7 @@ description: Using the $run operation to execute SQL-on-FHIR ViewDefinitions
# $run operation

{% hint style="warning" %}
-When running Aidbox not in FHIRSchema mode, please be aware that input parameters for the `$run` operation are not validated against FHIR specifications.
+Requires **fhir-schema mode**.
{% endhint %}

{% hint style="info" %}