[doc] Move cleanup part from README to documentation #1469
Open: the-glu wants to merge 10 commits into `interuss:master` from `Orbitalize:doc_clean`.

Commits (all by the-glu):

- `7c924a1` [doc] Move cleanup part from README to documentation
- `5090fa5`, `56ea17a`, `b0a32f3`, `3d2df03`, `c8a6a1c`, `4e3e8c4`, `cc5593b`, `cf00d60`, `bd8a00f`: Update docs/operations/cleanup.md

Cleanup README: its content is replaced with a pointer to the documentation site.

```diff
@@ -1,85 +1,3 @@
 # DB Cleanup
 
-## evict
-CLI tool that lists and deletes expired entities in the DSS store.
-At the time of writing this README, the entities supported by this tool are:
-- SCD operational intents;
-- SCD subscriptions;
-- RID identification service areas;
-- RID subscriptions.
-
-The usage of this tool is potentially dangerous: inputting wrong parameters may result in loss of data.
-As such it is strongly recommended to always review and validate the list of entities identified as expired, and to
-ensure that a backup of the data is available before deleting anything using the `--delete` flag.
-
-### Performance impact
-The current implementation of this tool might have a performance impact due notably to lock contention if the number of
-entities to be removed is high. With the system under heavy load it might even fail to remove them. That is due to the
-fact that the expired entities are all identified and removed within a single transaction: with concurrent competing
-transactions succeeding faster, there might be enough failures so that the tool fails. There is no risk of data
-inconsistency and the cleanup may just be tried again in that case.
-
-To avoid this issue:
-- perform the cleanup during a low intensity period (e.g. at night);
-- iteratively cleanup the entities by starting with a lower TTL and progressively making it higher.
-
-If this becomes enough of an issue in the future it could be considered implementing batching of removals.
-
-### Usage
-Extract from running `db-manager evict --help`:
-```
-List and evict expired entities
-
-Usage:
-  db-manager evict [flags]
-
-Flags:
-      --delete             set this flag to true to delete the expired entities
-  -h, --help               help for evict
-      --locality string    self-identification string of this DSS instance
-      --rid_isa            set this flag to true to check for expired RID ISAs (default true)
-      --rid_sub            set this flag to true to check for expired RID subscriptions (default true)
-      --rid_ttl duration   time-to-live duration used for determining RID entries expiration, defaults to 30 minutes (default 30m0s)
-      --scd_oir            set this flag to true to check for expired SCD operational intents (default true)
-      --scd_sub            set this flag to true to check for expired SCD subscriptions (default true)
-      --scd_ttl duration   time-to-live duration used for determining SCD entries expiration, defaults to 2*56 days (default 2688h0m0s)
-
-Global Flags:
-      --datastore_application_name string   application name for tagging the connection to the database (default "dss")
-      --datastore_host string               database host to connect to
-      --datastore_max_conn_idle_secs int    maximum amount of time in seconds a connection may be idle, default is 30 seconds (default 30)
-      --datastore_max_open_conns int        maximum number of open connections to the database, default is 4 (default 4)
-      --datastore_max_retries int           maximum number of attempts to retry a query in case of contention, default is 100 (default 100)
-      --datastore_port int                  database port to connect to (default 26257)
-      --datastore_ssl_dir string            directory to ssl certificates. Must contain files: ca.crt, client.<user>.crt, client.<user>.key
-      --datastore_ssl_mode string           database sslmode (default "disable")
-      --datastore_user string               database user to authenticate as (default "root")
-
-```
-
-Do note:
-- by default expired entities are only listed, not deleted, the flag `--delete` is required for deleting entities;
-- expiration of entities is preferably determined through their end times, however when they do not have end times, the last update times are used;
-- the flag `--rid_ttl` and `--scd_ttl` accepts durations formatted as [Go `time.Duration` strings](https://pkg.go.dev/time#ParseDuration), e.g. `24h`;
-- the datastore cluster connection flags are the same as [the `core-service` command](../../core-service/README.md).
-
-### Examples
-The following examples assume a running DSS deployed locally through [the `run_locally.sh` script](../../../build/dev/standalone_instance.md).
-
-#### List all entities older than 1 week
-```shell
-docker compose -f docker-compose_dss.yaml -p dss_sandbox exec local-dss-core-service db-manager evict \
-  --datastore_host=local-dss-crdb --scd_ttl=168h --rid_ttl=168h
-```
-
-#### List operational intents older than 1 week
-```shell
-docker compose -f docker-compose_dss.yaml -p dss_sandbox exec local-dss-core-service db-manager evict \
-  --datastore_host=local-dss-crdb --scd_ttl=168h --scd_oir=true --scd_sub=false
-```
-
-#### Delete all entities older than 30 days
-```shell
-docker compose -f docker-compose_dss.yaml -p dss_sandbox exec local-dss-core-service db-manager evict \
-  --datastore_host=local-dss-crdb --scd_ttl=720h --rid_ttl=720h --delete
-```
+This documentation has been moved to [interuss.github.io/dss](https://interuss.github.io/dss/dev/operations/cleanup).
```

New file `docs/operations/cleanup.md` (`@@ -0,0 +1,179 @@`):

# Database cleanup

Data accumulates in the database over time if client USSs do not remove their expired entities, and a growing number of stale entities can degrade performance. For this reason, InterUSS recommends periodically cleaning up no-longer-relevant entities when client USSs do not reliably clean up after themselves.

This page describes how to clean up expired entities from the DSS datastore using the `db-manager evict` command.

## Overview

The `evict` subcommand of `db-manager` is a CLI tool that lists and deletes expired entities in the DSS store. The following entity types are supported:

- SCD operational intents
- SCD subscriptions
- RID identification service areas (ISAs)
- RID subscriptions

By default, the tool only lists expired entities. Deletion is opt-in via the `--delete` flag.

!!! warning
    Using this tool incorrectly may result in **loss of data**. Before deleting anything:

    - Always review and validate the list of entities identified as expired (run the tool without `--delete` first).
    - Ensure a backup of the data is available.
    - Double-check the TTL values passed to `--rid_ttl` and `--scd_ttl`.
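
The review-first workflow recommended above can be scripted. The following is a minimal sketch, assuming the local sandbox used in the Examples section (compose file `docker-compose_dss.yaml`, project `dss_sandbox`, datastore host `local-dss-crdb`); the `EVICT_CMD` array and the `review_then_delete` function are hypothetical helpers, not part of the DSS tooling. It lists expired entities first and only re-runs the command with `--delete` after an explicit confirmation.

```shell
# Sketch only: hypothetical helpers, not part of the DSS tooling.
# Assumes the local sandbox used in the Examples section of this page.
EVICT_CMD=(docker compose -f docker-compose_dss.yaml -p dss_sandbox
           exec local-dss-core-service db-manager evict)

# List expired entities first, then delete only after explicit confirmation.
# Arguments are passed through to `db-manager evict` (e.g. TTL flags).
review_then_delete() {
  local answer
  echo "== Expired entities (listing only, nothing is deleted) =="
  "${EVICT_CMD[@]}" --datastore_host=local-dss-crdb "$@"

  read -r -p "Delete the entities listed above? Type 'yes' to confirm: " answer
  if [[ "$answer" == "yes" ]]; then
    "${EVICT_CMD[@]}" --datastore_host=local-dss-crdb "$@" --delete
  else
    echo "Aborted: nothing was deleted."
  fi
}

# Usage: review_then_delete --scd_ttl=720h --rid_ttl=720h
```

Because the deletion is a separate invocation, the set of expired entities is evaluated again at that moment, so it may differ slightly from the reviewed list if more entities expired in between.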

Expiration of entities is preferably determined through their end times. In the unusual event that an end time is not available, the last update time is used instead.
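
When reviewing listed entities, it can help to know which point in time a given TTL corresponds to. The sketch below computes that cutoff. It *assumes* the tool treats an entity as expired when its end time (or, failing that, its last update time) is earlier than the current time minus the TTL; check this against the `db-manager` source before relying on it. `ttl_cutoff` is a hypothetical helper that takes the TTL in whole hours only and relies on GNU `date` (`-d @EPOCH`).

```shell
# Sketch only: ttl_cutoff is a hypothetical helper. It ASSUMES an entity counts
# as expired when its end time (or, without one, its last update time) is
# earlier than "now - TTL". Requires GNU date (for `-d @EPOCH`).
#
# $1: TTL in whole hours (e.g. 168 for --scd_ttl=168h)
# $2: optional "now" as Unix epoch seconds (defaults to the current time)
ttl_cutoff() {
  local ttl_hours="$1"
  local now="${2:-$(date -u +%s)}"
  local cutoff=$(( now - ttl_hours * 3600 ))
  # Print the cutoff both as epoch seconds and as a UTC timestamp.
  echo "${cutoff} $(date -u -d "@${cutoff}" +%Y-%m-%dT%H:%M:%SZ)"
}

# Usage: ttl_cutoff 168 1700000000   # prints: 1699395200 2023-11-07T22:13:20Z
```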

## Why and when to run the cleanup

Expired entities (operational intents past their end time, stale subscriptions, ISAs whose lifetime has elapsed) accumulate in the datastore over time if clients do not clean them up. While the DSS keeps functioning correctly with some stale rows present, excessive accumulation can affect:

- Storage: storage usage in the datastore (e.g. CockroachDB or YugabyteDB) grows without bound.
- Query performance: indexes grow larger, and range scans on `operational_intents` / `subscriptions` / `identification_service_areas` slow down.

There is no single correct interval or TTL. Reasonable values depend on context and must be defined per DSS pool, taking into account traffic volume and entity churn, the regulatory or contractual data-retention requirements applicable in your jurisdiction, the storage capacity of the datastore cluster, and how long clients may legitimately need to query historical entities.

The defaults shipped with the deployment tooling (a `30m` TTL for RID, run every 30 minutes; a `2688h` TTL for SCD, i.e. 112 days or 2 × 56 days, run nightly when enabled) are starting points, not recommendations. Validate production values against the criteria above.

## Performance impact

All expired entities are identified and removed within a single transaction. When the system is under heavy load, lock contention with concurrent transactions may cause the cleanup to fail. This carries no risk of data inconsistency: the cleanup can simply be retried.

To mitigate this:

- Run the cleanup during low-intensity periods (e.g. at night).
- Clean up iteratively, starting with a high TTL (so that each run only removes the oldest entities) and progressively lowering it to the target value.

If this becomes a recurring issue, batching removals could be considered as a future improvement.
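
A staged cleanup can be sketched as follows. Since a larger TTL matches fewer, older entities, the loop steps the TTL down so that each single-transaction run stays small, and retries a failed step a few times (a failed run leaves no inconsistency, as noted above). The TTL steps, retry count, delay, the `EVICT_CMD` array and `staged_cleanup` itself are illustrative assumptions, again targeting the local sandbox from the Examples section.

```shell
# Sketch only: hypothetical helper targeting the local sandbox from the
# Examples section. WARNING: every step runs with --delete.
EVICT_CMD=(docker compose -f docker-compose_dss.yaml -p dss_sandbox
           exec local-dss-core-service db-manager evict)

# Delete expired entities in stages, from the largest TTL (oldest entities,
# smallest transaction) down to the target TTL. Each stage is retried up to
# 3 times, since a run that fails on contention can safely be re-run.
staged_cleanup() {
  local ttl attempt
  for ttl in "$@"; do
    for attempt in 1 2 3; do
      echo "== Deleting entities older than ${ttl} (attempt ${attempt}) =="
      if "${EVICT_CMD[@]}" --datastore_host=local-dss-crdb \
          --scd_ttl="${ttl}" --rid_ttl="${ttl}" --delete; then
        break
      fi
      if (( attempt == 3 )); then
        echo "Giving up at TTL ${ttl}; retry later, e.g. during a quieter period." >&2
        return 1
      fi
      sleep "${RETRY_DELAY_SECS:-30}"
    done
  done
}

# Usage (90, 60, then 30 days): staged_cleanup 2160h 1440h 720h
```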

## Usage

Extracted from `db-manager evict --help`:

```
List and evict expired entities

Usage:
  db-manager evict [flags]

Flags:
      --delete             set this flag to true to delete the expired entities
  -h, --help               help for evict
      --locality string    self-identification string of this DSS instance
      --rid_isa            set this flag to true to check for expired RID ISAs (default true)
      --rid_sub            set this flag to true to check for expired RID subscriptions (default true)
      --rid_ttl duration   time-to-live duration used for determining RID entries expiration, defaults to 30 minutes (default 30m0s)
      --scd_oir            set this flag to true to check for expired SCD operational intents (default true)
      --scd_sub            set this flag to true to check for expired SCD subscriptions (default true)
      --scd_ttl duration   time-to-live duration used for determining SCD entries expiration, defaults to 2*56 days (default 2688h0m0s)

Global Flags:
      --datastore_application_name string   application name for tagging the connection to the database (default "dss")
      --datastore_host string               database host to connect to
      --datastore_max_conn_idle_secs int    maximum amount of time in seconds a connection may be idle, default is 30 seconds (default 30)
      --datastore_max_open_conns int        maximum number of open connections to the database, default is 4 (default 4)
      --datastore_max_retries int           maximum number of attempts to retry a query in case of contention, default is 100 (default 100)
      --datastore_port int                  database port to connect to (default 26257)
      --datastore_ssl_dir string            directory to ssl certificates. Must contain files: ca.crt, client.<user>.crt, client.<user>.key
      --datastore_ssl_mode string           database sslmode (default "disable")
      --datastore_user string               database user to authenticate as (default "root")
```

Notes:

- By default, expired entities are only listed; `--delete` is required to actually remove them.
- `--rid_ttl` and `--scd_ttl` accept durations formatted as [Go `time.Duration` strings](https://pkg.go.dev/time#ParseDuration), e.g. `24h`.
- The datastore connection flags match those of the `core-service` command.
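
Go duration strings have no day unit (`time.ParseDuration` only accepts `ns`, `us`/`µs`, `ms`, `s`, `m` and `h`), so day-based retention periods must be written in hours. The hypothetical helpers below do that conversion and the reverse check, which is handy for confirming, for example, that the SCD default of `2688h` is 112 days.

```shell
# Sketch only: hypothetical helpers for writing day-based retention periods as
# Go duration strings, which have no day unit.

# Convert a whole number of days into an hour-based duration, e.g. 30 -> 720h.
days_to_ttl() {
  local days="$1"
  if [[ ! "$days" =~ ^[0-9]+$ ]]; then
    echo "days_to_ttl: expected a whole number of days, got '${days}'" >&2
    return 1
  fi
  echo "$(( 10#$days * 24 ))h"
}

# Reverse check for whole-hour TTLs, e.g. 2688h -> 112 (whole days, rounded down).
ttl_hours_to_days() {
  if [[ ! "$1" =~ ^([0-9]+)h$ ]]; then
    echo "ttl_hours_to_days: expected a TTL such as 2688h, got '$1'" >&2
    return 1
  fi
  local hours="${BASH_REMATCH[1]}"
  echo "$(( 10#$hours / 24 ))"
}

# Usage: days_to_ttl 7            # prints: 168h
#        ttl_hours_to_days 2688h  # prints: 112
```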

## Regular cleanup

Beyond running `db-manager evict` manually, the DSS deployment tooling can schedule the cleanup as a recurring Kubernetes `CronJob`. The three deployment paths below (Helm, Tanka and Terraform) expose the same set of eviction settings, and the scheduled jobs always run with the `--delete` flag set.

Shared defaults: RID cleanup is enabled by default (schedule `*/30 * * * *`, `ttl = 30m`); SCD cleanup is disabled by default (suggested schedule `0 2 * * *`, `ttl = 2688h`, i.e. 2 × 56 days).

### Helm

The `dss` chart includes a `dss-evict` CronJob (see `deploy/services/helm-charts/dss/templates/dss-evict.yaml`), configured under `dss.conf.evict` in `values.yaml`:

```yaml
dss:
  conf:
    evict:
      scd:
        enableCron: false
        schedule: "0 2 * * *"
        ttl: 2688h
        operationalIntents: true
        subscriptions: true
      rid:
        enableCron: true
        schedule: "*/30 * * * *"
        ttl: 30m
        ISAs: true
        subscriptions: true
```
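
For example, enabling the SCD CronJob (disabled by default) on an existing Helm release might look like the sketch below. `RELEASE` is a placeholder for your release name and `enable_scd_evict` is a hypothetical helper; it assumes it is run from the repository root with the chart's dependencies already fetched (e.g. with `helm dependency build`), and uses `--reuse-values` to keep the release's other settings.

```shell
# Sketch only: enable_scd_evict is a hypothetical helper. RELEASE is your Helm
# release name; the chart path is the in-repository location of the dss chart.
HELM=(helm)

# Turn on the SCD eviction CronJob, keeping the release's other values.
# $1: release name, $2: optional SCD TTL (defaults to the chart default 2688h)
enable_scd_evict() {
  local release="$1" ttl="${2:-2688h}"
  "${HELM[@]}" upgrade "${release}" deploy/services/helm-charts/dss \
    --reuse-values \
    --set dss.conf.evict.scd.enableCron=true \
    --set-string dss.conf.evict.scd.ttl="${ttl}"
}

# Usage: enable_scd_evict RELEASE 2688h
```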

### Tanka

Configure the eviction CronJobs under the `evict` key of your environment metadata:

```jsonnet
evict+: {
  scd+: {
    enable_cron: false,
    schedule: "0 2 * * *",
    ttl: "2688h",
    operational_intents: true,
    subscriptions: true,
  },
  rid+: {
    enable_cron: true,
    schedule: "*/30 * * * *",
    ttl: "30m",
    ISAs: true,
    subscriptions: true,
  },
},
```

### Terraform

When deploying via the Terraform modules, the parameters are configurable through module variables:

| Terraform variable              | Default          |
|---------------------------------|------------------|
| `evict_enable_scd_cron`         | `false`          |
| `evict_scd_schedule`            | `"0 2 * * *"`    |
| `evict_scd_ttl`                 | `"2688h"`        |
| `evict_scd_operational_intents` | `true`           |
| `evict_scd_subscriptions`       | `true`           |
| `evict_enable_rid_cron`         | `true`           |
| `evict_rid_schedule`            | `"*/30 * * * *"` |
| `evict_rid_ttl`                 | `"30m"`          |
| `evict_rid_isas`                | `true`           |
| `evict_rid_subscriptions`       | `true`           |
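
Whichever deployment path is used, a schedule's configuration can be checked without waiting for the next run by creating a one-off Job from the CronJob with `kubectl create job --from=cronjob/...`. The sketch below assumes `kubectl` access to the cluster; the namespace and CronJob name are placeholders (list the actual CronJobs with `kubectl get cronjobs -n <namespace>`), and `run_cronjob_now` is a hypothetical helper. Keep in mind that these jobs always run with `--delete`.

```shell
# Sketch only: run_cronjob_now is a hypothetical helper. The namespace and
# CronJob name are placeholders; find the real ones with
# `kubectl get cronjobs -n <namespace>`.
# WARNING: the scheduled eviction jobs always run with --delete.
KUBECTL=(kubectl)

# Create a one-off Job from an existing CronJob, wait for it, then print its
# logs (logs are printed even if the wait times out or the Job fails).
run_cronjob_now() {
  local namespace="$1" cronjob="$2"
  local job
  job="evict-manual-$(date +%s)"
  "${KUBECTL[@]}" -n "${namespace}" create job "${job}" --from="cronjob/${cronjob}" || return 1
  "${KUBECTL[@]}" -n "${namespace}" wait --for=condition=complete "job/${job}" --timeout=15m
  "${KUBECTL[@]}" -n "${namespace}" logs "job/${job}"
}

# Usage: run_cronjob_now NAMESPACE CRONJOB_NAME
```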

## Examples

The examples below assume a DSS running locally via the `run_locally.sh` script.

### List all entities older than 1 week

```shell
docker compose -f docker-compose_dss.yaml -p dss_sandbox exec local-dss-core-service db-manager evict \
  --datastore_host=local-dss-crdb --scd_ttl=168h --rid_ttl=168h
```

### List only operational intents older than 1 week

```shell
docker compose -f docker-compose_dss.yaml -p dss_sandbox exec local-dss-core-service db-manager evict \
  --datastore_host=local-dss-crdb --scd_ttl=168h --scd_oir=true --scd_sub=false --rid_isa=false --rid_sub=false
```

### Delete all entities older than 30 days

```shell
docker compose -f docker-compose_dss.yaml -p dss_sandbox exec local-dss-core-service db-manager evict \
  --datastore_host=local-dss-crdb --scd_ttl=720h --rid_ttl=720h --delete
```

@mickmis could you double-check that my suggestion is accurate? I think the reason the defaults are the way they are is that each DSS instance is supposed to be responsible for cleaning up its own RID stuff (and deletion is limited to entities created by the DSS instance doing the deletion, as determined by locality), but SCD cleanup is a global activity and therefore DSS instance operators would want to at least coordinate to avoid cleaning up more than necessary (or perhaps designate a single USS as responsible for cleanup).