@@ -26,17 +26,9 @@ in several locations:
Porch's data storage operations are covered in significantly greater depth in the [Architecture and Components section]({{% relref "/docs/5_architecture_and_components/_index.md" %}}).

Each data store serves as the source of truth for different elements of Porch's data structure:
- custom resource objects on the Kubernetes control plane:
- Porch repositories (Repository objects)
- package variants (and by extension package variant sets)
- **if the CR cache is configured:** "work-in-progress" package revisions whose lifecycle stage is "Draft", "Proposed",
or "DeletionProposed"
- Git repositories:
- package revisions (Kpt package file contents and directory structures)
- Package revision cache:
- Kubernetes-related metadata for package revisions (e.g. labels and annotations)
- **if the DB cache is configured:** "work-in-progress" package revisions whose lifecycle stage is "Draft", "Proposed",
or "DeletionProposed"
- **Custom resource objects on the Kubernetes control plane**: Porch repositories (Repository objects), package variants (and by extension package variant sets) and, **if** the CR cache is configured, "work-in-progress" package revisions whose lifecycle stage is "Draft", "Proposed", or "DeletionProposed"
- **Git repositories**: package revisions (Kpt package file contents and directory structures)
- **Package revision cache**: Kubernetes-related metadata for package revisions (for example, labels and annotations) and, **if** the DB cache is configured, "work-in-progress" package revisions whose lifecycle stage is "Draft", "Proposed", or "DeletionProposed"

## Backup strategy

@@ -144,17 +136,13 @@ By default, before running the disaster scenarios, the suite creates, installs,
- a local Kind cluster (the "data cluster") containing:
- a Git server (Gitea)
- containing several copies of public repositories, each containing a large quantity of sample, test, and catalogued
Kpt packages
- to provide a representative workload for Porch when restoring and reconciling package revision data
- a PostgreSQL instance, set up to allow Porch to connect to and use it for the DB cache
- to make it possible to wipe and restore the database independently of Porch
- another local Kind cluster
- with Porch installed
- with DB cache connected to the PostgreSQL instance installed on the data cluster
- with the `porch-server` microservice's memory limits increased to `4GiB`
Kpt packages. This is needed to provide a representative workload for Porch when restoring and reconciling package revision data.
- a PostgreSQL instance, set up to allow Porch to connect to and use it for the DB cache. This is needed to make it possible to wipe and restore the database independently of Porch.
- another local Kind cluster with Porch installed
- with DB cache connected to the PostgreSQL instance installed on the data cluster
- with the `porch-server` microservice's memory limits increased to `4GiB`
- ~115 Repository objects created in Porch, each targeted to a different combination of Git repository and directory within
the repository
- this is calculated to maximize number of package revisions and provide representative workload
the repository. This is calculated to maximize the number of package revisions and provide a representative workload.
- a small number of new package revisions in various lifecycle states, set up using Porch's API to allow testing that they
also will be properly backed up and restored

@@ -196,12 +184,12 @@ data stores.

Kubernetes cluster is lost with all nodes; Git repositories are lost; DB cache database is lost.

#### Data backed up:
**Data backed up:**
- Porch custom resources
- Git repository contents
- DB cache database contents

#### Data stores lost:
**Data stores lost:**
- Kubernetes control plane: entire Kubernetes cluster deleted
- Git repositories: Git server deleted and recreated empty of data
- DB cache database:
@@ -215,29 +203,23 @@ Kubernetes cluster is lost with all nodes; Git repositories are lost; DB cache d

To ensure data compatibility, backup must be restored into the DB cache of the same version of Porch.

**In step 2, ensure Porch is reinstalled with the same version as before the cluster was lost!**
In step 2, ensure Porch is reinstalled with the same version as before the cluster was lost!
{{% /alert %}}
3. Restore backed-up repository contents to Git server
4. Restore backed-up database contents to PostgreSQL server
5. Perform GitOps reconciliation, gradually (in batches of 20) re-creating all backed-up Porch Repository objects
1. For each batch, wait until all Repository objects have a condition with type "Ready" and status "True"
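
The batched re-creation described in the last step could be scripted roughly as follows. This is a minimal sketch, not the test suite's actual implementation: it assumes each backed-up Repository object is stored as its own YAML manifest, and the names `restore_repositories_in_batches`, `KUBECTL`, `BATCH_SIZE`, and `WAIT_TIMEOUT` are hypothetical helpers introduced here for illustration.

```bash
# Sketch only: apply backed-up Repository manifests in batches and wait for
# every object in a batch to report condition Ready=True before moving on.
# Assumption: one Repository object per manifest file passed as an argument.
KUBECTL="${KUBECTL:-kubectl}"            # hypothetical override, eases dry runs
BATCH_SIZE="${BATCH_SIZE:-20}"           # batch size used in the scenarios
WAIT_TIMEOUT="${WAIT_TIMEOUT:-10m}"      # assumed value, tune per environment

restore_repositories_in_batches() {
  [ "$BATCH_SIZE" -gt 0 ] || return 1
  while [ "$#" -gt 0 ]; do
    echo "starting batch: $# manifest(s) remaining"
    # Apply at most BATCH_SIZE manifests from the front of the argument list.
    n=0
    for file in "$@"; do
      [ "$n" -lt "$BATCH_SIZE" ] || break
      $KUBECTL apply -f "$file" || return 1
      n=$((n + 1))
    done
    # Wait for the same manifests to become Ready before the next batch.
    n=0
    for file in "$@"; do
      [ "$n" -lt "$BATCH_SIZE" ] || break
      $KUBECTL wait --for=condition=Ready --timeout="$WAIT_TIMEOUT" -f "$file" || return 1
      n=$((n + 1))
    done
    # Drop the manifests that were just processed.
    shift "$n"
  done
}
```

A call such as `restore_repositories_in_batches backup/repositories/*.yaml` (the path is illustrative) would then process the manifests 20 at a time. Setting `KUBECTL=echo` prints the commands instead of running them.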

#### Expected data loss

None - complete recovery of state at time data was backed up.

With backups of all data stores, Porch recovers all data.
**Expected data loss:** None. Porch recovers the complete state as of the time the data was backed up. With backups of all data stores, Porch recovers all data.


### 2. Kubernetes cluster loss

Kubernetes cluster is lost with all nodes; Git repositories and DB cache database remain safe.

#### Data backed up:
- Porch custom resources
**Data backed up:** Porch custom resources

#### Data stores lost:
- Kubernetes control plane: entire Kubernetes cluster deleted
**Data stores lost:** Kubernetes control plane. The entire Kubernetes cluster is deleted.

#### Restoration steps:
1. Recreate Kubernetes cluster
@@ -246,14 +228,12 @@ Kubernetes cluster is lost with all nodes; Git repositories and DB cache databas

To ensure data compatibility, backup must be restored into the DB cache of the same version of Porch.

**In step 2, ensure Porch is reinstalled with the same version as before the cluster was lost!**
In step 2, ensure Porch is reinstalled with the same version as before the cluster was lost!
{{% /alert %}}
3. Perform GitOps reconciliation, gradually (in batches of 20) re-creating all backed-up Porch Repository objects
1. For each batch, wait until all Repository objects have a condition with type "Ready" and status "True"

#### Expected data loss

None - complete recovery of state at time of cluster loss.
**Expected data loss:** None. Complete recovery of state at time of cluster loss.

Through using Git as the source of truth, we might expect Porch to automatically delete any state that only exists in the
cache - e.g., package revisions in "Draft" lifecycle stage. However, the connection between Porch and Git is represented by the Repository
@@ -266,27 +246,20 @@ cached state.
All Porch pods (by default, all in the "porch-system" namespace) are ungracefully restarted (e.g. by forcible pod deletion
with grace-period 0).

#### Data backed up:
**Data backed up:**
- Porch custom resources
- Git repository contents
- DB cache database contents

#### Data stores lost:
- None
- Porch will immediately begin to re-sync all repositories, resulting in a **decrease in quality of service** until all
repositories are deemed Ready
- **Porch API will be unavailable** to perform operations on package revisions
- `get` or `list` operations can be used to monitor Porch for API availability and repository status
**Data stores lost:** None. Porch will immediately begin to re-sync all repositories, resulting in a **decrease in quality of service** until all repositories are deemed Ready. **Porch API will be unavailable** to perform operations on package revisions. You can use `get` or `list` operations to monitor Porch for API availability and repository status.

#### Restoration steps:
1. Wait until all Porch pods return to Ready state
2. Wait until all Repository objects have a condition with type "Ready" and status "True"
1. GitOps reconciliation is unnecessary in this case since the Repository objects are unchanged
3. List package revisions periodically, monitoring results until state stabilises
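
The periodic listing in the last step can be automated with a small polling loop. The sketch below is an assumption-laden illustration: it treats the state as stable once the number of listed package revisions stays the same for several consecutive polls, and it assumes package revisions can be listed with `kubectl get packagerevisions`. The names `wait_until_stable`, `LIST_CMD`, `POLL_SECONDS`, and `STABLE_ROUNDS` are hypothetical.

```bash
# Sketch only: poll a list command until its line count stops changing.
# Assumption: a stable count of package revisions is a good enough proxy
# for "state has stabilised"; compare full output if you need more rigour.
LIST_CMD="${LIST_CMD:-kubectl get packagerevisions --all-namespaces --no-headers}"
POLL_SECONDS="${POLL_SECONDS:-30}"
STABLE_ROUNDS="${STABLE_ROUNDS:-3}"

wait_until_stable() {
  previous=""
  stable=0
  while [ "$stable" -lt "$STABLE_ROUNDS" ]; do
    if output=$($LIST_CMD 2>/dev/null); then
      # Count non-empty lines, one per listed package revision.
      current=$(printf '%s\n' "$output" | grep -c . || true)
      if [ "$current" = "$previous" ]; then
        stable=$((stable + 1))
      else
        stable=0
      fi
      previous=$current
    else
      # The API may still be unavailable after the restart: start over.
      stable=0
      previous=""
    fi
    sleep "$POLL_SECONDS"
  done
  echo "$previous"
}
```

The function prints the final count once it has been unchanged for `STABLE_ROUNDS` consecutive polls. A failed list call resets the counter, so a temporarily unavailable API is not mistaken for a stable state.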

#### Expected data loss

None - no data stores were impacted, but only Porch's ability to manage them, allowing for full recovery.
**Expected data loss:** None. No data stores were impacted, only Porch's ability to manage them, so full recovery is possible.

In a representative testing environment, recovery takes **less than 5 minutes** for 115 Repository objects with a `4GiB`
memory limit applied to the `porch-server` microservice
@@ -315,24 +288,16 @@ Kubernetes cluster and Git repositories remain safe; DB cache database is lost.
Applicable only to Porch with DB cache configured.
{{% /alert %}}

#### Data backed up:
- DB cache database contents
**Data backed up:** DB cache database contents.

#### Data stores lost:
- DB cache database:
- SQL script used to drop all Porch tables
- PostgreSQL server deleted and recreated empty of data
**Data stores lost:** DB cache database (SQL script used to drop all Porch tables, PostgreSQL server deleted and recreated empty of data).

#### Restoration steps:
1. Restore backed-up database contents to PostgreSQL server
2. Perform GitOps reconciliation, gradually (in batches of 20) re-creating all backed-up Porch Repository objects
1. For each batch, wait until all Repository objects have a condition with type "Ready" and status "True"

#### Expected data loss

None - complete recovery of state at time DB cache database was backed up.

With a backup of the cache database, Porch recovers all data.
**Expected data loss:** None. Porch recovers the complete state as of the time the DB cache database was backed up. With a backup of the cache database, Porch recovers all data.

### 5. DB cache loss without backup

@@ -342,13 +307,9 @@ Kubernetes cluster and Git repositories remain safe; DB cache database is lost w
Applicable only to Porch with DB cache configured.
{{% /alert %}}

#### Data backed up:
- None
**Data backed up:** None.

#### Data stores lost:
- DB cache database:
- SQL script used to drop all Porch tables
- PostgreSQL server deleted and recreated empty of data
**Data stores lost:** DB cache database (SQL script used to drop all Porch tables, PostgreSQL server deleted and recreated empty of data).

#### Restoration steps:
1. Perform GitOps reconciliation, gradually (in batches of 20) re-creating all backed-up Porch Repository objects
@@ -362,7 +323,7 @@ Applicable only to Porch with DB cache configured.
3. Wait until all Repository objects have a condition with type "Ready" and status "True"
4. List package revisions periodically, monitoring results until state stabilises

#### Expected data loss
**Expected data loss:**

All "work in progress" on package revisions lost:
- package revisions in "Draft" or "Proposed" lifecycle stages - complete loss
@@ -9,7 +9,7 @@ description: Common repository sync issues and their solutions

### Repository Not Syncing

**Problem**: Repository shows as Ready but packages aren't updating
**Problem**: Repository shows as Ready but packages are not updating

**Solutions**:
```bash
@@ -29,10 +29,7 @@ kubectl logs -n porch-system deployment/porch-controllers | grep "sync.*<repo-na
kubectl logs -n porch-system deployment/porch-controllers | grep "<repo-name>.*error"
```

**Common causes**:
- Invalid cron expression falls back to default frequency
- Repository authentication issues
- Network connectivity problems
**Common causes**: an invalid cron expression (Porch falls back to the default frequency), repository authentication issues, or network connectivity problems.

### Authentication Failures

@@ -103,14 +100,11 @@ kubectl logs -n porch-system deployment/porch-server | grep "repositorySync.*<re
git ls-remote <repo-url> # For Git repos
```

**Common causes**:
- Large repository taking time to clone/sync
- Network timeouts
- Repository structure issues
**Common causes**: large repository taking time to clone/sync, network timeouts or repository structure issues.

### One-time Sync Not Triggering

**Problem**: `porchctl repo sync` command succeeds but sync doesn't happen
**Problem**: `porchctl repo sync` command succeeds but sync does not happen

**Diagnostic steps**:
```bash
@@ -124,9 +118,7 @@ date -u # Compare with runOnceAt value
kubectl logs -n porch-system deployment/porch-controllers | grep "runOnceAt"
```

**Solutions**:
- Ensure timestamp is at least 1 minute in future
- Verify namespace is correct
**Solutions**: Ensure the timestamp is at least 1 minute in the future and verify that the namespace is correct.
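
To check the timestamp requirement from the command line, something like the following may help. It is only a sketch: it assumes GNU `date` (for the `-d` option) and that `runOnceAt` holds an RFC 3339 UTC timestamp. The helper names `utc_timestamp_in` and `seconds_until` are hypothetical.

```bash
# Sketch only. Assumes GNU date and an RFC 3339 UTC value for runOnceAt.

# Print a UTC timestamp the given number of minutes from now.
utc_timestamp_in() {
  now=$(date -u +%s)
  date -u -d "@$((now + $1 * 60))" +%Y-%m-%dT%H:%M:%SZ
}

# Print how many seconds remain until the given timestamp
# (negative if it is already in the past).
seconds_until() {
  target=$(date -u -d "$1" +%s) || return 1
  now=$(date -u +%s)
  echo $((target - now))
}
```

For example, `utc_timestamp_in 2` prints a value two minutes ahead of the current UTC time. Running `seconds_until` on an existing `runOnceAt` value shows whether it is still at least 60 seconds away.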

## Error Messages & Diagnostic Steps

@@ -213,10 +205,7 @@ See [Repository Controller Configuration]({{% relref "/docs/6_configuration_and_
**A**: Yes, periodic scheduling and one-time sync work independently. One-time synchronization executes regardless of the periodic schedule.

### Q: Why is my cron expression not working?
**A**: Porch uses standard 5-field cron format. Common mistakes:
- Using 6 fields (seconds not supported)
- Missing fields
- Invalid ranges or values
**A**: Porch uses standard 5-field cron format. Common mistakes include using 6 fields (seconds not supported), missing fields, or invalid ranges or values.
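
A quick way to catch the first two mistakes is to count the whitespace-separated fields before applying a schedule. The sketch below does only that. It does not validate ranges or values, and the helper names `cron_field_count` and `is_five_field_cron` are hypothetical.

```bash
# Sketch only: count fields in a cron expression. Ranges and values are
# NOT checked, so a five-field expression can still be invalid.
cron_field_count() {
  printf '%s\n' "$1" | awk 'NR == 1 { print NF }'
}

# Succeeds only when the expression has exactly five fields.
is_five_field_cron() {
  [ "$(cron_field_count "$1")" -eq 5 ]
}
```

For example, `is_five_field_cron '*/10 * * * *'` succeeds, while `is_five_field_cron '0 */10 * * * *'` fails because the leading seconds field makes six fields.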

### Q: How do I stop repository syncing?
**A**: Repository synchronization cannot be completely stopped. Porch continuously monitors repositories for changes. You can only change the sync frequency, either by updating the sync schedule configuration or by removing custom schedules to fall back to the default frequency.