23 changes: 22 additions & 1 deletion docs/remove_ger_runbook.md
@@ -74,6 +74,27 @@ WIP, TBD, code not ready

This section provides instructions on how to identify claims that have been made using an invalid GER. Once you've detected an invalid GER using the methods described in the "Detection" section, you can use these queries to find all associated claims.

#### Using the remove-ger scan command

If you do not yet know which GERs were used by invalid claims, use the `remove_ger scan-invalid-claims` command to scan claim logs directly from the L2 RPC and validate each claim's GER against L1:

```bash
./remove_ger scan-invalid-claims --cfg aggkit-config.toml --from-block <START_BLOCK>
```

The command:

- reads claim logs from the L2 bridge contract using the L2 RPC
- computes or extracts the GER used by each claim
- checks whether that GER exists in the L1 `globalExitRootMap`
- prints the GERs that were used by invalid claims, together with claim counts and tx hashes
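
The grouping step listed above can be sketched in Go as follows. This is a minimal illustration, not the tool's actual implementation: `scannedClaim`, `invalidGERReport`, `groupInvalidClaims`, and the `existsOnL1` callback are hypothetical names, and the callback stands in for the real lookup against the L1 `globalExitRootMap`.

```go
package main

import (
	"fmt"
	"sort"
)

// scannedClaim is a hypothetical, simplified view of one L2 claim log.
type scannedClaim struct {
	GER    string // hex-encoded GER used by the claim
	TxHash string // hex-encoded L2 transaction hash
}

// invalidGERReport aggregates the claims that used one invalid GER.
type invalidGERReport struct {
	GER      string
	TxHashes []string
}

// groupInvalidClaims keeps the claims whose GER is unknown on L1 and groups
// them by GER. existsOnL1 is an assumed stand-in for the L1 lookup.
func groupInvalidClaims(
	claims []scannedClaim,
	existsOnL1 func(ger string) (bool, error),
) ([]invalidGERReport, error) {
	known := make(map[string]bool)     // cache: one L1 lookup per distinct GER
	byGER := make(map[string][]string) // invalid GER -> tx hashes
	for _, c := range claims {
		ok, seen := known[c.GER]
		if !seen {
			var err error
			ok, err = existsOnL1(c.GER)
			if err != nil {
				return nil, fmt.Errorf("check GER %s on L1: %w", c.GER, err)
			}
			known[c.GER] = ok
		}
		if !ok {
			byGER[c.GER] = append(byGER[c.GER], c.TxHash)
		}
	}

	reports := make([]invalidGERReport, 0, len(byGER))
	for ger, txs := range byGER {
		reports = append(reports, invalidGERReport{GER: ger, TxHashes: txs})
	}
	// Sort for deterministic output; map iteration order is random in Go.
	sort.Slice(reports, func(i, j int) bool { return reports[i].GER < reports[j].GER })
	return reports, nil
}

func main() {
	claims := []scannedClaim{
		{GER: "0xaa", TxHash: "0x01"},
		{GER: "0xbb", TxHash: "0x02"},
		{GER: "0xaa", TxHash: "0x03"},
	}
	// Pretend only 0xbb is present in the L1 globalExitRootMap.
	onL1 := func(ger string) (bool, error) { return ger == "0xbb", nil }

	reports, err := groupInvalidClaims(claims, onL1)
	if err != nil {
		panic(err)
	}
	for _, r := range reports {
		fmt.Printf("invalid GER %s: %d claim(s), txs %v\n", r.GER, len(r.TxHashes), r.TxHashes)
	}
}
```

Caching the L1 result per GER keeps the number of L1 calls proportional to the number of distinct GERs rather than the number of claims, which matters when many claims share one GER.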

Use this when:

- you know the approximate block range where the invalid claims happened, but not the GER
- bridge-service indexing is unavailable or you want to validate directly from chain RPC
- you want a fast first pass before running SQL queries or manual classification

#### Query claims by Global Exit Root (GER)

Claims are stored in the `bridgesync` database (typically at the path configured in `L2BridgeSyncStoragePath`). Each claim record includes the `global_exit_root` field, which indicates which GER was used when the claim was executed.
@@ -1009,4 +1030,4 @@ For Category A (under-collateralization), after recovery you may need to:

- [A, B.1, B.2 deeper explanation](https://github.com/agglayer/ADRs/issues/28)
- [E2E testing instructions on prod networks](https://hackmd.io/@rachit77/S1ms6PYM-x)
- [CI E2E tests](https://github.com/agglayer/e2e/blob/main/tests/aggkit/latest-n-injected-ger.bats#L825)
41 changes: 40 additions & 1 deletion tools/remove_ger/README.md
@@ -8,6 +8,8 @@ Diagnose and recover from invalid Global Exit Root (GER) injection on L2.

**When to use it:** Use after you have detected an invalid GER, for example via aggsender or l2gersync error logs. For how to detect invalid GERs and manual recovery context, see the [Remove GER runbook](../../docs/remove_ger_runbook.md). This tool automates the procedures described there.

If you do not yet know which GER was used by the bad claims, use `scan-invalid-claims` first to discover invalid GERs directly from L2 claim logs.

## Building

From the repository root:
@@ -26,6 +28,7 @@ The tool uses the **same** config file(s) as the main aggkit binary: standard `a
| Field | Type | Description |
| ----- | ---- | ------------ |
| **BridgeServiceURL** | string | Bridge service REST API base URL (**required**). Used for querying claims and bridges. The tool runs a health check at startup and will fail if the service is unreachable. |
| **L2NetworkID** | uint32 | L2 network ID served by the bridge service (**required** for diagnose/recover). Used when querying claims and bridges for the target L2. Set this to the same network ID that the bridge service uses for your L2. |
| **SovereignAdminKey** | section | Signing key with sovereign admin privileges (activate/deactivate emergency state, remove GER, unset/set claims, force-emit claim events). Supports local keystore, AWS KMS, and GCP KMS. See sub-fields below. |

**SovereignAdminKey** sub-fields (depends on `Method`):
@@ -43,12 +46,13 @@ Append the following to your existing `aggkit-config.toml` (adjust paths and URL
```toml
[RemoveGER]
BridgeServiceURL = "http://localhost:8080"
L2NetworkID = 12
SovereignAdminKey = { Method = "local", Path = "/path/to/sovereign_admin_keystore.json", Password = "your-keystore-password" }
```

## Commands

The tool has two modes: the default **diagnose & recover** command and the **generate** subcommand for testing.
The tool has three modes: the default **diagnose & recover** command, the **scan-invalid-claims** subcommand for discovery, and the **generate** subcommand for testing.

### Diagnose & recover (default)

@@ -66,6 +70,11 @@ The tool has two modes: the default **diagnose & recover** command and the **gen

You can pass multiple config files; later files override earlier ones (e.g. `--cfg base.toml --cfg overrides.toml`).

The `[RemoveGER]` section for the default diagnose/recover command must include both:

- `RemoveGER.BridgeServiceURL`
- `RemoveGER.L2NetworkID`

#### CLI flags

| Flag | Short | Required | Description |
@@ -75,6 +84,36 @@ You can pass multiple config files; later files override earlier ones (e.g. `--c
| `--yes` | - | No | Skip interactive confirmation and run recovery immediately. |
| `--force` | - | No | Continue even if the GER exists on L1 (still diagnose and remove). |

### scan-invalid-claims

Scan L2 claim logs directly from the L2 RPC, starting at a given block, and check whether the GER used by each claim exists on L1. The command prints the list of GERs that were used in invalid claims.

```bash
./remove_ger scan-invalid-claims --cfg aggkit-config.toml --from-block 123456
```

The command scans claim logs from `BridgeL2Sync.BridgeAddr` on L2, validates each claim GER against `L2GERSync.GlobalExitRootL1Addr` on L1, and groups the invalid claims by GER.

#### scan-invalid-claims flags

| Flag | Required | Default | Description |
| ---- | -------- | ------- | ----------- |
| `--cfg` | Yes | - | Configuration file(s), same format as aggkit-config.toml. |
| `--from-block` | Yes | - | Starting L2 block number to scan (inclusive). |
| `--to-block` | No | latest | Ending L2 block number to scan (inclusive). |
| `--chunk-size` | No | `5000` | Maximum L2 block range per `eth_getLogs` query. |
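
To illustrate how `--from-block`, `--to-block`, and `--chunk-size` could interact, here is a minimal Go sketch that splits an inclusive block range into consecutive windows, one per `eth_getLogs` query. `blockRange` and `chunkRanges` are hypothetical names, and the sketch assumes the chunk size counts blocks per query; the tool's real windowing logic may differ.

```go
package main

import "fmt"

// blockRange is an inclusive [From, To] span of L2 block numbers.
type blockRange struct {
	From, To uint64
}

// chunkRanges splits the inclusive range [from, to] into consecutive
// inclusive sub-ranges that each cover at most chunkSize blocks.
func chunkRanges(from, to, chunkSize uint64) []blockRange {
	if chunkSize == 0 || from > to {
		return nil
	}
	var out []blockRange
	start := from
	for {
		end := to
		// Compare via subtraction so start+chunkSize cannot overflow uint64.
		if to-start >= chunkSize {
			end = start + chunkSize - 1
		}
		out = append(out, blockRange{From: start, To: end})
		if end == to {
			break
		}
		start = end + 1
	}
	return out
}

func main() {
	// Example values only: scan blocks 123456..135000 in windows of 5000.
	for _, r := range chunkRanges(123456, 135000, 5000) {
		fmt.Printf("eth_getLogs fromBlock=%d toBlock=%d\n", r.From, r.To)
	}
}
```

Because both bounds are inclusive, each window starts one block after the previous window ends, so no block is scanned twice and none is skipped.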

#### Config requirements for scan-invalid-claims

The scan command reads a subset of the aggkit config:

- `L1NetworkConfig.RPC.URL`: L1 RPC endpoint.
- `Common.L2RPC.URL`: L2 RPC endpoint.
- `BridgeL2Sync.BridgeAddr`: L2 bridge contract address.
- `L2GERSync.GlobalExitRootL1Addr`: L1 GER manager contract address.

The `[RemoveGER]` section is **not** required for `scan-invalid-claims`.
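
As a rough pre-flight check, the four settings above could be validated before starting a scan. The Go sketch below is illustrative only: `scanConfig` is a hypothetical flattened struct, not the real aggkit config type, and `missingScanFields` is an assumed helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// scanConfig is a hypothetical flattened view of the settings the scan reads.
type scanConfig struct {
	L1RPCURL             string // L1NetworkConfig.RPC.URL
	L2RPCURL             string // Common.L2RPC.URL
	L2BridgeAddr         string // BridgeL2Sync.BridgeAddr
	GlobalExitRootL1Addr string // L2GERSync.GlobalExitRootL1Addr
}

// missingScanFields returns the config keys that are empty or whitespace-only,
// in the order they are listed in the documentation.
func missingScanFields(c scanConfig) []string {
	var missing []string
	check := func(key, value string) {
		if strings.TrimSpace(value) == "" {
			missing = append(missing, key)
		}
	}
	check("L1NetworkConfig.RPC.URL", c.L1RPCURL)
	check("Common.L2RPC.URL", c.L2RPCURL)
	check("BridgeL2Sync.BridgeAddr", c.L2BridgeAddr)
	check("L2GERSync.GlobalExitRootL1Addr", c.GlobalExitRootL1Addr)
	return missing
}

func main() {
	// Example values only; two of the four settings are left unset.
	cfg := scanConfig{
		L1RPCURL:     "http://localhost:8545",
		L2BridgeAddr: "0x0000000000000000000000000000000000000001",
	}
	if missing := missingScanFields(cfg); len(missing) > 0 {
		fmt.Println("missing config:", strings.Join(missing, ", "))
	}
}
```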

### generate

Generate a deterministic invalid GER scenario and print ready-to-run `cast` commands for injecting a fake GER and a fake claim into L2. This is intended for **E2E testing** of the recovery tool.
23 changes: 23 additions & 0 deletions tools/remove_ger/cmd/main.go
@@ -9,6 +9,8 @@ import (
	"github.com/urfave/cli/v2"
)

const defaultScanChunkSize = uint64(5000)

func main() {
	app := cli.NewApp()
	app.Name = "remove-ger"
@@ -36,6 +38,27 @@ func main() {
	}
	app.Action = remove_ger.Run
	app.Commands = []*cli.Command{
		{
			Name:  "scan-invalid-claims",
			Usage: "Scan L2 claims from a starting block and report GERs that are invalid on L1",
			Flags: []cli.Flag{
				&cli.Uint64Flag{
					Name:     "from-block",
					Usage:    "Starting L2 block number to scan (inclusive)",
					Required: true,
				},
				&cli.Uint64Flag{
					Name:  "to-block",
					Usage: "Ending L2 block number to scan (inclusive, defaults to latest L2 block)",
				},
				&cli.Uint64Flag{
					Name:  "chunk-size",
					Usage: "Maximum L2 block range per eth_getLogs query",
					Value: defaultScanChunkSize,
				},
			},
			Action: remove_ger.RunScanInvalidClaims,
		},
Comment on lines +41 to +61

Copilot AI Apr 14, 2026

The PR title/description suggest a documentation-only change, but this diff also introduces a new CLI subcommand (scan-invalid-claims) and adds substantial new scanning/recovery logic and tests. Please update the PR title and/or description to reflect the actual scope so reviewers can assess risk and behavior changes appropriately.

		{
			Name:  "generate",
			Usage: "Generate an invalid GER scenario with ready-to-run cast commands for testing",
93 changes: 72 additions & 21 deletions tools/remove_ger/diagnosis.go
@@ -43,6 +43,18 @@ type DiagnosisResult struct {
	Scenario Scenario
}

func (r *DiagnosisResult) hasClaims() bool {
	return r != nil && len(r.Claims) > 0
}

func (r *DiagnosisResult) needsGERRemoval() bool {
	return r != nil && r.GERExistsOnL2
}

func (r *DiagnosisResult) hasRecoveryActions() bool {
	return r != nil && (r.needsGERRemoval() || r.hasClaims())
}

// ClaimDiagnosis holds the classification for a single claim.
type ClaimDiagnosis struct {
	GlobalIndex *big.Int
@@ -90,15 +102,16 @@ func Diagnose(ctx context.Context, env *Env, gerHash common.Hash, force bool) (*
	}
	result.GERTimestampL2 = l2Timestamp
	result.GERExistsOnL2 = l2Timestamp != nil && l2Timestamp.Sign() > 0
	if !result.GERExistsOnL2 {
		return result, nil
	}

	// Step 3 - Find claims using the GER (via bridge service)
	claims, err := GetClaimsByGER(ctx, env.BridgeService, env.L2NetworkID, gerHash)
	if err != nil {
		return nil, fmt.Errorf("get claims by GER: %w", err)
	}
	claims, err = filterActiveClaims(ctx, env.L2Bridge, claims)
	if err != nil {
		return nil, fmt.Errorf("filter active claims by GER: %w", err)
	}
	if len(claims) == 0 {
		return result, nil
	}
@@ -164,6 +177,30 @@ func GetClaimsByGER(
	return claims, nil
}

func filterActiveClaims(
	ctx context.Context,
	l2Bridge l2ClaimStateLookup,
	claims []*claimsynctypes.Claim,
) ([]*claimsynctypes.Claim, error) {
	if len(claims) == 0 {
		return nil, nil
	}

	activeClaims := make([]*claimsynctypes.Claim, 0, len(claims))
	for _, claim := range claims {
		active, err := isClaimStillActive(ctx, l2Bridge, claim.GlobalIndex)
		if err != nil {
			return nil, fmt.Errorf("global index %s: %w", claim.GlobalIndex.String(), err)
		}
		if !active {
			continue
		}
		activeClaims = append(activeClaims, claim)
	}

	return activeClaims, nil
}

// claimResponseToClaim converts a bridge service ClaimResponse to a claimsynctypes.Claim.
func claimResponseToClaim(r *bridgetypes.ClaimResponse) *claimsynctypes.Claim {
	globalIndex, ok := new(big.Int).SetString(string(r.GlobalIndex), decimalBase)
@@ -464,6 +501,8 @@ func PrintDiagnosis(result *DiagnosisResult) {
			ts = result.GERTimestampL2.String()
		}
		fmt.Printf(" L2: EXISTS (timestamp: %s)\n", ts)
	} else if result.hasClaims() {
		fmt.Println(" L2: NOT FOUND (already removed, but related claims still exist)")
	} else {
		fmt.Println(" L2: NOT FOUND (nothing to do)")
	}
@@ -527,39 +566,51 @@ func scenarioDescription(s Scenario) string {

func printRecoveryPlanSteps(result *DiagnosisResult) {
	fmt.Println("The following steps will be executed:")
	step := 1
	fmt.Printf(" %d. Freeze bridge (activateEmergencyState)\n", step)
	step++
	fmt.Printf(" %d. Remove GER %s (removeGlobalExitRoots)\n", step, result.InvalidGER.Hex())
	step++
	steps := buildRecoveryPlanSteps(result)
	if len(steps) == 0 {
		fmt.Println(" No on-chain action required.")
		return
	}
	for i, step := range steps {
		fmt.Printf(" %d. %s\n", i+1, step)
	}
}

func buildRecoveryPlanSteps(result *DiagnosisResult) []string {
	if !result.hasRecoveryActions() {
		return nil
	}

	steps := []string{"Freeze bridge (activateEmergencyState)"}
	if result.needsGERRemoval() {
		steps = append(steps, fmt.Sprintf("Remove GER %s (removeGlobalExitRoots)", result.InvalidGER.Hex()))
	}

	switch result.Scenario {
	case ScenarioNoClaims:
		// no unset/set/emit
	case ScenarioCategoryA:
		for _, cd := range result.Claims {
			fmt.Printf(" %d. Unset claim %s (unsetMultipleClaims)\n", step, formatGlobalIndex(cd.GlobalIndex))
			step++
			steps = append(steps, fmt.Sprintf("Unset claim %s (unsetMultipleClaims)", formatGlobalIndex(cd.GlobalIndex)))
		}
	case ScenarioCategoryB1:
		for _, cd := range result.Claims {
			fmt.Printf(" %d. Force emit corrected claim event for %s (forceEmitDetailedClaimEvent)\n",
				step, formatGlobalIndex(cd.GlobalIndex))
			step++
			steps = append(steps, fmt.Sprintf(
				"Force emit corrected claim event for %s (forceEmitDetailedClaimEvent)",
				formatGlobalIndex(cd.GlobalIndex)))
		}
	case ScenarioCategoryB2:
		for _, cd := range result.Claims {
			fmt.Printf(" %d. Unset claim %s (unsetMultipleClaims)\n", step, formatGlobalIndex(cd.GlobalIndex))
			step++
			steps = append(steps, fmt.Sprintf("Unset claim %s (unsetMultipleClaims)", formatGlobalIndex(cd.GlobalIndex)))
		}
		fmt.Printf(" %d. Set claims with correct global indexes (setMultipleClaims)\n", step)
		step++
		steps = append(steps, "Set claims with correct global indexes (setMultipleClaims)")
		for _, cd := range result.Claims {
			fmt.Printf(" %d. Force emit corrected claim event for %s (forceEmitDetailedClaimEvent)\n",
				step, formatGlobalIndex(cd.GlobalIndex))
			step++
			steps = append(steps, fmt.Sprintf(
				"Force emit corrected claim event for %s (forceEmitDetailedClaimEvent)",
				formatGlobalIndex(cd.GlobalIndex)))
		}
	}

	fmt.Printf(" %d. Restore bridge (deactivateEmergencyState)\n", step)
	steps = append(steps, "Restore bridge (deactivateEmergencyState)")
	return steps
}