
Conversation


@sergeytimoshin commented Jan 7, 2026

  • add api server for health checks and system status.
  • add static dashboard.html for a web-based status UI
  • add support for filtering processed trees by group authority

Summary by CodeRabbit

  • New Features

    • Built-in HTTP API (health, status, metrics) and interactive dashboard; richer per-tree and forester status views.
    • V2 queue introspection and expanded status outputs.
  • Performance

    • Batched proof fetches, improved retry/backoff and reduced per-tree batch work.
    • New indexer response-time and proof-count metrics.
  • Chores

    • Shortened env var names; added API server port and optional group-authority config.
    • Static dashboard assets packaged in container; removed legacy test orchestration script.



coderabbitai bot commented Jan 7, 2026

📝 Walkthrough

Walkthrough

Adds a Warp-based HTTP API server and a static dashboard exposing /health, /status, and /metrics endpoints; implements comprehensive forester status assembly plus V2 queue parsing and metrics; and introduces group-authority filtering, CLI/config env-key changes, an RPC trait extension, and multiple processor batching/sender updates.

Changes

Cohort / File(s) | Summary
API Server & Dashboard
forester/src/api_server.rs, forester/static/dashboard.html, forester/src/main.rs, forester/Dockerfile
New Warp-based API server and embedded dashboard; spawn_api_server called at startup; dashboard copied into final Docker image; graceful shutdown support via ApiServerHandle.
Forester Status & Metrics
forester/src/forester_status.rs, forester/src/metrics.rs
New status types (ForesterStatus, TreeStatus, ForesterInfo) and async status assembly; balance/tree parsing; new indexer metrics (INDEXER_RESPONSE_TIME, INDEXER_PROOF_COUNT) and update helpers.
Queue Helpers & V2 Parsing
forester/src/queue_helpers.rs, forester/src/tree_data_sync.rs
Add V2 queue models (V2QueueInfo, BatchInfo, ParsedBatchData), parsing helpers, new public get_address_v2/get_state_v2 entry points; fetch_queue_item_data signature simplified; TreeAccounts now carry owner; new fetch_protocol_group_authority helper.
Indexer batching & proof fetching
forester/src/processor/v1/helpers.rs, forester/src/processor/v1/send_transaction.rs
Batched address-proof fetching with retries, timing metrics, and per-batch logging; send_transaction simplified queue index selection, fixed batch size constant, and switched to solana_commitment_config import.
Epoch manager & group-authority filtering
forester/src/epoch_manager.rs, forester/src/config.rs
Epoch monitor lifecycle hardened (select!, backoff, PagerDuty alerts); GeneralConfig gains group_authority: Option<Pubkey> and trees are filtered by owner when configured.
CLI / Config / Tests
forester/src/cli.rs, forester/src/config.rs, forester/tests/*
ENV bindings renamed (FORESTER_* → shorter keys), added StartArgs fields api_server_port, api_server_public_bind, group_authority; tests updated to initialize new fields.
Processor v2 / Tx sender tweaks
forester/src/processor/v2/processor.rs, forester/src/processor/v2/tx_sender.rs
Reduced MAX_BATCHES_PER_TREE (20→4); tx_sender adds periodic eligibility checks, early termination, and cache-save-on-exit behavior.
RPC trait & client
sdk-libs/client/src/rpc/rpc_trait.rs, sdk-libs/client/src/rpc/client.rs, sdk-libs/program-test/src/program_test/rpc.rs
Added get_program_accounts_with_discriminator to Rpc trait and implemented for LightClient and LightProgramTest.
Build / Manifest
forester/Cargo.toml, forester/src/lib.rs
Added workspace solana-commitment-config, enabled warp with server feature, moved dotenvy to dependencies, exported new api_server module.
Static / Tests cleanup
forester/test.sh, forester/static/dashboard.html, forester/tests/*
Removed test orchestration script; added large static dashboard HTML; tests adjusted for config surface changes.

Sequence Diagram(s)

sequenceDiagram
    participant B as Browser / Client
    participant API as Forester API Server
    participant S as Forester status logic
    participant RPC as LightClient / Solana RPC
    participant MET as Prometheus Registry

    B->>API: GET /status
    activate API
    API->>S: assemble status (rpc_url, filter options)
    activate S
    S->>RPC: fetch registry/epoch PDAs & trees
    RPC-->>S: PDAs, tree accounts (with owners)
    S->>RPC: batch get_account_info for merkle trees, queues, balances
    RPC-->>S: account data
    S->>MET: read Prometheus metrics (for metrics endpoint)
    MET-->>S: metric families
    S-->>API: ForesterStatus JSON
    deactivate S
    API-->>B: 200 OK {status, trees, foresters, metrics}
    deactivate API

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested labels

ai-review

Suggested reviewers

  • ananas-block
  • SwenSchaeferjohann

Poem

🌲 A server wakes to watch the trees,
Dashboards hum and metrics breeze,
Owners filtered, queues aligned,
Status served for curious mind—
Forester sings with new-found ease.

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 warning

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 64.91%, below the required threshold of 70.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: The title 'feat: forester API server' is directly related to the primary changes in the PR, which introduce a new API server module with health checks, metrics endpoints, and a dashboard UI.



…oints; add dashboard UI

- Added `spawn_api_server` to serve HTTP endpoints for health checks, system status, and metrics.
- Added static `dashboard.html` for a web-based status UI.
@sergeytimoshin force-pushed the sergey/forester-status-api branch from f623171 to b328f49 on January 7, 2026 at 11:33

@SwenSchaeferjohann left a comment


Lgtm!


@coderabbitai bot left a comment


Actionable comments posted: 14

🤖 Fix all issues with AI agents
In @forester/src/api_server.rs:
- Around line 96-98: The code is ignoring the Result from encoder.encode(),
which can leave buffer empty and return an empty MetricsResponse silently;
change the call to check the Result of encoder.encode(&metric_families, &mut
buffer), and on Err log the error (including the error message) and return an
appropriate error response (e.g., an HTTP 500 MetricsResponse or propagate the
error) instead of continuing to create text from an empty buffer; update the
logic around buffer/text/MetricsResponse to handle the failure path cleanly.
- Around line 49-74: The status route currently uses tokio::task::spawn_blocking
to call get_forester_status_blocking, causing redundant runtime creation; change
the handler to call get_forester_status_async(&rpc_url) directly from the async
closure (remove spawn_blocking and get_forester_status_blocking usage),
propagate and handle its Result the same way you did before (Ok(status) -> json
status, Err(e) -> log and return ErrorResponse), and make
get_forester_status_async public so it can be called from this module.

In @forester/src/forester_status.rs:
- Line 94: The function get_forester_status_async is currently private but
should be made public so external callers (like the API server) can await it
directly and avoid creating nested runtimes; change its signature to pub async
fn get_forester_status_async(rpc_url: &str) -> crate::Result<ForesterStatus> and
update any module exports if necessary (e.g., pub(crate) vs pub) so
api_server.rs can call it directly and remove the redundant runtime::block_on or
runtime creation there.
- Around line 266-268: The loop over tree_statuses currently does let
queue_pubkey: Pubkey = status.queue.parse().unwrap_or_default(), which silently
substitutes Pubkey::default() on parse failures; change this to explicitly
handle parse errors (e.g., match or if let Err) so you log an error that
includes the invalid status.queue string and the parse error and then skip
processing that status (continue) instead of computing schedules with a zero
pubkey; update the block that creates schedule: Vec<Option<usize>> to run only
after a successful parse (using the parsed queue_pubkey variable).
- Line 586: The match arm handling TreeType::Unknown currently returns (0.0, 0,
0, None, None) silently; update that arm to emit a warning before returning (use
the project's logging facility, e.g., log::warn! or tracing::warn!) and include
identifying context such as the tree id/uuid or name and the tree_type
(TreeType::Unknown) so operators can see when an unexpected type is encountered,
then return the same tuple as before.
- Around line 338-363: The code currently ignores errors from forester_result,
epoch_result, and config_result which can hide RPC/deserialization failures;
update the handling so each result is matched (or check if let Err(e)) and when
Err occurs call the logger (e.g., process_logger.warn/error or similar)
including the error details and context (e.g., "failed fetching forester
accounts", "failed fetching epoch accounts", "failed fetching protocol config
accounts"), while keeping the existing success path that deserializes with
ForesterEpochPda::try_deserialize_unchecked,
EpochPda::try_deserialize_unchecked, and
ProtocolConfigPda::try_deserialize_unchecked and pushes PDAs into
forester_epoch_pdas, epoch_pdas, and protocol_config_pdas respectively.

In @forester/src/main.rs:
- Around line 113-114: The spawn_api_server call currently ignores the Result
from warp::serve(routes).run(addr).await so bind failures are silent; update
spawn_api_server to await the server run result and handle Err by logging the
address and error (e.g., check if let Err(e) =
warp::serve(routes).run(addr).await { error!(...) }), and consider propagating
failures to the caller instead of swallowing them (e.g., make spawn_api_server
return a JoinHandle<Result<()>> or send an error over a channel) so the main
pipeline can react if the API server fails to bind.

In @forester/src/queue_helpers.rs:
- Around line 430-479: get_state_v2_output_queue_info and
print_state_v2_output_queue_info duplicate the loop over
queue.batch_metadata.batches; extract that shared parsing into a helper (e.g.,
parse_batches or build_batch_info) that takes a reference to
BatchedQueueAccount::batch_metadata (or the batch slice) and returns the derived
values needed by both functions (Vec<BatchInfo>,
total_unprocessed/pending_batches, zkp_batch_size,
output_items_in_current_zkp_batch, etc.); replace the existing loop in
get_state_v2_output_queue_info with a call to that helper and update
print_state_v2_output_queue_info to call the same helper so both functions reuse
the parsing logic and return/build a V2QueueInfo or use the helper outputs to
construct it.
- Around line 197-205: The log in queue_helpers.rs that currently uses
tracing::info! to emit per-fetch stats (referencing queue_pubkey, total_items,
total_pending, start_index, end_index, filtered_queue.len()) should be demoted
to tracing::debug! to avoid flooding production logs; replace
tracing::info!(...) with tracing::debug!(...) keeping the same formatted message
and arguments so the detailed stats remain available at debug level.

In @forester/src/tree_data_sync.rs:
- Around line 319-325: The function get_registered_program_pda duplicates the
implementation found in the registry SDK
(account_compression_cpi::sdk::get_registered_program_pda); remove the local
implementation from tree_data_sync.rs and import/use the single implementation
from the registry SDK (e.g., add a use/import for
account_compression_cpi::sdk::get_registered_program_pda or call it via the
registry SDK module) so there is a single source of truth for
get_registered_program_pda.

In @forester/static/dashboard.html:
- Around line 745-748: The showError function injects untrusted message into
innerHTML causing XSS; change it to set the container's text content instead of
HTML. In the showError function (targeting the element with id
'errorContainer'), replace the innerHTML assignment with setting textContent (or
creating a text node) so the literal "Error: " prefix plus the message are
rendered as plain text rather than parsed as HTML. Ensure any styling wrapper
remains via existing CSS class on the container rather than inserting HTML
markup.
- Around line 770-785: Replace the hardcoded 0.46 magic number with a single
named constant (e.g., SLOT_SECONDS or SECONDS_PER_SLOT) defined near the top of
this file and reuse it in formatSlotsToTime and the other two places referenced
(lines ~1020 and ~1196) so all slot-to-time calculations are driven by one
authoritative value; update formatSlotsToTime to compute totalSeconds using that
constant (Math.floor(slots * SLOT_SECONDS)) and ensure the constant is
exported/accessible to any other functions in this file that perform similar
conversions.

Comment on lines 113 to 114
let rpc_url_for_api: String = config.external_services.rpc_url.to_string();
spawn_api_server(rpc_url_for_api, args.api_server_port);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find forester -name "api_server.rs" -type f



🏁 Script executed:

rg -n "pub fn spawn_api_server" forester/ -A 30



🏁 Script executed:

# Check if there's a CLAUDE.md or documentation about the API server
find . -name "CLAUDE.md" -o -name "*.md" | head -20



🏁 Script executed:

# Get more of the spawn_api_server function to see the warp::serve and error handling
rg -n "pub fn spawn_api_server" forester/src/api_server.rs -A 50 | head -80



🏁 Script executed:

# Check CLAUDE.md for guidance on API server behavior
cat CLAUDE.md | grep -i -A 5 -B 5 "api" | head -50



🏁 Script executed:

# Also check forester/README.md for any documentation about API server
cat forester/README.md | grep -i -A 5 -B 5 "api"



🏁 Script executed:

# Get the complete spawn_api_server function - need to see the warp::serve call and error handling
rg -n "pub fn spawn_api_server" forester/src/api_server.rs -A 100 | tail -60



🏁 Script executed:

# Check if there's any error handling or Result type after warp::serve
rg -n "warp::serve" forester/src/api_server.rs -A 10



🏁 Script executed:

# Check the actual behavior of warp::serve().run() on binding failure
# Look at how errors are handled (or not) in the complete function
cat -n forester/src/api_server.rs | sed -n '30,89p'



🏁 Script executed:

# Check if there are any tests for spawn_api_server that show expected behavior
rg -n "spawn_api_server" forester/ --type rust -B 3 -A 3 | head -50



🏁 Script executed:

# Check coding guidelines in CLAUDE.md about error handling for spawned threads
cat CLAUDE.md | head -100



Unhandled error binding the API server port — fix or document this behavior.

The warp::serve(routes).run(addr).await; call returns a Result but the error is never handled, logged, or propagated. If port binding fails (port in use, insufficient permissions, etc.), the error is silently lost. The spawned thread exits, but the main pipeline remains unaware the API server never started.

Consider adding an error log inside the spawned thread:

if let Err(e) = warp::serve(routes).run(addr).await {
    error!("API server failed to bind to {}: {}", addr, e);
}

Alternatively, at minimum document that API server startup failures don't block the pipeline.
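A minimal sketch of the JoinHandle-returning variant suggested above, assuming warp 0.3; try_bind_ephemeral is used because it exposes bind errors as a Result, and the /health route here is a placeholder rather than the PR's actual routes:

use std::net::SocketAddr;
use warp::Filter;

// Sketch only: placeholder route, localhost bind, and a JoinHandle the caller can monitor.
pub fn spawn_api_server(port: u16) -> tokio::task::JoinHandle<Result<(), warp::Error>> {
    tokio::spawn(async move {
        let routes = warp::path("health").map(|| "ok"); // placeholder, not the PR's routes
        let addr = SocketAddr::from(([127, 0, 0, 1], port));
        match warp::serve(routes).try_bind_ephemeral(addr) {
            Ok((bound, server)) => {
                tracing::info!("API server listening on {}", bound);
                server.await;
                Ok(())
            }
            Err(e) => {
                tracing::error!("API server failed to bind to {}: {}", addr, e);
                Err(e)
            }
        }
    })
}

The caller can then keep the handle and log or react if the task finishes with an error instead of losing the failure in a detached thread.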



@coderabbitai bot left a comment


Actionable comments posted: 16

🤖 Fix all issues with AI agents
In @forester/src/api_server.rs:
- Around line 106-133: The current line parsing uses rsplit_once(' ') which is
brittle; instead trim each line, skip comments/empty lines, get the last
whitespace-separated token (e.g., using split_whitespace().last()) as the
value_str and parse it with .parse().ok() (skip the line on parse failure) and
keep metric_part = the rest joined or obtained via splitting off the last token
so extract_label still receives a trimmed metric string; update the block around
text.lines(), replace rsplit_once usage, remove unwrap_or(0.0) in favor of
handling parse errors, and ensure extract_label, transactions_processed,
transaction_rate, last_run_timestamp, forester_balances, and queue_lengths
behave with the trimmed/robust metric_part.
- Around line 58-71: The error branches currently return
Ok(warp::reply::json(&error_response)) which yields HTTP 200; change both
branches (the ones matching Ok(Err(e)) and Err(e)) to return
Ok(warp::reply::with_status(warp::reply::json(&error_response),
StatusCode::INTERNAL_SERVER_ERROR)) (or another appropriate StatusCode) so the
response body and HTTP status both indicate failure, and add/import StatusCode
from warp::http if not already available.
- Around line 91-143: In get_metrics_json, don't ignore the Result from
encoder.encode(&metric_families, &mut buffer); instead handle the Result: if
encoding returns Err, log the error (e.g., via log::error or tracing::error)
with context like "failed to encode Prometheus metrics" and either propagate an
error response or set a clear failure indicator in the returned MetricsResponse
(so callers/HTTP handler can surface it); update the call site around
encoder.encode in get_metrics_json to match this error handling and ensure
MetricsResponse construction reflects the failure path.

In @forester/src/cli.rs:
- Around line 20-87: StartArgs now binds fields like rpc_url, push_gateway_url,
payer, derivation, etc. to generic env names (RPC_URL, PUSH_GATEWAY_URL, PAYER,
DERIVATION_PUBKEY), which breaks existing FORESTER_* env-based deployments;
update the README, Dockerfile, Kubernetes manifests, and CI/deployment scripts
to use the new env names OR add backward-compatible support in StartArgs
(cli.rs) to accept the old FORESTER_RPC_URL, FORESTER_PUSH_GATEWAY_URL,
FORESTER_PAYER, FORESTER_DERIVATION_PUBKEY (and other FORESTER_* variants) as
fallbacks for rpc_url, push_gateway_url, payer, derivation, prover_* and
photon_* fields so existing deployments continue working.

In @forester/src/epoch_manager.rs:
- Around line 2818-2848: Inside the Ok(mut fetched_trees) branch (the match
fetch_result block) where you compute group_authority (from
config.general_config.group_authority or fetch_protocol_group_authority) and
then filter fetched_trees by owner, add an explicit warn log when there were
trees before filtering (before_count > 0) but none remain after filtering
(fetched_trees.len() == 0); reference the variables fetched_trees, before_count
and group_authority in the message so it’s clear which authority produced the
empty result and aids debugging of misconfigured group authorities or mismatched
tree owners.

In @forester/src/forester_status.rs:
- Around line 323-369: fetch_registry_accounts_filtered currently ignores errors
from the three tokio::join! results (forester_result, epoch_result,
config_result) and silently proceeds; update the Err branches to log a warning
with context and the error (e.g., "fetch_registry_accounts_filtered: forester
RPC error: {}", error) for each of the three cases before continuing so
operators can see RPC/deserialization failures while still allowing the function
to return empty vectors if desired; reference the function name
fetch_registry_accounts_filtered and the result variables forester_result,
epoch_result, config_result when implementing the warnings.
- Around line 462-587: In parse_tree_status, the TreeType::Unknown arm silently
returns zeros; add a tracing::warn! before returning to surface mis-typed trees
(include tree.pubkey and tree.tree_type in the message) so operators see a
warning when TreeType::Unknown is encountered; update the TreeType::Unknown
branch in parse_tree_status to emit the warn and then return the same (0.0, 0,
0, None, None).
- Around line 265-287: The loop currently uses
status.queue.parse().unwrap_or_default(), which silently turns parse failures
into Pubkey::default() and corrupts schedules; replace that with explicit error
handling: parse the pubkey with status.queue.parse::<Pubkey>() and handle Err by
either returning an error from this function or by skipping this tree (e.g., log
the bad status.queue value and continue the for status in &mut tree_statuses
loop), then use the successfully parsed queue_pubkey in the existing call to
ForesterEpochPda::get_eligible_forester_index; ensure any logging includes the
offending status.queue and sufficient context (references: status.queue,
ForesterEpochPda::get_eligible_forester_index, active_epoch_forester_pdas,
authority_to_index, tree_statuses).
- Around line 88-93: Make get_forester_status_async public so async callers can
use it directly and avoid spawning nested runtimes; change the visibility of the
existing get_forester_status_async function to pub, leave
get_forester_status_blocking as a thin legacy wrapper that creates a tokio
Runtime and calls get_forester_status_async, and consider annotating the
blocking wrapper as deprecated in comments to guide users toward the async API.
Ensure any module exports or uses reference the newly public
get_forester_status_async symbol so async contexts call it directly while
synchronous callers continue to use get_forester_status_blocking.

In @forester/src/main.rs:
- Around line 113-115: spawn_api_server currently "fires and forgets", so
failures (e.g., bind errors) are lost while run_pipeline proceeds; update
spawn_api_server to return a JoinHandle<Result<(), anyhow::Error>> (or similar)
and capture that handle where it's called (the call site using rpc_url_for_api
and args.api_server_port) so you can await or spawn a monitor task that logs the
Result; alternatively, ensure the thread created inside spawn_api_server logs
any startup errors (including bind failures) to the process logger before
exiting and document in comments that API server failures are non-fatal if you
choose not to propagate the error.

In @forester/src/queue_helpers.rs:
- Around line 82-86: The code that computes output_items_in_current_zkp_batch is
comparing b.state to the string "Fill", which is fragile; change the predicate
to compare against the enum variant (e.g., BatchState::Fill) or use
matches!(b.state, BatchState::Fill) so it uses the actual BatchState type like
the input-queue code does; update the find closure on batch_infos (the one
producing output_items_in_current_zkp_batch and referencing
items_in_current_zkp_batch) to use that enum comparison.
- Around line 197-205: The log currently uses tracing::info! inside the queue
fetch path and emits per-fetch stats (queue_pubkey, total_items, total_pending,
start_index, end_index, filtered_queue.len()), which can flood logs; change the
call from tracing::info! to tracing::debug! (or tracing::trace! if you prefer
even quieter output) so these detailed stats are emitted only at debug-level,
and optionally wrap it with a tracing::debug_enabled() check if computing the
message is expensive.
- Around line 415-480: The batch-iteration logic is duplicated across
get_state_v2_output_queue_info, print_state_v2_output_queue_info, and
parse_state_v2_queue_info; extract that loop into a shared helper (e.g.,
parse_v2_output_batches) that accepts a reference to
BatchedQueueAccount::batch_metadata (or the BatchedQueueAccount) and returns the
computed Vec<BatchInfo>, total_unprocessed, zkp_batch_size, and
output_items_in_current_zkp_batch; inside the helper reuse the same calls used
now (DEFAULT_ZKP_BATCH_SIZE, batch.get_num_inserted_zkps(),
batch.get_current_zkp_batch_index(), batch.get_num_inserted_zkp_batch(),
batch.get_state()) to compute batch_infos and totals, then replace the
duplicated loops in get_state_v2_output_queue_info,
print_state_v2_output_queue_info, and parse_state_v2_queue_info to call this
helper and construct their respective V2QueueInfo or print output.

In @forester/static/dashboard.html:
- Around line 1059-1067: The template interpolation renders f.authority directly
into the HTML (inside the active_epoch_foresters map that builds each
.forester-item), which risks unescaped content; update the rendering to escape
f.authority before insertion (e.g., add/ reuse an escapeHtml utility and call it
when injecting f.authority) or switch to constructing the span via DOM and
setting its textContent for the authority value; ensure the change is applied
where active_epoch_foresters is mapped (the block using getForesterColor,
getBalanceClass, and formatBalance) so only the escaped authority string is
output.
- Around line 745-748: The showError function writes unsanitized message into
innerHTML causing XSS; update it to avoid innerHTML by selecting the
'errorContainer' element and assigning its textContent (or create and append a
text node) with a safe string like "Error: " + message, ensuring you check the
element exists before setting to prevent null refs; do not use innerHTML or
string interpolation for user-controlled error.message.
- Around line 770-785: The code uses the literal slot duration (0.46 / 460) in
multiple places which is fragile; extract a single named constant (e.g.,
SLOT_DURATION_SECONDS = 0.46 and SLOT_DURATION_MS = 460) and replace all literal
uses: inside formatSlotsToTime use Math.floor(slots * SLOT_DURATION_SECONDS) to
compute total seconds, replace any occurrences like slots_until_next_light_slot
* 0.46 with slots_until_next_light_slot * SLOT_DURATION_SECONDS, and replace
slotsRemaining * 460 with slotsRemaining * SLOT_DURATION_MS (preserve any
existing rounding/Math.max logic such as adding the 500ms buffer). Ensure the
new constants are defined near the top of the script so all functions
(formatSlotsToTime and the code handling slots_until_next_light_slot and
slotsRemaining) reference the same values.

Comment on lines +20 to 87
#[arg(long, env = "RPC_URL")]
pub rpc_url: Option<String>,

#[arg(long, env = "FORESTER_PUSH_GATEWAY_URL")]
#[arg(long, env = "PUSH_GATEWAY_URL")]
pub push_gateway_url: Option<String>,

#[arg(long, env = "FORESTER_PAGERDUTY_ROUTING_KEY")]
#[arg(long, env = "PAGERDUTY_ROUTING_KEY")]
pub pagerduty_routing_key: Option<String>,

#[arg(long, env = "FORESTER_WS_RPC_URL")]
#[arg(long, env = "WS_RPC_URL")]
pub ws_rpc_url: Option<String>,

#[arg(long, env = "FORESTER_INDEXER_URL")]
#[arg(long, env = "INDEXER_URL")]
pub indexer_url: Option<String>,

#[arg(long, env = "FORESTER_PROVER_URL")]
#[arg(long, env = "PROVER_URL")]
pub prover_url: Option<String>,

#[arg(
long,
env = "FORESTER_PROVER_APPEND_URL",
env = "PROVER_APPEND_URL",
help = "Prover URL for append operations. If not specified, uses prover_url"
)]
pub prover_append_url: Option<String>,

#[arg(
long,
env = "FORESTER_PROVER_UPDATE_URL",
env = "PROVER_UPDATE_URL",
help = "Prover URL for update operations. If not specified, uses prover_url"
)]
pub prover_update_url: Option<String>,

#[arg(
long,
env = "FORESTER_PROVER_ADDRESS_APPEND_URL",
env = "PROVER_ADDRESS_APPEND_URL",
help = "Prover URL for address-append operations. If not specified, uses prover_url"
)]
pub prover_address_append_url: Option<String>,

#[arg(long, env = "FORESTER_PROVER_API_KEY")]
#[arg(long, env = "PROVER_API_KEY")]
pub prover_api_key: Option<String>,

#[arg(
long,
env = "FORESTER_PROVER_POLLING_INTERVAL_MS",
env = "PROVER_POLLING_INTERVAL_MS",
help = "Prover polling interval in milliseconds (default: 1000)"
)]
pub prover_polling_interval_ms: Option<u64>,

#[arg(
long,
env = "FORESTER_PROVER_MAX_WAIT_TIME_SECS",
env = "PROVER_MAX_WAIT_TIME_SECS",
help = "Maximum time to wait for prover response in seconds (default: 600)"
)]
pub prover_max_wait_time_secs: Option<u64>,

#[arg(long, env = "FORESTER_PAYER")]
#[arg(long, env = "PAYER")]
pub payer: Option<String>,

#[arg(long, env = "FORESTER_DERIVATION_PUBKEY")]
#[arg(long, env = "DERIVATION_PUBKEY")]
pub derivation: Option<String>,

#[arg(long, env = "FORESTER_PHOTON_API_KEY")]
#[arg(long, env = "PHOTON_API_KEY")]
pub photon_api_key: Option<String>,

#[arg(long, env = "FORESTER_PHOTON_GRPC_URL")]
#[arg(long, env = "PHOTON_GRPC_URL")]
pub photon_grpc_url: Option<String>,


🧹 Nitpick | 🔵 Trivial

Environment name changes are breaking — ensure docs and deployment configs are updated

StartArgs now binds to generic env vars like RPC_URL, PUSH_GATEWAY_URL, PAYER, DERIVATION_PUBKEY, etc., instead of the older FORESTER_… names. This is a behavioral change for operators who rely on environment configuration.

Please double‑check the README/deployment manifests (Docker, k8s, CI) and update them to reflect the new names so existing setups don’t silently start with missing config.
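If backward compatibility is chosen instead, one possible shape (a hypothetical helper, not part of the PR) is to resolve each value from the new name first and fall back to the legacy FORESTER_-prefixed variable:

// Hypothetical fallback helper: prefer the new short env name, then the legacy one.
fn env_with_legacy_fallback(new_key: &str, legacy_key: &str) -> Option<String> {
    std::env::var(new_key)
        .ok()
        .or_else(|| std::env::var(legacy_key).ok())
}

// e.g. when a value was not supplied via CLI flags:
// let rpc_url = args.rpc_url.clone()
//     .or_else(|| env_with_legacy_fallback("RPC_URL", "FORESTER_RPC_URL"));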


Comment on lines 2818 to +2848
match fetch_result {
    Ok(mut fetched_trees) => {
        let group_authority = match config.general_config.group_authority {
            Some(ga) => Some(ga),
            None => {
                match fetch_protocol_group_authority(&*rpc).await {
                    Ok(ga) => {
                        info!("Using protocol default group authority: {}", ga);
                        Some(ga)
                    }
                    Err(e) => {
                        warn!(
                            "Failed to fetch protocol group authority, processing all trees: {:?}",
                            e
                        );
                        None
                    }
                }
            }
        };

        if let Some(group_authority) = group_authority {
            let before_count = fetched_trees.len();
            fetched_trees.retain(|tree| tree.owner == group_authority);
            info!(
                "Filtered trees by group authority {}: {} -> {} trees",
                group_authority,
                before_count,
                fetched_trees.len()
            );
        }

🧹 Nitpick | 🔵 Trivial

Group‑authority filtering behavior is sound; consider explicit logging when nothing matches

The in‑process filter by group_authority (config override or protocol default) on top of fetch_trees is a good safety net, especially when the RPC‑level filtered fetch falls back to unfiltered. One small improvement would be to log a warning if before_count > 0 but the filtered set becomes empty, to make misconfigured group authorities or mismatched tree owners easier to diagnose.
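A sketch of the suggested warning, reusing the names from the snippet above (it would sit right after the retain call inside the if let Some(group_authority) block):

// Warn when filtering removed every tree, so misconfiguration is visible in logs.
if before_count > 0 && fetched_trees.is_empty() {
    warn!(
        "Group authority {} filtered out all {} fetched trees; check group_authority config and tree owners",
        group_authority, before_count
    );
}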


Comment on lines +1059 to +1067
${data.active_epoch_foresters.map((f, idx) => `
    <div class="forester-item">
        <span class="forester-address">
            <span class="forester-color-dot" style="background: ${getForesterColor(idx)};"></span>
            ${f.authority}
        </span>
        <span class="forester-balance ${getBalanceClass(f.balance_sol)}">${formatBalance(f.balance_sol)}</span>
    </div>
`).join('')}

🧹 Nitpick | 🔵 Trivial

Minor: Forester authority rendered without escaping.

While f.authority is expected to be a pubkey (safe characters), it's interpolated directly into the HTML template. For defense in depth, consider escaping or using textContent:

// Current (relies on pubkey format being safe)
${f.authority}

// Safer pattern
${escapeHtml(f.authority)}

This is low risk since pubkeys contain only alphanumeric characters.



@coderabbitai bot left a comment


Actionable comments posted: 17

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (9)
forester/src/processor/v2/tx_sender.rs (2)

269-339: Unused counter _address_append_count is tracked but never logged.

You're incrementing _address_append_count on line 309, but it's not included in the instr_type formatting (lines 331-339) like append_count and nullify_count. Either include it in the log message for completeness or remove the tracking entirely.

♻️ Suggested fix to include address append count in logging
                 let instr_type = if append_count > 0 && nullify_count > 0 {
                     format!("Append+Nullify({}+{})", append_count, nullify_count)
                 } else if append_count > 0 {
                     "Append".to_string()
                 } else if nullify_count > 0 {
                     "Nullify".to_string()
                 } else {
-                    "AddressAppend".to_string()
+                    format!("AddressAppend({})", _address_append_count)
                 };

Or alternatively, remove the unused counter:

-                let mut _address_append_count = 0usize;
...
-                            _address_append_count += 1;

501-502: Minor: _proof_ms is captured but unused.

The proof duration is already accumulated into self.proof_timings at lines 446-455, so capturing it again here serves no purpose. Consider removing this tracking or using it if there's a future plan for per-batch proof timing metrics.

♻️ Suggested cleanup
                     let round_trip = std::mem::replace(&mut self.pending_batch_round_trip_ms, 0);
-                    let _proof_ms = std::mem::replace(&mut self.pending_batch_proof_ms, 0);
+                    self.pending_batch_proof_ms = 0;
                     let earliest = self.pending_batch_earliest_submit.take();
forester/src/processor/v1/helpers.rs (3)

226-252: Potential silent data loss if proof count mismatches.

The zip at line 226 (and line 255 for state items) will silently truncate if address_items.len() != address_proofs.len(). While you log a warning when counts mismatch (lines 129-134), the code proceeds to use zip which silently drops extra items from the longer iterator.

If a mismatch represents a bug or data integrity issue, consider either:

  1. Returning an error instead of proceeding
  2. Using zip_eq from itertools which panics on length mismatch
  3. Explicitly asserting lengths match before zipping
🔧 Proposed fix: Fail on mismatch instead of silent truncation
+    if address_items.len() != address_proofs.len() {
+        return Err(anyhow::anyhow!(
+            "Address proof count mismatch: requested={}, received={}",
+            address_items.len(),
+            address_proofs.len()
+        ));
+    }
+
     for (item, proof) in address_items.iter().zip(address_proofs.into_iter()) {

Apply similarly before the state items loop.
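For option 2, a sketch using itertools (assuming the crate is available); zip_eq panics when the two iterators have different lengths, which surfaces the mismatch instead of silently truncating:

use itertools::Itertools;

// Panics on length mismatch rather than dropping trailing items.
for (item, proof) in address_items.iter().zip_eq(address_proofs.into_iter()) {
    // ... same body as the existing loop
}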


373-375: Likely incorrect hostname check — "mainnet" is not a valid host.

The condition url.host_str() != Some("mainnet") will almost always be true because "mainnet" isn't a valid hostname (it would typically be something like "mainnet.helius-rpc.com" or similar). This means the function returns the hardcoded 10_000 for nearly all URLs, including actual mainnet endpoints.

Did you intend to check if the URL contains "mainnet" in the hostname? Or check against a specific RPC provider's domain?

🐛 Possible fix: Check if hostname contains "mainnet"
-    if url.host_str() != Some("mainnet") {
+    if !url.host_str().map_or(false, |h| h.contains("mainnet")) {
         return Ok(10_000);
     }

Or if you want to specifically target Helius:

-    if url.host_str() != Some("mainnet") {
+    let is_helius_mainnet = url.host_str()
+        .map_or(false, |h| h.contains("mainnet") && h.contains("helius"));
+    if !is_helius_mainnet {
         return Ok(10_000);
     }

402-408: Consider reusing reqwest::Client instead of creating one per call.

reqwest::Client::new() creates a new connection pool. The Client is designed to be reused across requests — it handles connection pooling, keep-alives, and other optimizations internally. Creating a fresh client per call negates these benefits.

Consider either:

  1. Accepting a &reqwest::Client parameter
  2. Using a lazy_static or once_cell singleton
  3. Caching the client at a higher level
♻️ Proposed refactor: Accept client as parameter
 pub async fn request_priority_fee_estimate(
+    client: &reqwest::Client,
     url: &Url,
     account_keys: Vec<Pubkey>,
 ) -> crate::Result<u64> {
     // ...
-    let client = reqwest::Client::new();
     let response = client
         .post(url.clone())

Callers would then manage a single shared Client instance.
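For option 2 (a process-wide shared client), a minimal sketch using std::sync::OnceLock (Rust 1.70+), so no extra dependency is needed:

use std::sync::OnceLock;
use reqwest::Client;

// Built once on first use, then reused so connection pooling and keep-alives apply.
fn shared_http_client() -> &'static Client {
    static CLIENT: OnceLock<Client> = OnceLock::new();
    CLIENT.get_or_init(Client::new)
}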

forester/src/processor/v1/send_transaction.rs (1)

287-300: Return Timeout (not Cancelled) when the global deadline is hit.

The early check returns Cancelled even when Instant::now() >= timeout_deadline, which makes post-analysis harder and can hide systemic latency issues. Split the branch and return TransactionSendResult::Timeout on deadline.

forester/src/epoch_manager.rs (3)

770-875: Critical: processing_epochs flag can get stuck true forever (reset guard is created too late).

Any early return Ok(()) / return Err(...) before Line 869 will skip the guard and leave processing_flag=true, permanently preventing that epoch from being processed again.

Move the reset guard to immediately after the successful compare_exchange, so all exits reset the flag.

Proposed fix
         if processing_flag
             .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
             .is_err()
         {
             debug!("Epoch {} is already being processed, skipping", epoch);
             return Ok(());
         }
-        
+        // Always reset the processing flag on every exit path.
+        let _reset_guard = scopeguard::guard((), |_| {
+            processing_flag.store(false, Ordering::SeqCst);
+        });

         let phases = get_epoch_phases(&self.protocol_config, epoch);

         // Attempt to recover registration info
         debug!("Recovering registration info for epoch {}", epoch);
         let mut registration_info = match self.recover_registration_info(epoch).await {
             Ok(info) => info,
             Err(e) => {
                 // ...
                     Err(ForesterError::Registration(
                         RegistrationError::RegistrationPhaseEnded { .. },
                     )) => {
                         // ...
                         return Ok(());
                     }
                 // ...
             }
         };
         // ...
-        // Ensure we reset the processing flag when we're done
-        let _reset_guard = scopeguard::guard((), |_| {
-            processing_flag.store(false, Ordering::SeqCst);
-        });
-
         info!("Exiting process_epoch");
         Ok(())

285-370: Critical: monitor_handle is never aborted; shutdown detaches it and leaks a live epoch monitor task.

Dropping a Tokio JoinHandle does not cancel the task; it detaches it. Your scopeguard aborts 3 tasks but not monitor_handle, so on shutdown (or if run() is dropped) the monitor continues running in the background.

Consider:

  • passing a shutdown receiver into EpochManager::run and having monitor_epochs select! on it, and
  • aborting monitor_handle in the same shutdown/guard path as the other tasks (see the sketch below).
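A minimal sketch of the abort path, assuming monitor_handle is the tokio JoinHandle returned when monitor_epochs is spawned; it could be folded into the same guard that already aborts the other tasks:

// Dropping a JoinHandle only detaches the task; abort() actually cancels it.
let _monitor_guard = scopeguard::guard(monitor_handle, |handle| {
    handle.abort();
});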

1585-1593: Fix V2 eligibility seed—must use queue pubkey per protocol rules, not merkle_tree.

The call at line 1587 passes &tree_accounts.merkle_tree to check_forester_eligibility, but the protocol requires the queue pubkey. This breaks the eligibility calculation: get_eligible_forester_index hashes the seed with epoch and current slot to derive a deterministic forester index. Using merkle_tree instead of queue produces incorrect eligibility and mismatches the /status endpoint schedule (which correctly uses queue). All occurrences in forester_status.rs use &queue_pubkey; V1 processing uses &tree_accounts.queue. The method parameter is even named queue_pubkey, making this semantic mismatch obvious.

Change &tree_accounts.merkle_tree to &tree_accounts.queue and verify all eligibility calculations across the codebase use the queue pubkey consistently.

🤖 Fix all issues with AI agents
In @forester/src/api_server.rs:
- Around line 30-78: spawn_api_server currently spawns a detached thread and can
hang forever; change it to return a JoinHandle (or accept a shutdown receiver)
and hook warp::serve(...).run(addr) into a graceful shutdown future (e.g.
with_graceful_shutdown) so the server can be stopped; additionally wrap calls to
get_forester_status in tokio::time::timeout inside status_route (use
tokio::time::timeout(Duration::from_secs(...),
get_forester_status(&rpc_url)).await and handle the Elapsed error by returning a
timeout ErrorResponse) and ensure the thread runtime is shut down when the
shutdown signal or returned JoinHandle is used.
- Around line 80-143: The current get_metrics_json implementation manually
parses the Prometheus text output (using encoder.encode + rsplit_once and
extract_label), which is fragile; change it to iterate REGISTRY.gather() and
process each prometheus::proto::MetricFamily and its Metric messages directly to
extract names, label pairs, and values (handle Counter, Gauge, Untyped,
Histogram/Summary appropriately), building the MetricsResponse from those
structured fields, properly handling NaN/Inf and exemplar data and returning
encoder errors instead of ignoring them, and remove or repurpose extract_label
since labels should come from metric.get_label() rather than string slicing.
- Around line 34-36: The code currently binds the HTTP API server to 0.0.0.0
using SocketAddr::from(([0, 0, 0, 0], port)) and logs it with info!("Starting
HTTP API server on {}", addr); change the default to bind to localhost
(127.0.0.1) or require/inspect an explicit CLI/config flag before binding to
0.0.0.0; specifically update the creation of addr (replace [0,0,0,0] with
[127,0,0,1]) and/or add a boolean flag (e.g., allow_public_bind) that, when
false, forces localhost and when true allows the current behavior and logs a
clear warning before starting the server, ensuring sensitive endpoints like
/status and /metrics/json are not exposed by default.
- Around line 37-41: The current dashboard_route closure constructs a Response
via Response::builder().body(DASHBOARD_HTML) which yields a Result and not a
Warp Reply; replace the closure to return a proper warp reply—prefer using
warp::reply::html(DASHBOARD_HTML) (or
warp::reply::with_header(warp::reply::html(...), "content-type", "text/html;
charset=utf-8") if you need to set headers explicitly) instead of the
Response::builder path so dashboard_route produces a valid impl Reply.
- Around line 49-63: The /status route currently returns a 200 OK with an error
JSON on get_forester_status failures; change the error branch in status_route to
return a non-200 HTTP status (e.g. StatusCode::INTERNAL_SERVER_ERROR or another
appropriate code) by wrapping the JSON reply with warp::reply::with_status
instead of plain warp::reply::json, and ensure StatusCode is imported
(warp::http::StatusCode). Keep the Ok::<_, warp::Rejection>(...) return but
replace the error path to
Ok(warp::reply::with_status(warp::reply::json(&error_response),
StatusCode::INTERNAL_SERVER_ERROR)) so clients and monitors see the proper HTTP
failure code while preserving the ErrorResponse body.

In @forester/src/epoch_manager.rs:
- Around line 367-371: The file has formatting issues—run rustfmt (e.g., cargo
fmt) on epoch_manager.rs to fix whitespace/formatting around the block that ends
with Err(anyhow!("Epoch monitor channel closed - forester cannot function
without it")) and the other misformatted region around the code at the 768-780
range; reformat the file, commit the changes, and ensure CI passes.

In @forester/src/forester_status.rs:
- Around line 317-363: fetch_registry_accounts_filtered is swallowing RPC errors
by using tokio::join! and then ignoring Err cases (forester_result,
epoch_result, config_result), causing empty vectors on failure; change to
propagate RPC errors by using tokio::try_join! (or otherwise return Err when any
rpc.get_program_accounts_with_discriminator call fails) so the function returns
an error via its crate::Result instead of silently returning empty data—update
the handling around forester_result/epoch_result/config_result (or remove the if
let Ok branches) to use the propagated results and let the ? operator bubble
errors up to callers.
- Around line 201-213: The code currently applies protocol group-authority
filtering implicitly whenever fetch_protocol_group_authority() succeeds; make
this behavior explicit by adding a configurable toggle/parameter and guarding
the logic with it (e.g., add a bool flag like filter_by_group to the status
handler or read from config/Options); only call fetch_protocol_group_authority()
and run trees.retain(...) when that flag is true, and update the debug/warn logs
to explicitly state whether filtering was enabled or disabled; refer to the
existing symbols fetch_protocol_group_authority, trees.retain, and the
surrounding status handler in forester_status.rs to implement the conditional
gating and logging.
- Around line 259-292: The schedule generation is seeding eligibility with the
queue pubkey (status.queue) but V2 uses the tree merkle key; change the seed to
parse and use the tree merkle pubkey (e.g., parse status.merkle_tree or the
equivalent merkle tree field on TreeStatus) instead of status.queue when calling
ForesterEpochPda::get_eligible_forester_index and building
schedule/assigned_forester so V2 trees use the correct seed; update the local
variable name (queue_pubkey -> tree_pubkey) and any related uses (the call site
that computes forester_idx and the assigned_forester lookup) accordingly.

In @forester/src/metrics.rs:
- Around line 211-228: Call register_metrics() at the start of both
update_indexer_response_time and update_indexer_proof_count to guarantee metrics
are registered before any .with_label_values/.set calls; this is cheap if
register_metrics uses a std::sync::Once or similar, so simply invoking
register_metrics() as the first statement in those functions ensures safe,
idempotent registration and prevents silent no-ops or panics when metrics are
updated before explicit registration.

In @forester/src/processor/v1/helpers.rs:
- Around line 107-152: The code currently clones each chunk with batch.to_vec()
before calling rpc.indexer()?.get_multiple_new_address_proofs which may be
avoidable—if get_multiple_new_address_proofs can accept a slice change the call
site to pass batch (a &[ [u8;32] ]) instead of allocating via batch.to_vec();
otherwise leave the Vec but document/justify the allocation. Also make the
per-batch RPC call more resilient by wrapping the
get_multiple_new_address_proofs call in a retry loop (e.g., const MAX_RETRIES,
loop with attempt counter, on Err log a warn with batch_idx and attempt, await
tokio::time::sleep with exponential or linear backoff, break on Ok, and only
return Err after retries are exhausted) so transient network errors don’t abort
the whole addresses.chunks/ADDRESS_PROOF_BATCH_SIZE processing.

In @forester/src/processor/v2/processor.rs:
- Around line 29-31: Replace the hard-coded constant MAX_BATCHES_PER_TREE with a
configurable value read at startup (e.g., via an env var like
MAX_BATCHES_PER_TREE or via the existing ProcessorConfig/Config struct) so
operators can adjust without code changes; implement parsing with a sensible
default (20) and bounds-checking, wire the resulting usize into any places that
currently reference MAX_BATCHES_PER_TREE (e.g., the processor's batching logic
and tests), and log the effective configured value at startup so changes are
visible in logs.

In @forester/src/processor/v2/tx_sender.rs:
- Around line 410-421: The tokio::select! block uses the biased; modifier which
intentionally polls the 1-second sleep branch before proof_rx.recv() to force
periodic eligibility re-checks; add a brief inline comment above the select!
(referencing the tokio::select! with biased;, the Duration::from_secs(1) sleep
branch and proof_rx.recv() branch) explaining that biased ensures the timer is
checked first to avoid getting stuck waiting on proofs near epoch boundaries and
to trigger periodic re-evaluation of sender eligibility.

Comment on lines +84 to +105
pub static ref INDEXER_RESPONSE_TIME: GaugeVec = GaugeVec::new(
    prometheus::opts!(
        "forester_indexer_response_time_seconds",
        "Response time for indexer proof requests in seconds"
    ),
    &["operation", "tree_type"]
)
.unwrap_or_else(|e| {
    error!("Failed to create metric INDEXER_RESPONSE_TIME: {:?}", e);
    std::process::exit(1);
});
pub static ref INDEXER_PROOF_COUNT: IntGaugeVec = IntGaugeVec::new(
    prometheus::opts!(
        "forester_indexer_proof_count",
        "Number of proofs requested vs received from indexer"
    ),
    &["tree_type", "metric"]
)
.unwrap_or_else(|e| {
    error!("Failed to create metric INDEXER_PROOF_COUNT: {:?}", e);
    std::process::exit(1);
});

🧹 Nitpick | 🔵 Trivial

Prefer HistogramVec for response times; clarify whether proof counts are gauges or counters.

A GaugeVec only stores “last observed” latency; if you need percentiles/SLAs, HistogramVec is the standard. Also, if requested/received are cumulative totals, IntCounterVec is more correct than IntGaugeVec.
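For reference, a sketch of the HistogramVec alternative; the metric name and labels mirror the snippet above, while the bucket boundaries are illustrative and not taken from the PR:

use lazy_static::lazy_static;
use prometheus::{histogram_opts, HistogramVec};

lazy_static! {
    pub static ref INDEXER_RESPONSE_TIME_HIST: HistogramVec = HistogramVec::new(
        histogram_opts!(
            "forester_indexer_response_time_seconds",
            "Response time for indexer proof requests in seconds",
            vec![0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0] // illustrative buckets
        ),
        &["operation", "tree_type"]
    )
    .unwrap();
}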

Comment on lines 107 to 152
for (batch_idx, batch) in addresses.chunks(ADDRESS_PROOF_BATCH_SIZE).enumerate() {
    let batch_start = Instant::now();
    let batch_addresses: Vec<[u8; 32]> = batch.to_vec();
    let batch_size = batch_addresses.len();

    match rpc
        .indexer()?
        .get_multiple_new_address_proofs(merkle_tree, batch_addresses, None)
        .await
    {
        Ok(response) => {
            let batch_duration = batch_start.elapsed();
            let proofs_received = response.value.items.len();

            info!(
                "Address proof batch {}: requested={}, received={}, duration={:.3}s",
                batch_idx,
                batch_size,
                proofs_received,
                batch_duration.as_secs_f64()
            );

            if proofs_received != batch_size {
                warn!(
                    "Address proof count mismatch in batch {}: requested={}, received={}",
                    batch_idx, batch_size, proofs_received
                );
            }

-           let address_proofs = match address_proofs_result {
-               Ok(response) => response.value.items,
-               Err(e) => {
-                   return Err(anyhow::anyhow!("Failed to get address proofs: {}", e));
+           all_proofs.extend(response.value.items);
        }
        Err(e) => {
            let batch_duration = batch_start.elapsed();
            warn!(
                "Failed to get address proofs for batch {} after {:.3}s: {}",
                batch_idx,
                batch_duration.as_secs_f64(),
                e
            );
            return Err(anyhow::anyhow!(
                "Failed to get address proofs for batch {}: {}",
                batch_idx,
                e
            ));
        }
    }

🧹 Nitpick | 🔵 Trivial

Batch processing looks solid; minor allocation note.

The batching logic is well-instrumented with appropriate timing and logging. A few observations:

  1. Line 109: batch.to_vec() creates an owned copy. If get_multiple_new_address_proofs accepts Vec<[u8; 32]>, this is necessary, but if it could accept a slice or &[[u8; 32]], you'd save an allocation per batch.

  2. Resilience consideration: Currently, a single batch failure aborts the entire operation. For transient network issues, you might consider a retry with backoff on individual batches before giving up. This is optional but worth considering for production robustness.

💡 Optional: Add retry logic for batch failures
// Retry pattern; call arguments mirror the batched call above (batch_addresses must be
// cloned per attempt because the indexer call takes it by value)
const MAX_RETRIES: usize = 3;
let mut retries = 0;
let response = loop {
    match rpc
        .indexer()?
        .get_multiple_new_address_proofs(merkle_tree, batch_addresses.clone(), None)
        .await
    {
        Ok(response) => break response,
        Err(e) if retries < MAX_RETRIES => {
            retries += 1;
            warn!("Batch {} attempt {} failed, retrying: {}", batch_idx, retries, e);
            tokio::time::sleep(Duration::from_millis(100 * retries as u64)).await;
        }
        Err(e) => {
            return Err(anyhow::anyhow!(
                "Failed to get address proofs for batch {}: {}",
                batch_idx, e
            ))
        }
    }
};
// then proceed with `response.value.items` as in the success path above

Comment on lines +26 to 27
const WORK_ITEM_BATCH_SIZE: usize = 100;


🛠️ Refactor suggestion | 🟠 Major

Make WORK_ITEM_BATCH_SIZE configurable (and log it) to avoid hard-coded throughput limits.

The new chunking is a good structure for bounded work, but a fixed 100 is likely to need tuning per RPC latency / CU limits / max_concurrent_sends. Consider wiring it to SendBatchedTransactionsConfig (or env) and keeping the current log line.

Also applies to: 95-157
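
A minimal sketch of one way to wire it (the work_item_batch_size field on SendBatchedTransactionsConfig and the work_items variable are hypothetical):

const DEFAULT_WORK_ITEM_BATCH_SIZE: usize = 100;

// Hypothetical addition to SendBatchedTransactionsConfig:
// pub work_item_batch_size: Option<usize>,

let work_item_batch_size = config
    .work_item_batch_size
    .unwrap_or(DEFAULT_WORK_ITEM_BATCH_SIZE);
info!("Using work item batch size: {}", work_item_batch_size);

for chunk in work_items.chunks(work_item_batch_size) {
    // ...existing per-chunk processing stays unchanged...
}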

Comment on lines 29 to 31
// TODO: switch back to 20 after merging https://github.com/helius-labs/photon/pull/317
const MAX_BATCHES_PER_TREE: usize = 4;


🛠️ Refactor suggestion | 🟠 Major

Make MAX_BATCHES_PER_TREE an explicit config knob (or wire it to env) instead of a hard-coded TODO.

Reducing from 20 → 4 will materially change throughput/latency and can increase queue backlog and “optimistic continuation” iterations. If this is a temporary mitigation, consider plumbing it via config/env (and logging the effective value) so ops can flip it without a code deploy.

🤖 Prompt for AI Agents
In @forester/src/processor/v2/processor.rs around lines 29 - 31, Replace the
hard-coded constant MAX_BATCHES_PER_TREE with a configurable value read at
startup (e.g., via an env var like MAX_BATCHES_PER_TREE or via the existing
ProcessorConfig/Config struct) so operators can adjust without code changes;
implement parsing with a sensible default (20) and bounds-checking, wire the
resulting usize into any places that currently reference MAX_BATCHES_PER_TREE
(e.g., the processor's batching logic and tests), and log the effective
configured value at startup so changes are visible in logs.
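
A sketch of the env-var variant, assuming lazy_static as used elsewhere in the crate (the env var name, bounds, and default of 20 are illustrative):

use lazy_static::lazy_static;

const DEFAULT_MAX_BATCHES_PER_TREE: usize = 20;

lazy_static! {
    static ref MAX_BATCHES_PER_TREE: usize = {
        let value = std::env::var("FORESTER_MAX_BATCHES_PER_TREE")
            .ok()
            .and_then(|v| v.parse::<usize>().ok())
            .filter(|v| (1..=100).contains(v))
            .unwrap_or(DEFAULT_MAX_BATCHES_PER_TREE);
        tracing::info!("Effective MAX_BATCHES_PER_TREE: {}", value);
        value
    };
}

// Call sites dereference the lazy value: `*MAX_BATCHES_PER_TREE`.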

…rove dashboard script for better slot timing
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
forester/src/processor/v1/send_transaction.rs (1)

287-290: Timeout vs cancellation conflated: return Timeout when deadline is exceeded.

Right now the pre-send guard returns Cancelled even when the deadline passed; you even left a “// Or Timeout” hint. This makes downstream logs/metrics ambiguous and leaves TransactionSendResult::Timeout underused.

Proposed fix
-            if cancel_signal_clone.load(Ordering::SeqCst) || Instant::now() >= timeout_deadline {
-                return TransactionSendResult::Cancelled; // Or Timeout
-            }
+            if cancel_signal_clone.load(Ordering::SeqCst) {
+                return TransactionSendResult::Cancelled;
+            }
+            if Instant::now() >= timeout_deadline {
+                return TransactionSendResult::Timeout;
+            }

Also applies to: 349-355

forester/src/epoch_manager.rs (2)

770-875: Critical: processing_flag is not reset on early returns/errors; epochs can get stuck “processing”.

You only create the scopeguard reset near the end (Line 869+). But there are multiple return Ok(()) / return Err(...) paths earlier (e.g., registration phase ended at Line 831, or any error bubble-up), leaving the flag true forever for that epoch.

Proposed fix
         if processing_flag
             .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
             .is_err()
         {
             // Another task is already processing this epoch
             debug!("Epoch {} is already being processed, skipping", epoch);
             return Ok(());
         }
+        // Always reset, regardless of how we exit this function.
+        let _reset_guard = scopeguard::guard((), |_| {
+            processing_flag.store(false, Ordering::SeqCst);
+        });
 
         let phases = get_epoch_phases(&self.protocol_config, epoch);
         // Attempt to recover registration info
         debug!("Recovering registration info for epoch {}", epoch);
         let mut registration_info = match self.recover_registration_info(epoch).await {
             Ok(info) => info,
             Err(e) => {
                 ...
             }
         };
         ...
-        // Ensure we reset the processing flag when we're done
-        let _reset_guard = scopeguard::guard((), |_| {
-            processing_flag.store(false, Ordering::SeqCst);
-        });
-
         info!("Exiting process_epoch");
         Ok(())

1585-1593: Use queue pubkey for V2 eligibility checking to match schedule generation.

At line 1589, the V2 code passes &tree_accounts.merkle_tree to check_forester_eligibility, but the function expects queue_pubkey (line 1642) which it uses as input to ForesterEpochPda::get_eligible_forester_index for deterministic forester selection (lines 1665–1670).

For StateV2 trees with separate output queues, this breaks scheduling semantics: the forester determined by hashing merkle_tree won't match the schedule generated using queue. AddressV2 trees (with integrated queues via Pubkey::default()) may accidentally align, but StateV2 will mismatch.

Use &tree_accounts.queue consistently for both V1 and V2, or branch by tree type if the pubkeys differ semantically.

forester/src/processor/v2/tx_sender.rs (1)

391-441: Critical: early termination drops already-received proofs (buffer/pending_batch) and caches under a stale root.

When eligibility ends, you call save_proofs_to_cache(&mut proof_rx, ...) but you do not:

  1. flush self.pending_batch (proofs already ordered and ready to send), nor
  2. drain self.buffer entries already inserted, nor
  3. update self.last_seen_root in the outer task (sender task mutates its own sender_last_root).

Result: you can discard a large amount of completed proof work and also poison reuse because the cache is “warmed” with an old root.

At minimum: before returning, drain pending_batch + any buffer entries to cache, and keep last_seen_root updated as you pop entries / append to pending batch.


Also applies to: 512-518, 533-579

🤖 Fix all issues with AI agents
In @forester/src/api_server.rs:
- Around line 49-63: The /status handler currently always returns 200 on error;
update the Err(e) branch in the status_route closure (where get_forester_status
is awaited and ErrorResponse is built) to return a JSON body with an appropriate
HTTP status using warp::reply::with_status and warp::http::StatusCode instead of
a plain JSON reply. Map errors from get_forester_status to 502 Bad Gateway when
they indicate upstream RPC/indexer failures (otherwise use 500 Internal Server
Error), and return Ok::<_,
warp::Rejection>(warp::reply::with_status(warp::reply::json(&error_response),
chosen_status)). Ensure you import or qualify StatusCode and keep the success
branch returning the existing 200 JSON reply.
- Around line 80-143: The current get_metrics_json parses Prometheus text output
with rsplit_once(' ') which is brittle; instead iterate REGISTRY.gather()
MetricFamily objects directly: for each MetricFamily inspect family.get_name()
(e.g., "forester_transactions_processed_total", "forester_transaction_rate",
"forester_last_run_timestamp", "forester_sol_balance", "queue_length"), then
iterate family.get_metric() items and read the numeric value from the
appropriate metric type (get_counter().get_value() or get_gauge().get_value() /
get_untyped if used), extract labels by scanning metric.get_label() for
label.name == "epoch" / "pubkey" / "tree_pubkey" to get the key, and populate
transactions_processed, transaction_rate, last_run_timestamp, forester_balances,
and queue_lengths accordingly; remove the brittle string parsing (and/or replace
extract_label with a small helper that reads label values from
metric.get_label()). Ensure you handle multiple metrics per family and default
missing values safely.

In @forester/src/epoch_manager.rs:
- Around line 312-324: The scopeguard currently aborts current_previous_handle,
new_tree_handle, and balance_check_handle but omits monitor_handle, leaving the
epoch monitor task running on shutdown; update the scopeguard tuples and
closures (the one at the shown diff and the similar guards around the earlier
285-289 and later 326-371 blocks) to include monitor_handle and call .abort() on
it alongside the other handles so the monitor task is properly aborted on
drop/early exit.
- Around line 2896-2924: The code currently treats
config.general_config.group_authority == None as "try protocol default and
filter" which is a behavioral change; make this explicit by adding a new boolean
config flag (e.g., use_protocol_group_authority) in the same config struct and
check that flag before calling fetch_protocol_group_authority, and log at
info/warn level when the protocol default is being used (reference
config.general_config.group_authority, use_protocol_group_authority, and
fetch_protocol_group_authority). Ensure the filtering branch that retains trees
by group_authority (reference fetched_trees.retain and variable group_authority)
emits a clear metric/telemetry counter for "filtered_out_trees" reporting how
many trees were removed (before_count - fetched_trees.len()) and include an info
log that it was applied due to the explicit flag.

In @forester/src/forester_status.rs:
- Around line 456-497: Replace the magic numbers and add safety docs in
parse_tree_status for TreeType::StateV1: use the existing
STATE_MERKLE_TREE_HEIGHT constant instead of let height = 26u64; add a brief
comment above the unsafe parse_hash_set_from_bytes::<QueueAccount>(&acc.data)
call explaining why the unsafe read is required for V1 legacy format and why
.ok() error handling is acceptable; extract the literal used with
.saturating_sub(3) on IndexedMerkleTree::next_index() into a named constant
(e.g., INDEXED_MERKLE_TREE_RESERVED_SLOTS) with a comment describing the
reserved sentinel elements and mirror the same constant/comment used in
rollover/operations.rs to keep behavior consistent.

In @forester/src/processor/v1/helpers.rs:
- Around line 97-173: The code currently discards the returned proof.address and
trusts the indexer to preserve ordering, which can mis-assign proofs when
ordering differs or items are dropped; update the handling around
get_multiple_new_address_proofs and the subsequent zip of address_items with
address_proofs to preserve and validate the response address field for each
returned proof: when receiving response.value.items, keep the returned address
alongside its proof, compare each returned address to the requested hash for
that slot (or use the returned address to re-map proofs to the original
requested addresses), and if any proof.address is missing or does not match the
requested address either reorder/match accordingly or return an error instead of
merely warning; apply the same validation/fail-fast behavior to the state-proof
path as well (the code that zips address_items with address_proofs and the
analogous state proof section).

In @forester/src/processor/v1/send_transaction.rs:
- Around line 189-201: The fetch call for queue_item_data currently logs an
error then maps any fetch failure to ForesterError::General; harmonize with the
new helper by (1) replacing the pre-mapping error! call with warn! (or removing
it) so we don't duplicate the helper's warning when it returns Ok(Vec::new()),
and (2) ensure only true failures produce ForesterError::General from the
fetch_queue_item_data call (keep the map_err but lower the log level to warn! or
omit logging here since fetch_queue_item_data already warns). Target the block
around pool.get_connection(), fetch_queue_item_data, queue_item_data and update
the error! -> warn! (or remove) and keep ForesterError::General mapping only for
actual errors.

Comment on lines 312 to 324
         let _guard = scopeguard::guard(
             (
-                monitor_handle,
                 current_previous_handle,
                 new_tree_handle,
                 balance_check_handle,
             ),
-            |(h1, h2, h3, h4)| {
+            |(h2, h3, h4)| {
                 info!("Aborting EpochManager background tasks");
-                h1.abort();
                 h2.abort();
                 h3.abort();
                 h4.abort();
             },
         );

⚠️ Potential issue | 🟠 Major

Abort the epoch monitor task on shutdown / early exit (it’s currently left running).

You abort current_previous_handle, new_tree_handle, balance_check_handle, but not monitor_handle. When run() is dropped (e.g., run_service shutdown branch), the monitor task can continue running until it hits a send error/backoff path.

Proposed fix
-        let mut monitor_handle = tokio::spawn({
+        let mut monitor_handle = tokio::spawn({
             let self_clone = Arc::clone(&self);
             let tx_clone = Arc::clone(&tx);
             async move { self_clone.monitor_epochs(tx_clone).await }
         });

         let _guard = scopeguard::guard(
             (
+                monitor_handle,
                 current_previous_handle,
                 new_tree_handle,
                 balance_check_handle,
             ),
-            |(h2, h3, h4)| {
+            |(h1, h2, h3, h4)| {
                 info!("Aborting EpochManager background tasks");
+                h1.abort();
                 h2.abort();
                 h3.abort();
                 h4.abort();
             },
         );
-->

Also applies to: 285-289, 326-371

🤖 Prompt for AI Agents
In @forester/src/epoch_manager.rs around lines 312 - 324, The scopeguard
currently aborts current_previous_handle, new_tree_handle, and
balance_check_handle but omits monitor_handle, leaving the epoch monitor task
running on shutdown; update the scopeguard tuples and closures (the one at the
shown diff and the similar guards around the earlier 285-289 and later 326-371
blocks) to include monitor_handle and call .abort() on it alongside the other
handles so the monitor task is properly aborted on drop/early exit.

Comment on lines +2896 to +2924
let group_authority = match config.general_config.group_authority {
Some(ga) => Some(ga),
None => {
match fetch_protocol_group_authority(&*rpc).await {
Ok(ga) => {
info!("Using protocol default group authority: {}", ga);
Some(ga)
}
Err(e) => {
warn!(
"Failed to fetch protocol group authority, processing all trees: {:?}",
e
);
None
}
}
}
};

if let Some(group_authority) = group_authority {
let before_count = fetched_trees.len();
fetched_trees.retain(|tree| tree.owner == group_authority);
info!(
"Filtered trees by group authority {}: {} -> {} trees",
group_authority,
before_count,
fetched_trees.len()
);
}

🧹 Nitpick | 🔵 Trivial

Group-authority filtering: default behavior change should be explicit (and ideally observable).

Right now, GeneralConfig.group_authority == None triggers “try protocol default authority and filter” if fetch succeeds. That’s a meaningful default behavioral change; consider logging this more prominently (you do info!) and adding an explicit config knob like use_protocol_group_authority: bool to avoid surprises in devnets/local validators where the RegisteredProgram PDA may not exist.

Also consider surfacing “filtered out N trees” as a metric to make this operationally obvious.


🤖 Prompt for AI Agents
In @forester/src/epoch_manager.rs around lines 2896 - 2924, The code currently
treats config.general_config.group_authority == None as "try protocol default
and filter" which is a behavioral change; make this explicit by adding a new
boolean config flag (e.g., use_protocol_group_authority) in the same config
struct and check that flag before calling fetch_protocol_group_authority, and
log at info/warn level when the protocol default is being used (reference
config.general_config.group_authority, use_protocol_group_authority, and
fetch_protocol_group_authority). Ensure the filtering branch that retains trees
by group_authority (reference fetched_trees.retain and variable group_authority)
emits a clear metric/telemetry counter for "filtered_out_trees" reporting how
many trees were removed (before_count - fetched_trees.len()) and include an info
log that it was applied due to the explicit flag.
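
A sketch of what the explicit knob plus a filtering metric could look like (use_protocol_group_authority and FILTERED_OUT_TREES are hypothetical; the rest mirrors the snippet above):

let group_authority = match config.general_config.group_authority {
    Some(ga) => Some(ga),
    None if config.general_config.use_protocol_group_authority => {
        match fetch_protocol_group_authority(&*rpc).await {
            Ok(ga) => {
                info!("Using protocol default group authority: {}", ga);
                Some(ga)
            }
            Err(e) => {
                warn!(
                    "Failed to fetch protocol group authority, processing all trees: {:?}",
                    e
                );
                None
            }
        }
    }
    None => None,
};

if let Some(group_authority) = group_authority {
    let before_count = fetched_trees.len();
    fetched_trees.retain(|tree| tree.owner == group_authority);
    let filtered_out = before_count - fetched_trees.len();
    // Hypothetical gauge making the filtering operationally visible.
    FILTERED_OUT_TREES.set(filtered_out as i64);
    info!(
        "Filtered trees by group authority {}: {} -> {} ({} removed)",
        group_authority,
        before_count,
        fetched_trees.len(),
        filtered_out
    );
}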

Comment on lines 259 to 292
// Build full schedule for each tree
for status in &mut tree_statuses {
let queue_pubkey: Pubkey = status.queue.parse().unwrap_or_default();
let mut schedule: Vec<Option<usize>> =
Vec::with_capacity(total_light_slots as usize);

for light_slot_idx in 0..total_light_slots {
let forester_idx = ForesterEpochPda::get_eligible_forester_index(
light_slot_idx,
&queue_pubkey,
total_epoch_weight,
current_active_epoch,
)
.ok()
.and_then(|eligible_idx| {
active_epoch_forester_pdas
.iter()
.find(|pda| pda.is_eligible(eligible_idx))
.and_then(|pda| authority_to_index.get(&pda.authority.to_string()))
.copied()
});
schedule.push(forester_idx);
}

// Set current assigned forester
if let Some(current_idx) = current_light_slot {
if let Some(Some(forester_idx)) = schedule.get(current_idx as usize) {
status.assigned_forester =
Some(active_epoch_foresters[*forester_idx].authority.clone());
}
}

status.schedule = schedule;
}

⚠️ Potential issue | 🔴 Critical

Array indexing can panic; validate queue pubkey parsing

Two issues in the schedule-building logic:

  1. Line 261: status.queue.parse().unwrap_or_default() silently falls back to the all-zero Pubkey::default() if parsing fails. This would cause incorrect forester assignment calculations, as the queue pubkey is a critical input to the hash-based scheduling algorithm.

  2. Line 287: Array access active_epoch_foresters[*forester_idx] can panic if forester_idx is out of bounds. While the code path suggests forester_idx comes from authority_to_index.get(), which should be valid, there's no explicit bounds check.

🛡️ Proposed safety improvements
                 // Build full schedule for each tree
                 for status in &mut tree_statuses {
-                    let queue_pubkey: Pubkey = status.queue.parse().unwrap_or_default();
+                    let queue_pubkey: Pubkey = match status.queue.parse() {
+                        Ok(pk) => pk,
+                        Err(e) => {
+                            warn!("Invalid queue pubkey '{}': {:?}", status.queue, e);
+                            continue; // Skip this tree
+                        }
+                    };
                     let mut schedule: Vec<Option<usize>> =
                         Vec::with_capacity(total_light_slots as usize);

                     for light_slot_idx in 0..total_light_slots {
                         let forester_idx = ForesterEpochPda::get_eligible_forester_index(
                             light_slot_idx,
                             &queue_pubkey,
                             total_epoch_weight,
                             current_active_epoch,
                         )
                         .ok()
                         .and_then(|eligible_idx| {
                             active_epoch_forester_pdas
                                 .iter()
                                 .find(|pda| pda.is_eligible(eligible_idx))
                                 .and_then(|pda| authority_to_index.get(&pda.authority.to_string()))
                                 .copied()
                         });
                         schedule.push(forester_idx);
                     }

                     // Set current assigned forester
                     if let Some(current_idx) = current_light_slot {
                         if let Some(Some(forester_idx)) = schedule.get(current_idx as usize) {
-                            status.assigned_forester =
-                                Some(active_epoch_foresters[*forester_idx].authority.clone());
+                            if let Some(forester) = active_epoch_foresters.get(*forester_idx) {
+                                status.assigned_forester = Some(forester.authority.clone());
+                            }
                         }
                     }

                     status.schedule = schedule;
                 }

Comment on lines +456 to +497
fn parse_tree_status(
tree: &TreeAccounts,
merkle_account: Option<Account>,
queue_account: Option<Account>,
) -> crate::Result<TreeStatus> {
let mut merkle_account =
merkle_account.ok_or_else(|| anyhow::anyhow!("Merkle tree account not found"))?;

let (fullness_percentage, next_index, threshold, queue_length, v2_queue_info) = match tree
.tree_type
{
TreeType::StateV1 => {
let tree_account = StateMerkleTreeAccount::deserialize(&mut &merkle_account.data[8..])
.map_err(|e| anyhow::anyhow!("Failed to deserialize StateV1 metadata: {}", e))?;

let height = 26u64;
let capacity = 1u64 << height;
let threshold_val = capacity
.saturating_mul(tree_account.metadata.rollover_metadata.rollover_threshold)
/ 100;

let merkle_tree =
parse_concurrent_merkle_tree_from_bytes::<StateMerkleTreeAccount, Poseidon, 26>(
&merkle_account.data,
)
.map_err(|e| anyhow::anyhow!("Failed to parse StateV1 tree: {:?}", e))?;

let next_index = merkle_tree.next_index() as u64;
let fullness = next_index as f64 / capacity as f64 * 100.0;

let queue_len = queue_account.and_then(|acc| {
unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) }
.ok()
.map(|hs| {
hs.iter()
.filter(|(_, cell)| cell.sequence_number.is_none())
.count() as u64
})
});

(fullness, next_index, threshold_val, queue_len, None)
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let me locate and examine the full function context
fd "forester_status.rs" --type f

Repository: Lightprotocol/light-protocol

Length of output: 103


🏁 Script executed:

#!/bin/bash
# Search for tree height constants and configurations
rg -n "height.*26|height.*32|height.*40" --type rust | grep -v target | head -30

Repository: Lightprotocol/light-protocol

Length of output: 3114


🏁 Script executed:

#!/bin/bash
# Check for STATE_V1_TREE_HEIGHT or similar constants
rg -n "STATE.*HEIGHT|TREE.*HEIGHT|const.*26" --type rust | grep -v target | head -20

Repository: Lightprotocol/light-protocol

Length of output: 2413


🏁 Script executed:

#!/bin/bash
# Find the parse_tree_status function to see full context including line 519
fd "forester_status.rs" -x cat -n {} | head -600 | tail -200

Repository: Lightprotocol/light-protocol

Length of output: 8674


🏁 Script executed:

#!/bin/bash
# Check parse_hash_set_from_bytes implementation and its safety justification
rg -n "parse_hash_set_from_bytes" --type rust -A 10 | head -50

Repository: Lightprotocol/light-protocol

Length of output: 2584


🏁 Script executed:

#!/bin/bash
# Look for V1 tree account type definitions and height specifications
rg -n "StateMerkleTreeAccount|AddressMerkleTreeAccount" --type rust -B 2 -A 5 | head -80

Repository: Lightprotocol/light-protocol

Length of output: 6573


🏁 Script executed:

#!/bin/bash
# Check if there are documented reasons for unsafe blocks in similar contexts
rg -n "unsafe.*parse.*hash|unsafe.*deserialize" --type rust -B 3 | head -40

Repository: Lightprotocol/light-protocol

Length of output: 786


🏁 Script executed:

#!/bin/bash
# Search for parse_hash_set_from_bytes implementation and safety context
fd "parse_hash_set_from_bytes" --type f -x grep -l {} \; | head -10

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Find where parse_hash_set_from_bytes is defined
rg -n "fn parse_hash_set_from_bytes" --type rust -A 10

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Search for any documentation or comments about the saturating_sub(3)
rg -n "saturating_sub.*3|IndexedMerkleTree.*next_index|AddressV1.*next_index" --type rust -B 5 -A 2 | head -60

Repository: Lightprotocol/light-protocol

Length of output: 2002


🏁 Script executed:

#!/bin/bash
# Check if there are safety comments or documentation in the file
cat -n forester/src/forester_status.rs | head -30

Repository: Lightprotocol/light-protocol

Length of output: 1403


🏁 Script executed:

#!/bin/bash
# Look for why indexed merkle trees need the -3 adjustment
rg -n "indexed.*merkle|IndexedMerkleTree" --type rust -B 3 -A 3 | grep -A 5 -B 5 "next_index\|sentinel" | head -40

Repository: Lightprotocol/light-protocol

Length of output: 3242


🏁 Script executed:

#!/bin/bash
# Search for parse_hash_set_from_bytes in the codebase
rg -n "parse_hash_set_from_bytes" --type rust -B 2 -A 15 | head -100

Repository: Lightprotocol/light-protocol

Length of output: 4044


🏁 Script executed:

#!/bin/bash
# Look for any documentation about the -3 constant for indexed trees
rg -n "HIGHEST_ADDRESS|sentinel|indexed.*tree.*3" --type rust | head -30

Repository: Lightprotocol/light-protocol

Length of output: 1762


🏁 Script executed:

#!/bin/bash
# Check the learnings file to understand V1/V2 architecture expectations
cat sdk-libs/program-test/src/indexer/test_indexer.rs | grep -A 5 -B 5 "height.*26\|height.*40\|height.*32" | head -40

Repository: Lightprotocol/light-protocol

Length of output: 2228


🏁 Script executed:

#!/bin/bash
# Look at the constants file to confirm what constants exist
cat sdk-libs/sdk-types/src/constants.rs | head -40

Repository: Lightprotocol/light-protocol

Length of output: 2045


🏁 Script executed:

#!/bin/bash
# Find the forester_utils implementation of parse_hash_set_from_bytes
fd "account_zero_copy\|forester_utils" --type f | head -10

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Search in forester_utils for the unsafe function
find . -path ./target -prune -o -name "*.rs" -print | xargs grep -l "parse_hash_set_from_bytes" | head -5

Repository: Lightprotocol/light-protocol

Length of output: 105


🏁 Script executed:

#!/bin/bash
# Let me check the forester_utils package to understand parse_hash_set_from_bytes
fd "forester.utils\|forester_utils" -type d | head -5

Repository: Lightprotocol/light-protocol

Length of output: 242


🏁 Script executed:

#!/bin/bash
# Check if there's documentation on why -3 is subtracted in indexed merkle trees
rg -n "next_index.*3|subtract.*3|indexed.*element" program-libs/indexed-merkle-tree/src/ --type rust -B 2 -A 2 | head -50

Repository: Lightprotocol/light-protocol

Length of output: 3773


🏁 Script executed:

#!/bin/bash
# Let's check the learning files mentioned in the context
ls -la sdk-libs/program-test/src/indexer/ | head -20

Repository: Lightprotocol/light-protocol

Length of output: 503


🏁 Script executed:

#!/bin/bash
# Let me check the indexed merkle tree initialization to understand the 3-element structure
rg -n "new\(\|init\(" program-libs/indexed-merkle-tree/src/lib.rs -B 2 -A 10 | head -60

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Check if there's any documentation in the indexed merkle tree about reserved elements
rg -n "reserved|sentinel|lowest|highest" program-libs/indexed-merkle-tree/src/lib.rs -B 2 -A 2 | head -80

Repository: Lightprotocol/light-protocol

Length of output: 711


Use existing height constants and document magic numbers in V1 tree parsing

Three maintainability concerns in the V1 tree status parsing:

  1. Hardcoded heights instead of constants (lines 471, 499): Replace let height = 26u64; with the existing STATE_MERKLE_TREE_HEIGHT constant from sdk-libs/sdk-types/src/constants.rs. This makes the connection between V1 specs and implementation explicit.

  2. Unsafe blocks lack safety justification (lines 487, 523): The unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) } calls need a comment explaining why this operation is unsafe and why error handling via .ok() is sufficient. (Note: V2 trees use safe deserialization methods, so this is V1-specific legacy code.)

  3. Undocumented magic number (line 519): The .saturating_sub(3) on IndexedMerkleTree::next_index() should be extracted to a named constant with a comment. Indexed merkle trees reserve elements (lowest + highest sentinel values), but the exact count and reasoning should be documented. This same pattern appears in forester/src/rollover/operations.rs:114.
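
A small sketch of points 1 and 3 (constant and variable names are suggestions; STATE_MERKLE_TREE_HEIGHT is the existing constant referenced above):

// StateV1: use the shared constant instead of a literal 26.
let height = STATE_MERKLE_TREE_HEIGHT as u64;
let capacity = 1u64 << height;

// AddressV1: indexed Merkle trees are initialized with reserved sentinel
// elements, so the raw next_index overcounts user insertions. The same
// adjustment appears in rollover/operations.rs:114.
const INDEXED_MERKLE_TREE_RESERVED_ELEMENTS: u64 = 3;
let used_addresses =
    (merkle_tree.next_index() as u64).saturating_sub(INDEXED_MERKLE_TREE_RESERVED_ELEMENTS);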

🤖 Prompt for AI Agents
In @forester/src/forester_status.rs around lines 456 - 497, Replace the magic
numbers and add safety docs in parse_tree_status for TreeType::StateV1: use the
existing STATE_MERKLE_TREE_HEIGHT constant instead of let height = 26u64; add a
brief comment above the unsafe
parse_hash_set_from_bytes::<QueueAccount>(&acc.data) call explaining why the
unsafe read is required for V1 legacy format and why .ok() error handling is
acceptable; extract the literal used with .saturating_sub(3) on
IndexedMerkleTree::next_index() into a named constant (e.g.,
INDEXED_MERKLE_TREE_RESERVED_SLOTS) with a comment describing the reserved
sentinel elements and mirror the same constant/comment used in
rollover/operations.rs to keep behavior consistent.

Comment on lines +84 to +105
pub static ref INDEXER_RESPONSE_TIME: GaugeVec = GaugeVec::new(
prometheus::opts!(
"forester_indexer_response_time_seconds",
"Response time for indexer proof requests in seconds"
),
&["operation", "tree_type"]
)
.unwrap_or_else(|e| {
error!("Failed to create metric INDEXER_RESPONSE_TIME: {:?}", e);
std::process::exit(1);
});
pub static ref INDEXER_PROOF_COUNT: IntGaugeVec = IntGaugeVec::new(
prometheus::opts!(
"forester_indexer_proof_count",
"Number of proofs requested vs received from indexer"
),
&["tree_type", "metric"]
)
.unwrap_or_else(|e| {
error!("Failed to create metric INDEXER_PROOF_COUNT: {:?}", e);
std::process::exit(1);
});

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

cat -n forester/src/metrics.rs | sed -n '84,105p'

Repository: Lightprotocol/light-protocol

Length of output: 1028


🏁 Script executed:

cat -n forester/src/metrics.rs | sed -n '134,139p'

Repository: Lightprotocol/light-protocol

Length of output: 459


🏁 Script executed:

cat -n forester/src/metrics.rs | sed -n '211,228p'

Repository: Lightprotocol/light-protocol

Length of output: 793


🏁 Script executed:

rg -n "update_indexer_response_time|update_indexer_proof_count" forester/src/metrics.rs -A 5

Repository: Lightprotocol/light-protocol

Length of output: 638


🏁 Script executed:

fd CLAUDE.md

Repository: Lightprotocol/light-protocol

Length of output: 549


🏁 Script executed:

cat CLAUDE.md

Repository: Lightprotocol/light-protocol

Length of output: 10409


🏁 Script executed:

cat -n forester/src/metrics.rs | head -30

Repository: Lightprotocol/light-protocol

Length of output: 1243


🌐 Web query:

Prometheus Rust client histogram vs gauge latency metrics best practices

💡 Result:

Short guidance — use histograms (not plain gauges) for latency; configure buckets, labels, and exemplars; use summaries only for non-aggregatable, per-instance quantiles.

Details / best practices:

  • Use a Histogram for request/operation latency so you can compute p50/p95/p99 across instances with PromQL (histograms store buckets + sum + count). [1][4]
  • Do not use a Gauge to record individual request durations (Gauge is for values that go up/down, not for aggregatable latency distributions). A Gauge can hold a duration value but you lose ability to compute correct cross-instance percentiles. [1][5]
  • Choose buckets intentionally: pick bucket boundaries that cover your latency SLOs (include small buckets for tail behavior), avoid too many buckets (memory cost) and too few (low precision). Align buckets with recording/alerting thresholds. [2][3]
  • Use the client library helpers (HistogramTimer or observe_duration) to measure durations to avoid errors and extra syscalls. In Rust prometheus/prometheus-client crates provide timers and Histogram types. [2][5]
  • Keep labels low-cardinality on latency metrics (method, status, endpoint); avoid high-cardinality labels (user IDs, request IDs) that explode series. Label cardinality is a primary cost. [4]
  • Separate success vs error latencies with a status label (e.g., code or success) so tails attributable to failures don’t skew SLOs. [4]
  • Use Exemplars (trace IDs) if you want linking traces → metric samples; supported in some Rust tooling (prometheus-client / autometrics integrations). Use exemplars for debugging slow traces. [2]
  • Use Summaries only when you need accurate per-instance quantiles and do not need to aggregate across instances; otherwise prefer Histograms. [3]
  • Pre-warm/export zero values for expected series or rely on scrape target health (up) to detect staleness; avoid relying on gauges to signal freshness. [4][5]

Rust-specific notes:

  • Two common Rust clients: prometheus (older crate) and prometheus-client (newer, OpenMetrics-oriented). Both implement Histogram and timers; check crate docs for APIs and exemplar support. Choose the crate that fits your needs and exemplars/compatibility. [2][5]

If you want, I can suggest concrete bucket definitions for your service (give typical latency ranges and SLOs).


Consider Histogram for latency metrics; Gauge loses distribution visibility.

For INDEXER_RESPONSE_TIME, use HistogramVec instead of GaugeVec. Gauges are designed for values that naturally fluctuate (up/down), not latency distributions. A Histogram preserves bucket data and enables computing percentiles (p50/p95/p99) across instances—essential for SLO tracking. If you intentionally want "last observed duration only," Gauge is acceptable; otherwise, Histogram is the right fit.

Also: update_indexer_response_time() already logs at debug level (line 215), but update_indexer_proof_count() does not. Add a debug log to update_indexer_proof_count() for observability parity:

debug!("Indexer proof count for {}: requested={}, received={}", tree_type, requested, received);

Applies to same pattern at: 134–139 (registration), 211–228 (update functions)
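
For parity, a sketch of the proof-count helper with the suggested debug line added (the signature is inferred from the call sites elsewhere in this PR; the body is assumed to mirror the existing helper):

pub fn update_indexer_proof_count(tree_type: &str, requested: i64, received: i64) {
    INDEXER_PROOF_COUNT
        .with_label_values(&[tree_type, "requested"])
        .set(requested);
    INDEXER_PROOF_COUNT
        .with_label_values(&[tree_type, "received"])
        .set(received);
    debug!(
        "Indexer proof count for {}: requested={}, received={}",
        tree_type, requested, received
    );
}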

Comment on lines 97 to 173
let address_proofs = if let Some((merkle_tree, addresses)) = address_data {
    let total_addresses = addresses.len();
    info!(
        "Fetching {} address proofs in batches of {}",
        total_addresses, ADDRESS_PROOF_BATCH_SIZE
    );

    let start_time = Instant::now();
    let mut all_proofs = Vec::with_capacity(total_addresses);

    for (batch_idx, batch) in addresses.chunks(ADDRESS_PROOF_BATCH_SIZE).enumerate() {
        let batch_start = Instant::now();
        let batch_addresses: Vec<[u8; 32]> = batch.to_vec();
        let batch_size = batch_addresses.len();

        match rpc
            .indexer()?
            .get_multiple_new_address_proofs(merkle_tree, batch_addresses, None)
            .await
        {
            Ok(response) => {
                let batch_duration = batch_start.elapsed();
                let proofs_received = response.value.items.len();

                info!(
                    "Address proof batch {}: requested={}, received={}, duration={:.3}s",
                    batch_idx,
                    batch_size,
                    proofs_received,
                    batch_duration.as_secs_f64()
                );

                if proofs_received != batch_size {
                    warn!(
                        "Address proof count mismatch in batch {}: requested={}, received={}",
                        batch_idx, batch_size, proofs_received
                    );
                }

                all_proofs.extend(response.value.items);
            }
            Err(e) => {
                let batch_duration = batch_start.elapsed();
                warn!(
                    "Failed to get address proofs for batch {} after {:.3}s: {}",
                    batch_idx,
                    batch_duration.as_secs_f64(),
                    e
                );
                return Err(anyhow::anyhow!(
                    "Failed to get address proofs for batch {}: {}",
                    batch_idx,
                    e
                ));
            }
        }
    }

    let total_duration = start_time.elapsed();
    info!(
        "Address proofs complete: requested={}, received={}, total_duration={:.3}s",
        total_addresses,
        all_proofs.len(),
        total_duration.as_secs_f64()
    );

    update_indexer_response_time(
        "get_multiple_new_address_proofs",
        "AddressV1",
        total_duration.as_secs_f64(),
    );
    update_indexer_proof_count("AddressV1", total_addresses as i64, all_proofs.len() as i64);

    all_proofs
} else {
    Vec::new()
};

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find the indexer/rpc implementation
find . -type f -name "*.rs" | xargs grep -l "get_multiple_new_address_proofs" | head -20

Repository: Lightprotocol/light-protocol

Length of output: 501


🏁 Script executed:

# Also look for the lines 226-253 mentioned
find . -type f -name "helpers.rs" | xargs grep -n "get_multiple_new_address_proofs" | head -10

Repository: Lightprotocol/light-protocol

Length of output: 289


🏁 Script executed:

# Search for indexer API definitions
find . -type f \( -name "*.rs" -o -name "*.go" \) | xargs grep -l "get_multiple_new_address_proofs" 2>/dev/null | head -20

Repository: Lightprotocol/light-protocol

Length of output: 501


🏁 Script executed:

# Look at the structure of the repository to understand where indexer/rpc clients live
ls -la

Repository: Lightprotocol/light-protocol

Length of output: 1353


🏁 Script executed:

# Check the indexer trait definition
cat -n sdk-libs/client/src/indexer/indexer_trait.rs | head -100

Repository: Lightprotocol/light-protocol

Length of output: 4749


🏁 Script executed:

# Check photon indexer implementation
wc -l sdk-libs/client/src/indexer/photon_indexer.rs

Repository: Lightprotocol/light-protocol

Length of output: 122


🏁 Script executed:

# Check the full context around line 114 and also look at lines 226-253
cat -n forester/src/processor/v1/helpers.rs | sed -n '90,180p'

Repository: Lightprotocol/light-protocol

Length of output: 3953


🏁 Script executed:

# Check what happens with the address_proofs after they're collected
cat -n forester/src/processor/v1/helpers.rs | sed -n '170,280p'

Repository: Lightprotocol/light-protocol

Length of output: 5148


🏁 Script executed:

# Check the photon_indexer implementation for get_multiple_new_address_proofs
cat -n sdk-libs/client/src/indexer/photon_indexer.rs | grep -A 30 "get_multiple_new_address_proofs"

Repository: Lightprotocol/light-protocol

Length of output: 4672


🏁 Script executed:

# Also check the RPC wrapper layer
cat -n sdk-libs/client/src/rpc/indexer.rs | grep -A 30 "get_multiple_new_address_proofs"

Repository: Lightprotocol/light-protocol

Length of output: 1618


🏁 Script executed:

# Check what NewAddressProofWithContext contains (to see if it has the address hash)
cat -n sdk-libs/client/src/indexer/types.rs | grep -A 20 "NewAddressProofWithContext"

Repository: Lightprotocol/light-protocol

Length of output: 886


🏁 Script executed:

# Check what address_items contains - what fields does it have?
cat -n forester/src/processor/v1/helpers.rs | sed -n '1,96p' | tail -40

Repository: Lightprotocol/light-protocol

Length of output: 1494


🏁 Script executed:

# Check the types of address_items
grep -B 20 "let address_items" forester/src/processor/v1/helpers.rs | head -30

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

# Look for the definition/type of the items being iterated
grep -n "address_items\|address_data" forester/src/processor/v1/helpers.rs | head -20

Repository: Lightprotocol/light-protocol

Length of output: 465


🏁 Script executed:

# Check if NewAddressProofWithContext or photon API has the address hash/value stored
cat -n sdk-libs/photon-api/src/models/mod.rs | grep -A 30 "NewAddressProof"

Repository: Lightprotocol/light-protocol

Length of output: 5228


🏁 Script executed:

# Search for the actual structure definition in photon API
find . -type f -name "*.rs" -path "*/photon-api/*" | xargs grep -l "MerkleContextWithNewAddressProof" | head -5

Repository: Lightprotocol/light-protocol

Length of output: 371


🏁 Script executed:

# Look for the structure that contains the address info
cat -n sdk-libs/photon-api/src/models/merkle_context_with_new_address_proof.rs 2>/dev/null | head -50

Repository: Lightprotocol/light-protocol

Length of output: 1957


🏁 Script executed:

# Or search more broadly for what fields are returned
rg "pub struct.*NewAddressProof" sdk-libs/photon-api/src/models/ -A 15

Repository: Lightprotocol/light-protocol

Length of output: 9653


🏁 Script executed:

# Check what the actual Photon API response structure looks like in the generated code
find sdk-libs/photon-api -name "*.rs" -exec grep -l "address" {} \; | xargs grep "pub struct" | grep -i "proof\|address" | head -20

Repository: Lightprotocol/light-protocol

Length of output: 1069


🏁 Script executed:

# Critical: check what NewAddressProofWithContext looks like in the indexer client types
cat -n sdk-libs/client/src/indexer/types.rs | sed -n '200,215p'

Repository: Lightprotocol/light-protocol

Length of output: 737


🏁 Script executed:

# Check if the proof validation happens - does the code check that proof.address matches the requested address?
cat -n forester/src/processor/v1/helpers.rs | sed -n '226,260p'

Repository: Lightprotocol/light-protocol

Length of output: 1978


🏁 Script executed:

# Look for any validation logic when constructing instructions
grep -A 5 "proof.low_address_index\|proof.low_address_value" forester/src/processor/v1/helpers.rs | head -20

Repository: Lightprotocol/light-protocol

Length of output: 545


🏁 Script executed:

# Check the conversion from MerkleContextWithNewAddressProof to NewAddressProofWithContext
cat -n sdk-libs/client/src/indexer/photon_indexer.rs | sed -n '1305,1360p'

Repository: Lightprotocol/light-protocol

Length of output: 2940


🏁 Script executed:

# See if the address field is ever used in the conversion
cat -n sdk-libs/client/src/indexer/photon_indexer.rs | sed -n '1307,1350p'

Repository: Lightprotocol/light-protocol

Length of output: 2456


🏁 Script executed:

# Check if there's any defensive length checking before the zip
cat -n forester/src/processor/v1/helpers.rs | sed -n '155,230p'

Repository: Lightprotocol/light-protocol

Length of output: 3233


🏁 Script executed:

# Verify that photon_proof.address is indeed available but discarded
rg "photon_proof\." sdk-libs/client/src/indexer/photon_indexer.rs | grep -v "photon_proof\." | head -20

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

# Check what happens if lengths don't match in zip
cat -n forester/src/processor/v1/helpers.rs | sed -n '156,162p'

Repository: Lightprotocol/light-protocol

Length of output: 336


🏁 Script executed:

# Final verification: check if there's any documentation or tests about ordering guarantees
find . -type f \( -name "*.rs" -o -name "*.md" \) -path "*photon*" | xargs grep -l "order\|guarantee" | head -5

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

# Check if there are any integration tests that verify this behavior
find . -type f -name "*.rs" -path "*/tests/*" | xargs grep -l "get_multiple_new_address_proofs" | head -3

Repository: Lightprotocol/light-protocol

Length of output: 140


🏁 Script executed:

# Check how the tests use get_multiple_new_address_proofs
grep -B 5 -A 15 "get_multiple_new_address_proofs" forester/tests/test_utils.rs | head -50

Repository: Lightprotocol/light-protocol

Length of output: 942


🏁 Script executed:

# Check if there's any test of ordering or validation
grep -B 5 -A 10 "zip\|address_proofs" forester/tests/test_utils.rs | head -40

Repository: Lightprotocol/light-protocol

Length of output: 1407


Add validation that returned proofs correspond to requested addresses; currently the address field from the Photon API is discarded, leaving no way to ensure correct proof-to-address mapping if the indexer returns items out-of-order.

The code requests address hashes at lines 73-76, but when get_multiple_new_address_proofs returns proofs, the address field in the API response is discarded during conversion (photon_indexer.rs:1344-1356). This removes the only opportunity to validate that each proof actually corresponds to the address it was requested for.

When zipping address_items with address_proofs (line 226), if the indexer returns proofs in a different order than requested or drops some items, the wrong proof gets silently applied to each address. The mismatch warning (lines 129-134) only logs the count discrepancy but continues execution.

Either preserve and validate the returned address field against the requested hash, document and enforce the indexer's ordering guarantee in code, or error if proofs_received != batch_size instead of warning and proceeding.

Also applies to: 226-253 (state proofs follow the same pattern at line 255)
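
A sketch of the re-mapping approach, assuming the conversion is first changed to keep the returned address on each proof (the address field on the client-side proof type is hypothetical today, per the analysis above):

use std::collections::HashMap;

let mut proofs_by_address: HashMap<[u8; 32], _> = response
    .value
    .items
    .into_iter()
    .map(|proof| (proof.address, proof))
    .collect();

for requested in batch {
    match proofs_by_address.remove(requested) {
        Some(proof) => all_proofs.push(proof),
        None => {
            return Err(anyhow::anyhow!(
                "Indexer returned no proof for a requested address in batch {}",
                batch_idx
            ));
        }
    }
}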

🤖 Prompt for AI Agents
In @forester/src/processor/v1/helpers.rs around lines 97 - 173, The code
currently discards the returned proof.address and trusts the indexer to preserve
ordering, which can mis-assign proofs when ordering differs or items are
dropped; update the handling around get_multiple_new_address_proofs and the
subsequent zip of address_items with address_proofs to preserve and validate the
response address field for each returned proof: when receiving
response.value.items, keep the returned address alongside its proof, compare
each returned address to the requested hash for that slot (or use the returned
address to re-map proofs to the original requested addresses), and if any
proof.address is missing or does not match the requested address either
reorder/match accordingly or return an error instead of merely warning; apply
the same validation/fail-fast behavior to the state-proof path as well (the code
that zips address_items with address_proofs and the analogous state proof
section).

Comment on lines 189 to +201
         let queue_item_data = {
             let mut rpc = pool.get_connection().await.map_err(|e| {
                 error!(tree = %tree_id_str, "Failed to get RPC for queue data: {:?}", e);
                 ForesterError::RpcPool(e)
             })?;
-            fetch_queue_item_data(
-                &mut *rpc,
-                &tree_accounts.queue,
-                queue_fetch_start_index,
-                queue_fetch_length,
-                queue_total_capacity,
-            )
-            .await
-            .map_err(|e| {
-                error!(tree = %tree_id_str, "Failed to fetch queue item data: {:?}", e);
-                ForesterError::General {
-                    error: format!("Fetch queue data failed for {}: {}", tree_id_str, e),
-                }
-            })?
+            fetch_queue_item_data(&mut *rpc, &tree_accounts.queue, queue_fetch_start_index)
+                .await
+                .map_err(|e| {
+                    error!(tree = %tree_id_str, "Failed to fetch queue item data: {:?}", e);
+                    ForesterError::General {
+                        error: format!("Fetch queue data failed for {}: {}", tree_id_str, e),
+                    }
+                })?

🧹 Nitpick | 🔵 Trivial

Queue fetch refactor looks consistent with new fetch_queue_item_data API; consider log-level harmonization.

The new call site matches the snippet signature (rpc, queue_pubkey, start_index). One small suggestion: you now emit error!(...) + wrap into ForesterError::General { ... }; consider whether this should be warn! if empty/missing queues are expected (the helper already warns and returns Ok(Vec::new())).

🤖 Prompt for AI Agents
In @forester/src/processor/v1/send_transaction.rs around lines 189 - 201, The
fetch call for queue_item_data currently logs an error then maps any fetch
failure to ForesterError::General; harmonize with the new helper by (1)
replacing the pre-mapping error! call with warn! (or removing it) so we don't
duplicate the helper's warning when it returns Ok(Vec::new()), and (2) ensure
only true failures produce ForesterError::General from the fetch_queue_item_data
call (keep the map_err but lower the log level to warn! or omit logging here
since fetch_queue_item_data already warns). Target the block around
pool.get_connection(), fetch_queue_item_data, queue_item_data and update the
error! -> warn! (or remove) and keep ForesterError::General mapping only for
actual errors.
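
Concretely, the call site with the log level lowered would look like this (same code as the hunk above, with error! replaced by warn!):

fetch_queue_item_data(&mut *rpc, &tree_accounts.queue, queue_fetch_start_index)
    .await
    .map_err(|e| {
        // fetch_queue_item_data already warns and returns Ok(Vec::new()) for
        // empty/missing queues, so this path only fires on genuine failures.
        warn!(tree = %tree_id_str, "Failed to fetch queue item data: {:?}", e);
        ForesterError::General {
            error: format!("Fetch queue data failed for {}: {}", tree_id_str, e),
        }
    })?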

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In @forester/src/queue_helpers.rs:
- Around line 47-83: The parsing logic in parse_batch_metadata is duplicated in
parse_state_v2_queue_info; refactor by having parse_state_v2_queue_info call
parse_batch_metadata (or extract the common loop into a new helper used by both)
so BatchInfo and ParsedBatchData are produced once; ensure you pass the same
&[light_batched_merkle_tree::batch::Batch] input (or convert the queue batches
to that shape), preserve zkp_batch_size and items_in_current_zkp_batch
semantics, and update usage sites to consume the returned ParsedBatchData
instead of repeating the loop.
- Around line 128-133: The code is fragile because it matches b.state as the
string "Fill"; instead compare the enum/state directly or use a stable
indicator: update the lookup that sets output_items_in_current_zkp_batch to
identify the Fill batch by inspecting the real batch state enum (e.g., call
output_queue.batch_metadata.batches.get(index).map(|b| b.get_state() ==
BatchState::Fill) when iterating batch_infos) or add a boolean on BatchInfo
(e.g., is_fill) and use that instead of string comparison; change references to
b.state in the logic to use the enum comparison or the new boolean flag and keep
mapping to items_in_current_zkp_batch as before.

In @forester/src/tree_data_sync.rs:
- Around line 318-341: In fetch_protocol_group_authority: run cargo fmt to fix
the multiline formatting around get_registered_program_pda, and add an explicit
Anchor discriminator check before calling RegisteredProgram::deserialize on
account.data[8..] (validate the first 8 bytes match
RegisteredProgram::discriminator or expected value) to avoid deserializing the
wrong account; keep the existing error handling and log the group_authority_pda
after successful validation/deserialization.

In @forester/static/dashboard.html:
- Around line 1192-1216: The refresh logic in scheduleNextRefresh can trigger
rapid polling when slots_until_next_light_slot is very small; add a minimum
interval guard and a request-in-flight guard to prevent overlapping fetches:
introduce a boolean flag (e.g., isRefreshing) used inside fetchStatus to return
early if a fetch is already running, and enforce a configurable minRefreshMs
(e.g., 1000–2000 ms) when computing refreshIn inside scheduleNextRefresh so the
timeout is never below that; ensure you clear/reset isRefreshing in fetchStatus
finally block and keep the existing capped refreshTimeout behavior.
- Around line 1063-1073: The template directly interpolates user-controlled
forester authority strings (f.authority) into innerHTML, risking XSS; add a
sanitizer helper (e.g., escapeHtml) and use it wherever user data is injected in
the template (replace ${f.authority} with ${escapeHtml(f.authority)}) while
leaving other helpers (getForesterColor, getBalanceClass, formatBalance, and the
active_epoch_foresters mapping) unchanged; ensure escapeHtml returns a safely
escaped string for insertion into the template and use it consistently for any
other interpolated fields.

Comment on lines 47 to 83
pub fn parse_batch_metadata(batches: &[light_batched_merkle_tree::batch::Batch]) -> ParsedBatchData {
use light_batched_merkle_tree::constants::DEFAULT_ZKP_BATCH_SIZE;

let mut zkp_batch_size = DEFAULT_ZKP_BATCH_SIZE;
let mut total_pending_batches = 0u64;
let mut batch_infos = Vec::with_capacity(batches.len());
let mut items_in_current_zkp_batch = 0u64;

for (batch_idx, batch) in batches.iter().enumerate() {
zkp_batch_size = batch.zkp_batch_size;
let num_inserted = batch.get_num_inserted_zkps();
let current_index = batch.get_current_zkp_batch_index();
let pending_in_batch = current_index.saturating_sub(num_inserted);

if batch.get_state() == BatchState::Fill {
items_in_current_zkp_batch = batch.get_num_inserted_zkp_batch();
}

batch_infos.push(BatchInfo {
batch_index: batch_idx,
state: format!("{:?}", batch.get_state()),
num_inserted,
current_index,
pending: pending_in_batch,
items_in_current_zkp_batch: batch.get_num_inserted_zkp_batch(),
});

total_pending_batches += pending_in_batch;
}

ParsedBatchData {
batch_infos,
total_pending_batches,
zkp_batch_size,
items_in_current_zkp_batch,
}
}

🧹 Nitpick | 🔵 Trivial

Code duplication between parse_batch_metadata and parse_state_v2_queue_info.

The logic in parse_batch_metadata (lines 47-83) is largely duplicated in parse_state_v2_queue_info (lines 97-113) for output queue batch parsing. Consider having parse_state_v2_queue_info use parse_batch_metadata internally to reduce duplication.

♻️ Suggested refactor to reduce duplication
 pub fn parse_state_v2_queue_info(
     merkle_tree: &BatchedMerkleTreeAccount,
     output_queue_data: &mut [u8],
 ) -> crate::Result<V2QueueInfo> {
     let output_queue = BatchedQueueAccount::output_from_bytes(output_queue_data)
         .map_err(|e| anyhow::anyhow!("Failed to parse StateV2 output queue: {:?}", e))?;

     let next_index = output_queue.batch_metadata.next_index;
-    let mut zkp_batch_size = DEFAULT_ZKP_BATCH_SIZE;
-    let mut output_pending_batches = 0u64;
-    let mut batch_infos = Vec::new();
-
-    for (batch_idx, batch) in output_queue.batch_metadata.batches.iter().enumerate() {
-        zkp_batch_size = batch.zkp_batch_size;
-        let num_inserted = batch.get_num_inserted_zkps();
-        let current_index = batch.get_current_zkp_batch_index();
-        let pending_in_batch = current_index.saturating_sub(num_inserted);
-
-        batch_infos.push(BatchInfo {
-            batch_index: batch_idx,
-            state: format!("{:?}", batch.get_state()),
-            num_inserted,
-            current_index,
-            pending: pending_in_batch,
-            items_in_current_zkp_batch: batch.get_num_inserted_zkp_batch(),
-        });
-
-        output_pending_batches += pending_in_batch;
-    }
+    let output_parsed = parse_batch_metadata(&output_queue.batch_metadata.batches);

     // ... rest of function using output_parsed instead of manual iteration

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @forester/src/queue_helpers.rs around lines 47 - 83, The parsing logic in
parse_batch_metadata is duplicated in parse_state_v2_queue_info; refactor by
having parse_state_v2_queue_info call parse_batch_metadata (or extract the
common loop into a new helper used by both) so BatchInfo and ParsedBatchData are
produced once; ensure you pass the same
&[light_batched_merkle_tree::batch::Batch] input (or convert the queue batches
to that shape), preserve zkp_batch_size and items_in_current_zkp_batch
semantics, and update usage sites to consume the returned ParsedBatchData
instead of repeating the loop.

Comment on lines +128 to +133
let output_items_in_current_zkp_batch = batch_infos
.iter()
.find(|b| b.state == "Fill")
.map(|b| b.items_in_current_zkp_batch)
.unwrap_or(0);


🧹 Nitpick | 🔵 Trivial

String comparison for batch state could be fragile.

The code uses string comparison b.state == "Fill" to find the fill batch. If the BatchState debug format changes, this would silently break. Consider comparing against the enum directly if possible.

🔧 Potential improvement

If you need to identify the Fill state after serialization, consider storing a boolean flag or using a more stable comparison:

// Instead of comparing strings, track during iteration:
let output_items_in_current_zkp_batch = batch_infos
    .iter()
    .enumerate()
    .find(|(idx, _)| {
        output_queue.batch_metadata.batches.get(*idx)
            .map(|b| b.get_state() == BatchState::Fill)
            .unwrap_or(false)
    })
    .map(|(_, b)| b.items_in_current_zkp_batch)
    .unwrap_or(0);

Or store the fill state as a boolean in BatchInfo.
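
A sketch of the boolean-flag variant (the is_fill field is new; the other field types mirror how the values are used above and may differ from the actual definition):

pub struct BatchInfo {
    pub batch_index: usize,
    pub state: String,
    pub is_fill: bool, // new: stable indicator, independent of Debug formatting
    pub num_inserted: u64,
    pub current_index: u64,
    pub pending: u64,
    pub items_in_current_zkp_batch: u64,
}

// In parse_batch_metadata / parse_state_v2_queue_info:
batch_infos.push(BatchInfo {
    batch_index: batch_idx,
    state: format!("{:?}", batch.get_state()),
    is_fill: batch.get_state() == BatchState::Fill,
    num_inserted,
    current_index,
    pending: pending_in_batch,
    items_in_current_zkp_batch: batch.get_num_inserted_zkp_batch(),
});

// Lookup without string comparison:
let output_items_in_current_zkp_batch = batch_infos
    .iter()
    .find(|b| b.is_fill)
    .map(|b| b.items_in_current_zkp_batch)
    .unwrap_or(0);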

🤖 Prompt for AI Agents
In @forester/src/queue_helpers.rs around lines 128 - 133, The code is fragile
because it matches b.state as the string "Fill"; instead compare the enum/state
directly or use a stable indicator: update the lookup that sets
output_items_in_current_zkp_batch to identify the Fill batch by inspecting the
real batch state enum (e.g., call
output_queue.batch_metadata.batches.get(index).map(|b| b.get_state() ==
BatchState::Fill) when iterating batch_infos) or add a boolean on BatchInfo
(e.g., is_fill) and use that instead of string comparison; change references to
b.state in the logic to use the enum comparison or the new boolean flag and keep
mapping to items_in_current_zkp_batch as before.
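
For illustration, a minimal sketch of the boolean-flag alternative, assuming BatchInfo gains a new is_fill field (the field and its placement are hypothetical):

// Set from the real enum while iterating batches, instead of relying on the Debug string:
batch_infos.push(BatchInfo {
    batch_index: batch_idx,
    state: format!("{:?}", batch.get_state()),
    is_fill: batch.get_state() == BatchState::Fill, // hypothetical new field
    num_inserted,
    current_index,
    pending: pending_in_batch,
    items_in_current_zkp_batch: batch.get_num_inserted_zkp_batch(),
});

// The lookup then avoids string matching entirely:
let output_items_in_current_zkp_batch = batch_infos
    .iter()
    .find(|b| b.is_fill)
    .map(|b| b.items_in_current_zkp_batch)
    .unwrap_or(0);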

Comment on lines +1063 to +1073
${activeCount > 0 ? `
<div class="forester-list">
${data.active_epoch_foresters.map((f, idx) => `
<div class="forester-item">
<span class="forester-address">
<span class="forester-color-dot" style="background: ${getForesterColor(idx)};"></span>
${f.authority}
</span>
<span class="forester-balance ${getBalanceClass(f.balance_sol)}">${formatBalance(f.balance_sol)}</span>
</div>
`).join('')}

⚠️ Potential issue | 🟡 Minor

Potential XSS vulnerability in template literal interpolation.

The forester authority strings are directly interpolated into HTML without escaping. If an attacker could somehow register a forester with a malicious authority string containing HTML/JS, it could execute in viewers' browsers.

While authority strings are typically base58-encoded pubkeys (safe characters), it's good practice to sanitize user-controlled data before DOM insertion.

🛡️ Suggested fix using text content escaping

Add a helper function and use it for all user-controlled data:

function escapeHtml(str) {
    if (!str) return str;
    const div = document.createElement('div');
    div.textContent = str;
    return div.innerHTML;
}

// Then use it in templates:
// Before: ${f.authority}
// After:  ${escapeHtml(f.authority)}
🤖 Prompt for AI Agents
In @forester/static/dashboard.html around lines 1063 - 1073, The template
directly interpolates user-controlled forester authority strings (f.authority)
into innerHTML, risking XSS; add a sanitizer helper (e.g., escapeHtml) and use
it wherever user data is injected in the template (replace ${f.authority} with
${escapeHtml(f.authority)}) while leaving other helpers (getForesterColor,
getBalanceClass, formatBalance, and the active_epoch_foresters mapping)
unchanged; ensure escapeHtml returns a safely escaped string for insertion into
the template and use it consistently for any other interpolated fields.

Comment on lines +1192 to +1216
// Smart refresh based on light slot timing
let refreshTimeout = null;

function scheduleNextRefresh() {
if (refreshTimeout) clearTimeout(refreshTimeout);

// If we have light slot info, refresh when slot changes
if (cachedData && cachedData.slots_until_next_light_slot) {
const slotsRemaining = cachedData.slots_until_next_light_slot;
// Convert slots to milliseconds, refresh slightly after slot change
const msUntilChange = Math.max(1000, slotsRemaining * SLOT_SECONDS * 1000 + 500);
// But cap at 10 seconds max
const refreshIn = Math.min(msUntilChange, 10000);
refreshTimeout = setTimeout(() => {
fetchStatus();
scheduleNextRefresh();
}, refreshIn);
} else {
// Fallback to 10 second refresh
refreshTimeout = setTimeout(() => {
fetchStatus();
scheduleNextRefresh();
}, 10000);
}
}

🧹 Nitpick | 🔵 Trivial

Refresh logic could cause rapid polling under certain conditions.

If slots_until_next_light_slot is very small (e.g., 1-2 slots), refreshes are scheduled right at the 1-second floor enforced by Math.max(1000, ...). That floor rules out sub-second intervals, but it does not prevent a new fetch from being scheduled while a slow previous request is still in flight, so requests can pile up. Consider enforcing a minimum interval between completed fetches, or a request-in-flight guard as sketched below, to avoid rapid-fire requests when the server is slow to respond.

♻️ Consider adding request-in-flight guard
let isRefreshing = false;

async function fetchStatus() {
    if (isRefreshing) return;
    isRefreshing = true;
    try {
        // ... existing fetch logic ...
    } finally {
        isRefreshing = false;
    }
}
🤖 Prompt for AI Agents
In @forester/static/dashboard.html around lines 1192 - 1216, The refresh logic
in scheduleNextRefresh can trigger rapid polling when
slots_until_next_light_slot is very small; add a minimum interval guard and a
request-in-flight guard to prevent overlapping fetches: introduce a boolean flag
(e.g., isRefreshing) used inside fetchStatus to return early if a fetch is
already running, and enforce a configurable minRefreshMs (e.g., 1000–2000 ms)
when computing refreshIn inside scheduleNextRefresh so the timeout is never
below that; ensure you clear/reset isRefreshing in fetchStatus finally block and
keep the existing capped refreshTimeout behavior.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
forester/tests/priority_fee_test.rs (1)

196-208: Test expects panic but implementation may not panic.

Line 203 uses std::panic::catch_unwind expecting get_capped_priority_fee to panic when max_fee_lamports < min_fee_lamports. However, looking at the implementation in helpers.rs, line 476 clamps max_fee_lamports to be at least min_fee_lamports:

let max_fee_lamports = cap_config.max_fee_lamports.max(cap_config.min_fee_lamports);

This means the function won't panic; it will clamp and continue. The test assertion at lines 204-207 will fail because result.is_err() will be false.

🐛 Fix the test expectation
     let cap_config = CapConfig {
         rec_fee_microlamports_per_cu: 10_000,
         min_fee_lamports: 10_000,
         max_fee_lamports: 0,
         compute_unit_limit: 1_000_000,
     };
-    println!("expecting panic");
-    let result = std::panic::catch_unwind(|| get_capped_priority_fee(cap_config));
-    assert!(
-        result.is_err(),
-        "Expected panic for max fee less than min fee"
-    );
+    // Implementation clamps max_fee to min_fee when max < min, so no panic
+    let result = get_capped_priority_fee(cap_config);
+    // After clamping, max_fee_lamports becomes 10_000, and the fee should be clamped to min
+    assert_eq!(result, 10_000, "Should clamp to min when max < min");
🤖 Fix all issues with AI agents
In @forester/src/cli.rs:
- Around line 20-236: The CLI arg env names are missing the FORESTER_ prefix
causing env vars to be ignored; update the env attributes for all affected
fields in the CLI struct (e.g., rpc_url, ws_rpc_url, indexer_url, prover_url,
prover_append_url, prover_update_url, prover_address_append_url, prover_api_key,
payer, derivation, photon_api_key, photon_grpc_url, indexer_batch_size,
legacy_ixs_per_tx, processor_mode, queue_polling_mode, tree_ids,
lookup_table_address, and any other fields listed) to use the documented
FORESTER_ prefix (e.g., env = "FORESTER_RPC_URL", env = "FORESTER_PAYER", env =
"FORESTER_INDEXER_URL", etc.) so the application reads the environment variables
matching .env.example/README.
- Around line 264-274: The change made rpc_url in the StatusArgs struct to a
required String which breaks callers; revert rpc_url to Option<String> (i.e.,
the original optional type) in forester/src/cli.rs and keep the #[arg(..., env =
"RPC_URL", value_name = "RPC_URL")] attribute so the env var remains supported;
then update any call sites that previously assumed a String (references to
StatusArgs::rpc_url) to handle the Option (e.g., propagate the Option, provide a
clear error where required, or supply a default) so runtime behavior is
unchanged for existing scripts.

In @forester/src/main.rs:
- Around line 113-118: The API server handle returned from spawn_api_server
(ApiServerHandle) is created but not used, so the API server never receives the
graceful shutdown signal; modify the shutdown flow so spawn_shutdown_handler (or
the Ctrl+C handler that sends to service_sender) also invokes the API server
shutdown path — e.g., accept or capture the ApiServerHandle returned from
spawn_api_server and call its shutdown() method or send on its shutdown_tx when
the shutdown signal is triggered, ensuring the API server is signaled to stop
before/alongside the main service shutdown.

In @forester/src/metrics.rs:
- Around line 211-232: Both update_indexer_response_time and
update_indexer_proof_count call register_metrics() on every invocation which
adds unnecessary overhead; remove the register_metrics() calls from these
functions (i.e., delete the register_metrics() line in
update_indexer_response_time and the one in update_indexer_proof_count) and
ensure register_metrics() is invoked once during startup/initialization
(document this requirement near the module or in the init path) so metrics are
registered before any calls to update_indexer_response_time or
update_indexer_proof_count.

In @forester/src/processor/v1/helpers.rs:
- Around line 209-258: The state proofs path currently calls
rpc.indexer().get_multiple_compressed_account_proofs once (inside the
state_proofs block) without retries; add the same retry/backoff logic used for
address proofs to this call so transient network errors are retried before
returning Err. Wrap
rpc.indexer()?.get_multiple_compressed_account_proofs(states, None).await in the
existing retry mechanism (same max attempts, backoff, and logging), ensure
update_indexer_response_time and update_indexer_proof_count are still called on
success, emit per-attempt warnings on failure, and only return Err after retries
are exhausted to keep behavior consistent with the address proofs code path.

In @forester/src/processor/v2/processor.rs:
- Around line 29-31: The comment is incorrect: MAX_BATCHES_PER_TREE is a
hardcoded optimization constant, not stored in BatchContext or configurable via
CLI; update the doc comment above const MAX_BATCHES_PER_TREE to state it is a
compile-time limit (value 4) used to bound per-tree queue size and processing
concurrency (keeps queue_size = zkp_batch_size * MAX_BATCHES_PER_TREE and limits
memory/throughput), and include the rationale that the value was chosen to limit
resource usage and was tuned empirically. If you instead want runtime
configurability, add a max_batches_per_tree field to BatchContext, add a CLI
flag to populate it, replace uses of the const with that field (and remove the
const or provide a default), and propagate the value into all places that
currently reference MAX_BATCHES_PER_TREE (functions/methods that compute
queue_size, fetch batches, and warmup logic).

In @forester/src/tree_data_sync.rs:
- Around line 319-344: In fetch_protocol_group_authority, validate the account
discriminator before deserializing RegisteredProgram: after obtaining account
(variable account) call check_discriminator::<RegisteredProgram>(&account.data)?
and only then call RegisteredProgram::deserialize(&mut &account.data[8..]); this
mirrors other account handlers
(StateMerkleTreeAccount/AddressMerkleTreeAccount/BatchedMerkleTreeAccount) and
prevents deserializing the wrong account type.

Comment on lines +63 to +84
pub fn spawn_api_server(rpc_url: String, port: u16, allow_public_bind: bool) -> ApiServerHandle {
let (shutdown_tx, shutdown_rx) = oneshot::channel::<()>();

let thread_handle = std::thread::spawn(move || {
let rt = match tokio::runtime::Runtime::new() {
Ok(rt) => rt,
Err(e) => {
error!("Failed to create tokio runtime for API server: {}", e);
return;
}
};
rt.block_on(async move {
let addr = if allow_public_bind {
warn!(
"API server binding to 0.0.0.0:{} - endpoints /status and /metrics/json will be publicly accessible",
port
);
SocketAddr::from(([0, 0, 0, 0], port))
} else {
SocketAddr::from(([127, 0, 0, 1], port))
};
info!("Starting HTTP API server on {}", addr);

⚠️ Potential issue | 🟠 Major

Good security awareness, but consider rate limiting for public endpoints.

The API server correctly:

  • Defaults to localhost binding (secure)
  • Warns when binding publicly (lines 76-79)
  • Uses a separate thread/runtime to avoid blocking

However, when allow_public_bind is true, the /status and /metrics/json endpoints become publicly accessible without rate limiting or authentication. Consider:

  1. Information disclosure: /status exposes forester state, which could reveal operational details to attackers
  2. DoS risk: /status has a 30-second timeout and calls get_forester_status, which could be expensive
  3. Metrics exposure: Queue lengths and transaction rates might leak operational information

Recommendations:

  • Document the security implications of public binding in user documentation
  • Consider adding basic authentication or API key support for public deployments
  • Add rate limiting to prevent DoS attacks
  • Consider whether /status should have a shorter timeout when publicly accessible

Do you want me to help design a rate-limiting or authentication mechanism for the public API endpoints?
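
For reference, a minimal sketch of a process-wide fixed-window limiter that the /status and /metrics/json handlers could consult before doing any work (RateGate, WINDOW, and MAX_REQUESTS_PER_WINDOW are illustrative, not existing forester types):

use std::{
    sync::Mutex,
    time::{Duration, Instant},
};

struct RateGate {
    window_start: Instant,
    count: u32,
}

const WINDOW: Duration = Duration::from_secs(10); // assumed window length
const MAX_REQUESTS_PER_WINDOW: u32 = 20; // assumed request budget

fn allow_request(gate: &Mutex<RateGate>) -> bool {
    let mut g = gate.lock().unwrap();
    if g.window_start.elapsed() >= WINDOW {
        // Start a new window and reset the counter.
        g.window_start = Instant::now();
        g.count = 0;
    }
    if g.count < MAX_REQUESTS_PER_WINDOW {
        g.count += 1;
        true
    } else {
        false
    }
}

// Inside the route closures, before calling get_forester_status:
// if !allow_request(&gate) { respond with StatusCode::TOO_MANY_REQUESTS and an ErrorResponse }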

Comment on lines +96 to +135
let status_route = warp::path("status").and(warp::get()).and_then(move || {
let rpc_url = rpc_url.clone();
async move {
let timeout_duration = Duration::from_secs(STATUS_TIMEOUT_SECS);
match tokio::time::timeout(timeout_duration, get_forester_status(&rpc_url))
.await
{
Ok(Ok(status)) => Ok::<_, warp::Rejection>(warp::reply::with_status(
warp::reply::json(&status),
StatusCode::OK,
)),
Ok(Err(e)) => {
error!("Failed to get forester status: {:?}", e);
let error_response = ErrorResponse {
error: format!("Failed to get forester status: {}", e),
};
Ok(warp::reply::with_status(
warp::reply::json(&error_response),
StatusCode::INTERNAL_SERVER_ERROR,
))
}
Err(_elapsed) => {
error!(
"Forester status request timed out after {}s",
STATUS_TIMEOUT_SECS
);
let error_response = ErrorResponse {
error: format!(
"Request timed out after {} seconds",
STATUS_TIMEOUT_SECS
),
};
Ok(warp::reply::with_status(
warp::reply::json(&error_response),
StatusCode::GATEWAY_TIMEOUT,
))
}
}
}
});

🧹 Nitpick | 🔵 Trivial

Status endpoint timeout might be too long for HTTP API.

The /status endpoint uses a 30-second timeout (line 99), which is quite long for an HTTP endpoint:

  • Most HTTP clients/proxies have default timeouts of 30-60 seconds
  • This leaves little margin for error if the status check takes close to 30s
  • Users might perceive the endpoint as unresponsive

Consider:

  1. Reducing the timeout to 10-15 seconds for better user experience
  2. Documenting that status checks can take significant time
  3. Adding a query parameter to allow callers to specify timeout (with a maximum cap)

The error handling is otherwise solid, with appropriate status codes (504 for timeout, 500 for failures).

Would you like me to verify the typical execution time of get_forester_status to determine an appropriate timeout?
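
For reference, a sketch of deriving the timeout from an optional, capped query parameter (the constants and helper are illustrative; wiring the value in would reuse the existing tokio::time::timeout call):

const DEFAULT_STATUS_TIMEOUT_SECS: u64 = 15; // assumed default
const MAX_STATUS_TIMEOUT_SECS: u64 = 30; // assumed hard cap

fn effective_status_timeout(requested_secs: Option<u64>) -> std::time::Duration {
    let secs = requested_secs
        .unwrap_or(DEFAULT_STATUS_TIMEOUT_SECS)
        .clamp(1, MAX_STATUS_TIMEOUT_SECS);
    std::time::Duration::from_secs(secs)
}

// A warp::query filter (e.g. warp::query::<HashMap<String, String>>()) could supply
// requested_secs; the handler then calls
// tokio::time::timeout(effective_status_timeout(requested_secs), get_forester_status(&rpc_url)).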

Comment on lines +20 to 236
#[arg(long, env = "RPC_URL")]
pub rpc_url: Option<String>,

#[arg(long, env = "FORESTER_PUSH_GATEWAY_URL")]
#[arg(long, env = "PUSH_GATEWAY_URL")]
pub push_gateway_url: Option<String>,

#[arg(long, env = "FORESTER_PAGERDUTY_ROUTING_KEY")]
#[arg(long, env = "PAGERDUTY_ROUTING_KEY")]
pub pagerduty_routing_key: Option<String>,

#[arg(long, env = "FORESTER_WS_RPC_URL")]
#[arg(long, env = "WS_RPC_URL")]
pub ws_rpc_url: Option<String>,

#[arg(long, env = "FORESTER_INDEXER_URL")]
#[arg(long, env = "INDEXER_URL")]
pub indexer_url: Option<String>,

#[arg(long, env = "FORESTER_PROVER_URL")]
#[arg(long, env = "PROVER_URL")]
pub prover_url: Option<String>,

#[arg(
long,
env = "FORESTER_PROVER_APPEND_URL",
env = "PROVER_APPEND_URL",
help = "Prover URL for append operations. If not specified, uses prover_url"
)]
pub prover_append_url: Option<String>,

#[arg(
long,
env = "FORESTER_PROVER_UPDATE_URL",
env = "PROVER_UPDATE_URL",
help = "Prover URL for update operations. If not specified, uses prover_url"
)]
pub prover_update_url: Option<String>,

#[arg(
long,
env = "FORESTER_PROVER_ADDRESS_APPEND_URL",
env = "PROVER_ADDRESS_APPEND_URL",
help = "Prover URL for address-append operations. If not specified, uses prover_url"
)]
pub prover_address_append_url: Option<String>,

#[arg(long, env = "FORESTER_PROVER_API_KEY")]
#[arg(long, env = "PROVER_API_KEY")]
pub prover_api_key: Option<String>,

#[arg(
long,
env = "FORESTER_PROVER_POLLING_INTERVAL_MS",
env = "PROVER_POLLING_INTERVAL_MS",
help = "Prover polling interval in milliseconds (default: 1000)"
)]
pub prover_polling_interval_ms: Option<u64>,

#[arg(
long,
env = "FORESTER_PROVER_MAX_WAIT_TIME_SECS",
env = "PROVER_MAX_WAIT_TIME_SECS",
help = "Maximum time to wait for prover response in seconds (default: 600)"
)]
pub prover_max_wait_time_secs: Option<u64>,

#[arg(long, env = "FORESTER_PAYER")]
#[arg(long, env = "PAYER")]
pub payer: Option<String>,

#[arg(long, env = "FORESTER_DERIVATION_PUBKEY")]
#[arg(long, env = "DERIVATION_PUBKEY")]
pub derivation: Option<String>,

#[arg(long, env = "FORESTER_PHOTON_API_KEY")]
#[arg(long, env = "PHOTON_API_KEY")]
pub photon_api_key: Option<String>,

#[arg(long, env = "FORESTER_PHOTON_GRPC_URL")]
#[arg(long, env = "PHOTON_GRPC_URL")]
pub photon_grpc_url: Option<String>,

#[arg(long, env = "FORESTER_INDEXER_BATCH_SIZE", default_value = "50")]
#[arg(long, env = "INDEXER_BATCH_SIZE", default_value = "50")]
pub indexer_batch_size: usize,

#[arg(
long,
env = "FORESTER_INDEXER_MAX_CONCURRENT_BATCHES",
default_value = "10"
)]
#[arg(long, env = "INDEXER_MAX_CONCURRENT_BATCHES", default_value = "10")]
pub indexer_max_concurrent_batches: usize,

#[arg(long, env = "FORESTER_LEGACY_XS_PER_TX", default_value = "1")]
#[arg(long, env = "LEGACY_XS_PER_TX", default_value = "1")]
pub legacy_ixs_per_tx: usize,
#[arg(
long,
env = "FORESTER_TRANSACTION_MAX_CONCURRENT_BATCHES",
default_value = "20"
)]
#[arg(long, env = "TRANSACTION_MAX_CONCURRENT_BATCHES", default_value = "20")]
pub transaction_max_concurrent_batches: usize,

#[arg(
long,
env = "FORESTER_MAX_CONCURRENT_SENDS",
env = "MAX_CONCURRENT_SENDS",
default_value = "50",
help = "Maximum number of concurrent transaction sends per batch"
)]
pub max_concurrent_sends: usize,

#[arg(
long,
env = "FORESTER_TX_CACHE_TTL_SECONDS",
env = "TX_CACHE_TTL_SECONDS",
default_value = "180",
help = "TTL in seconds to prevent duplicate transaction processing"
)]
pub tx_cache_ttl_seconds: u64,

#[arg(
long,
env = "FORESTER_OPS_CACHE_TTL_SECONDS",
env = "OPS_CACHE_TTL_SECONDS",
default_value = "180",
help = "TTL in seconds to prevent duplicate batch operations processing"
)]
pub ops_cache_ttl_seconds: u64,

#[arg(
long,
env = "FORESTER_CONFIRMATION_MAX_ATTEMPTS",
env = "CONFIRMATION_MAX_ATTEMPTS",
default_value = "60",
help = "Maximum attempts to confirm a transaction before timing out"
)]
pub confirmation_max_attempts: u32,

#[arg(
long,
env = "FORESTER_CONFIRMATION_POLL_INTERVAL_MS",
env = "CONFIRMATION_POLL_INTERVAL_MS",
default_value = "500",
help = "Interval between confirmation polling attempts in milliseconds"
)]
pub confirmation_poll_interval_ms: u64,

#[arg(long, env = "FORESTER_CU_LIMIT", default_value = "1000000")]
#[arg(long, env = "CU_LIMIT", default_value = "1000000")]
pub cu_limit: u32,

#[arg(long, env = "FORESTER_ENABLE_PRIORITY_FEES", default_value = "false")]
#[arg(long, env = "ENABLE_PRIORITY_FEES", default_value = "false")]
pub enable_priority_fees: bool,

#[arg(long, env = "FORESTER_RPC_POOL_SIZE", default_value = "100")]
#[arg(long, env = "RPC_POOL_SIZE", default_value = "100")]
pub rpc_pool_size: u32,

#[arg(
long,
env = "FORESTER_RPC_POOL_CONNECTION_TIMEOUT_SECS",
default_value = "15"
)]
#[arg(long, env = "RPC_POOL_CONNECTION_TIMEOUT_SECS", default_value = "15")]
pub rpc_pool_connection_timeout_secs: u64,

#[arg(
long,
env = "FORESTER_RPC_POOL_IDLE_TIMEOUT_SECS",
default_value = "300"
)]
#[arg(long, env = "RPC_POOL_IDLE_TIMEOUT_SECS", default_value = "300")]
pub rpc_pool_idle_timeout_secs: u64,

#[arg(long, env = "FORESTER_RPC_POOL_MAX_RETRIES", default_value = "100")]
#[arg(long, env = "RPC_POOL_MAX_RETRIES", default_value = "100")]
pub rpc_pool_max_retries: u32,

#[arg(
long,
env = "FORESTER_RPC_POOL_INITIAL_RETRY_DELAY_MS",
default_value = "1000"
)]
#[arg(long, env = "RPC_POOL_INITIAL_RETRY_DELAY_MS", default_value = "1000")]
pub rpc_pool_initial_retry_delay_ms: u64,

#[arg(
long,
env = "FORESTER_RPC_POOL_MAX_RETRY_DELAY_MS",
default_value = "16000"
)]
#[arg(long, env = "RPC_POOL_MAX_RETRY_DELAY_MS", default_value = "16000")]
pub rpc_pool_max_retry_delay_ms: u64,

#[arg(
long,
env = "FORESTER_SLOT_UPDATE_INTERVAL_SECONDS",
default_value = "10"
)]
#[arg(long, env = "SLOT_UPDATE_INTERVAL_SECONDS", default_value = "10")]
pub slot_update_interval_seconds: u64,

#[arg(
long,
env = "FORESTER_TREE_DISCOVERY_INTERVAL_SECONDS",
default_value = "5"
)]
#[arg(long, env = "TREE_DISCOVERY_INTERVAL_SECONDS", default_value = "5")]
pub tree_discovery_interval_seconds: u64,

#[arg(long, env = "FORESTER_MAX_RETRIES", default_value = "3")]
#[arg(long, env = "MAX_RETRIES", default_value = "3")]
pub max_retries: u32,

#[arg(long, env = "FORESTER_RETRY_DELAY", default_value = "1000")]
#[arg(long, env = "RETRY_DELAY", default_value = "1000")]
pub retry_delay: u64,

#[arg(long, env = "FORESTER_RETRY_TIMEOUT", default_value = "30000")]
#[arg(long, env = "RETRY_TIMEOUT", default_value = "30000")]
pub retry_timeout: u64,

#[arg(long, env = "FORESTER_STATE_QUEUE_START_INDEX", default_value = "0")]
#[arg(long, env = "STATE_QUEUE_START_INDEX", default_value = "0")]
pub state_queue_start_index: u16,

#[arg(
long,
env = "FORESTER_STATE_PROCESSING_LENGTH",
default_value = "28807"
)]
#[arg(long, env = "STATE_PROCESSING_LENGTH", default_value = "28807")]
pub state_queue_processing_length: u16,

#[arg(long, env = "FORESTER_ADDRESS_QUEUE_START_INDEX", default_value = "0")]
#[arg(long, env = "ADDRESS_QUEUE_START_INDEX", default_value = "0")]
pub address_queue_start_index: u16,

#[arg(
long,
env = "FORESTER_ADDRESS_PROCESSING_LENGTH",
default_value = "28807"
)]
#[arg(long, env = "ADDRESS_PROCESSING_LENGTH", default_value = "28807")]
pub address_queue_processing_length: u16,

#[arg(long, env = "FORESTER_RPC_RATE_LIMIT")]
#[arg(long, env = "RPC_RATE_LIMIT")]
pub rpc_rate_limit: Option<u32>,

#[arg(long, env = "FORESTER_PHOTON_RATE_LIMIT")]
#[arg(long, env = "PHOTON_RATE_LIMIT")]
pub photon_rate_limit: Option<u32>,

#[arg(long, env = "FORESTER_SEND_TRANSACTION_RATE_LIMIT")]
#[arg(long, env = "SEND_TRANSACTION_RATE_LIMIT")]
pub send_tx_rate_limit: Option<u32>,

#[arg(
long,
env = "FORESTER_PROCESSOR_MODE",
env = "PROCESSOR_MODE",
default_value_t = ProcessorMode::All,
help = "Processor mode: v1 (process only v1 trees), v2 (process only v2 trees), all (process all trees)"
)]
pub processor_mode: ProcessorMode,

#[arg(
long,
env = "FORESTER_QUEUE_POLLING_MODE",
env = "QUEUE_POLLING_MODE",
default_value_t = QueuePollingMode::Indexer,
help = "Queue polling mode: indexer (poll indexer API, requires indexer_url), onchain (read queue status directly from RPC)"
)]
pub queue_polling_mode: QueuePollingMode,

#[arg(
long = "tree-id",
env = "FORESTER_TREE_IDS",
env = "TREE_IDS",
help = "Process only the specified trees (Pubkeys). Can be specified multiple times. If specified, forester will process only these trees and ignore all others",
value_delimiter = ','
)]
pub tree_ids: Vec<String>,

#[arg(
long,
env = "FORESTER_ENABLE_COMPRESSIBLE",
env = "ENABLE_COMPRESSIBLE",
help = "Enable compressible account tracking and compression using ws_rpc_url (requires --ws-rpc-url)",
default_value = "false"
)]
pub enable_compressible: bool,

#[arg(
long,
env = "FORESTER_LOOKUP_TABLE_ADDRESS",
env = "LOOKUP_TABLE_ADDRESS",
help = "Address lookup table pubkey for versioned transactions. If not provided, legacy transactions will be used."
)]
pub lookup_table_address: Option<String>,

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's check the current state of the CLI file to see env variable patterns
rg "env = \"(FORESTER_|[A-Z_]+)\"" forester/src/cli.rs | head -20

Repository: Lightprotocol/light-protocol

Length of output: 1001


🏁 Script executed:

#!/bin/bash
# Check if there's a .env.example file and what it contains
find forester -name ".env.example" -o -name ".env*" | head -10

Repository: Lightprotocol/light-protocol

Length of output: 93


🏁 Script executed:

#!/bin/bash
# Look for environment variable documentation in README files
cat forester/README.md 2>/dev/null | head -100

Repository: Lightprotocol/light-protocol

Length of output: 3124


🏁 Script executed:

#!/bin/bash
# Search for any migration guides or breaking change documentation
rg -i "breaking.*change|migration|FORESTER_" forester/README.md

Repository: Lightprotocol/light-protocol

Length of output: 1402


🏁 Script executed:

#!/bin/bash
# Check what format .env.example uses
cat forester/.env.example 2>/dev/null || echo "File not found"

Repository: Lightprotocol/light-protocol

Length of output: 3003


🏁 Script executed:

#!/bin/bash
# Also check if there are any recent git changes to understand the intention
git log --oneline -n 5 -- forester/src/cli.rs 2>/dev/null || echo "Git log not available"

Repository: Lightprotocol/light-protocol

Length of output: 125


Environment variable prefix mismatch breaks configuration loading.

The code expects environment variables without the FORESTER_ prefix (e.g., RPC_URL, PAYER, INDEXER_URL), but the .env.example file and README still document and export them with the FORESTER_ prefix (e.g., FORESTER_RPC_URL, FORESTER_PAYER, FORESTER_INDEXER_URL).

This is a critical silent failure: any deployment that copies .env.example and sources it will have all environment variables ignored, causing the application to fall back to defaults. The code needs to be updated to respect the FORESTER_ prefix convention, which is documented in the README and reflected in the configuration examples. Update the env attributes in forester/src/cli.rs (lines 20–236) to match: env = "FORESTER_RPC_URL", env = "FORESTER_PAYER", etc.

🤖 Prompt for AI Agents
In @forester/src/cli.rs around lines 20 - 236, The CLI arg env names are missing
the FORESTER_ prefix causing env vars to be ignored; update the env attributes
for all affected fields in the CLI struct (e.g., rpc_url, ws_rpc_url,
indexer_url, prover_url, prover_append_url, prover_update_url,
prover_address_append_url, prover_api_key, payer, derivation, photon_api_key,
photon_grpc_url, indexer_batch_size, legacy_ixs_per_tx, processor_mode,
queue_polling_mode, tree_ids, lookup_table_address, and any other fields listed)
to use the documented FORESTER_ prefix (e.g., env = "FORESTER_RPC_URL", env =
"FORESTER_PAYER", env = "FORESTER_INDEXER_URL", etc.) so the application reads
the environment variables matching .env.example/README.
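
For illustration, the prefixed form for a couple of the affected fields would mirror the previous attributes (defaults unchanged):

#[arg(long, env = "FORESTER_RPC_URL")]
pub rpc_url: Option<String>,

#[arg(long, env = "FORESTER_INDEXER_BATCH_SIZE", default_value = "50")]
pub indexer_batch_size: usize,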

Comment on lines +264 to 274
#[arg(long, env = "RPC_URL", value_name = "RPC_URL", alias = "RPC_URL")]
pub rpc_url: String,

#[arg(long, env = "FORESTER_PUSH_GATEWAY_URL")]
#[arg(long, env = "PUSH_GATEWAY_URL", value_name = "PUSH_GATEWAY_URL")]
pub push_gateway_url: Option<String>,
#[arg(long, env = "FORESTER_PAGERDUTY_ROUTING_KEY")]
#[arg(
long,
env = "PAGERDUTY_ROUTING_KEY",
value_name = "PAGERDUTY_ROUTING_KEY"
)]
pub pagerduty_routing_key: Option<String>,

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, check the file structure and find the exact lines
wc -l forester/src/cli.rs

Repository: Lightprotocol/light-protocol

Length of output: 95


🏁 Script executed:

# Look at the code around lines 264-274
sed -n '255,280p' forester/src/cli.rs | cat -n

Repository: Lightprotocol/light-protocol

Length of output: 1075


🏁 Script executed:

# Check git history to see what changed
cd forester && git log --oneline -20 -- src/cli.rs

Repository: Lightprotocol/light-protocol

Length of output: 125


🏁 Script executed:

# Look for any changelog or documentation
find . -name "CHANGELOG*" -o -name "CLAUDE.md" -o -name "HISTORY*" | head -20

Repository: Lightprotocol/light-protocol

Length of output: 641


🏁 Script executed:

# Search for mentions of rpc_url and FORESTER_RPC_URL to understand the change
rg "FORESTER_RPC_URL|rpc_url" forester/src/ -A 2 -B 2

Repository: Lightprotocol/light-protocol

Length of output: 20241


🏁 Script executed:

# Check git history for changes to the RPC_URL env variable
cd forester && git log -p --all -S "FORESTER_RPC_URL" -- src/cli.rs | head -100

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

# Check CLAUDE.md for any documentation about this change
cat ./CLAUDE.md

Repository: Lightprotocol/light-protocol

Length of output: 10409


🏁 Script executed:

# Also check if there's a recent commit that might have changed the env var
cd forester && git log --all --grep="rpc" --grep="env" --oneline | head -20

Repository: Lightprotocol/light-protocol

Length of output: 1286


🏁 Script executed:

# Look at the recent commit that mentioned graceful shutdown
cd forester && git show 75870d65 --stat

Repository: Lightprotocol/light-protocol

Length of output: 50385


🏁 Script executed:

# Search the entire repository history for FORESTER_RPC_URL, including branches
git log -p --all -S "FORESTER_RPC_URL" | head -50

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

# Also check if maybe it was in a different naming convention
rg -i "forester.*rpc.*url" forester/src/

Repository: Lightprotocol/light-protocol

Length of output: 535


Breaking change: StatusArgs.rpc_url now required.

The rpc_url field in StatusArgs changed from Option<String> to String, making it a required argument. This is a breaking change for any scripts or automation that relied on the field being optional.

The alias = "RPC_URL" attribute is a CLI flag alias and does not provide backward compatibility for the environment variable.

🤖 Prompt for AI Agents
In @forester/src/cli.rs around lines 264 - 274, The change made rpc_url in the
StatusArgs struct to a required String which breaks callers; revert rpc_url to
Option<String> (i.e., the original optional type) in forester/src/cli.rs and
keep the #[arg(..., env = "RPC_URL", value_name = "RPC_URL")] attribute so the
env var remains supported; then update any call sites that previously assumed a
String (references to StatusArgs::rpc_url) to handle the Option (e.g., propagate
the Option, provide a clear error where required, or supply a default) so
runtime behavior is unchanged for existing scripts.
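
A minimal sketch of handling the restored Option at a call site (the error message and exact location are illustrative):

// In the status command handler, after parsing StatusArgs:
let rpc_url = args
    .rpc_url
    .clone()
    .ok_or_else(|| anyhow::anyhow!("missing RPC URL: pass --rpc-url or set RPC_URL"))?;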

Comment on lines +113 to +118
let rpc_url_for_api: String = config.external_services.rpc_url.to_string();
let _api_server_handle = spawn_api_server(
rpc_url_for_api,
args.api_server_port,
args.api_server_public_bind,
);

🧹 Nitpick | 🔵 Trivial

API server handle not used for shutdown coordination.

The _api_server_handle is captured but never used. Looking at ApiServerHandle in the relevant snippet, it contains a shutdown_tx for graceful shutdown.

When the forester receives Ctrl+C, spawn_shutdown_handler sends to service_sender but doesn't signal the API server. The API server will only stop when the main process exits.

Consider adding API server shutdown to the shutdown handler:

♻️ Suggested fix to coordinate API server shutdown
-            let _api_server_handle = spawn_api_server(
+            let api_server_handle = spawn_api_server(
                 rpc_url_for_api,
                 args.api_server_port,
                 args.api_server_public_bind,
             );

Then modify the shutdown handler to call api_server_handle.shutdown() or similar.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In @forester/src/main.rs around lines 113 - 118, The API server handle returned
from spawn_api_server (ApiServerHandle) is created but not used, so the API
server never receives the graceful shutdown signal; modify the shutdown flow so
spawn_shutdown_handler (or the Ctrl+C handler that sends to service_sender) also
invokes the API server shutdown path — e.g., accept or capture the
ApiServerHandle returned from spawn_api_server and call its shutdown() method or
send on its shutdown_tx when the shutdown signal is triggered, ensuring the API
server is signaled to stop before/alongside the main service shutdown.
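
A sketch of the coordinated shutdown, assuming ApiServerHandle exposes (or gains) a shutdown() method that sends on its internal shutdown_tx:

let api_server_handle = spawn_api_server(
    rpc_url_for_api,
    args.api_server_port,
    args.api_server_public_bind,
);

// Later, in the Ctrl+C / spawn_shutdown_handler path, alongside the service_sender send:
// api_server_handle.shutdown(); // hypothetical helper wrapping shutdown_tx.send(())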

Comment on lines +211 to +232
pub fn update_indexer_response_time(operation: &str, tree_type: &str, duration_secs: f64) {
// Ensure metrics are registered before updating (idempotent via Once)
register_metrics();
INDEXER_RESPONSE_TIME
.with_label_values(&[operation, tree_type])
.set(duration_secs);
debug!(
"Indexer {} for {} took {:.3}s",
operation, tree_type, duration_secs
);
}

pub fn update_indexer_proof_count(tree_type: &str, requested: i64, received: i64) {
// Ensure metrics are registered before updating (idempotent via Once)
register_metrics();
INDEXER_PROOF_COUNT
.with_label_values(&[tree_type, "requested"])
.set(requested);
INDEXER_PROOF_COUNT
.with_label_values(&[tree_type, "received"])
.set(received);
}

🧹 Nitpick | 🔵 Trivial

Consider performance impact of redundant registration calls.

Both update functions call register_metrics() on every invocation. While this is safe due to the Once guard, it adds a small overhead (atomic check + function call) on the hot path.

If these metrics are updated frequently (e.g., on every indexer request), consider:

  1. Documenting that callers should ensure register_metrics() is called during initialization
  2. Removing the redundant calls from these update functions

However, if the frequency is low or the convenience outweighs the minimal overhead, the current approach is acceptable.

🤖 Prompt for AI Agents
In @forester/src/metrics.rs around lines 211 - 232, Both
update_indexer_response_time and update_indexer_proof_count call
register_metrics() on every invocation which adds unnecessary overhead; remove
the register_metrics() calls from these functions (i.e., delete the
register_metrics() line in update_indexer_response_time and the one in
update_indexer_proof_count) and ensure register_metrics() is invoked once during
startup/initialization (document this requirement near the module or in the init
path) so metrics are registered before any calls to update_indexer_response_time
or update_indexer_proof_count.
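
A sketch of the startup-time registration (the call site and module path are assumptions):

// Early in main(), before the processors or API server are spawned:
crate::metrics::register_metrics(); // idempotent via Once, so an extra call is harmless

// update_indexer_response_time and update_indexer_proof_count can then drop their
// per-call register_metrics() invocations.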

Comment on lines +209 to 258
let state_proofs = if let Some(states) = state_data {
let total_states = states.len();
info!("Fetching {} state proofs", total_states);

let start_time = Instant::now();
match rpc
.indexer()?
.get_multiple_compressed_account_proofs(states, None)
.await
{
Ok(response) => {
let duration = start_time.elapsed();
let proofs_received = response.value.items.len();

info!(
"State proofs complete: requested={}, received={}, duration={:.3}s",
total_states,
proofs_received,
duration.as_secs_f64()
);

if proofs_received != total_states {
warn!(
"State proof count mismatch: requested={}, received={}",
total_states, proofs_received
);
}

update_indexer_response_time(
"get_multiple_compressed_account_proofs",
"StateV1",
duration.as_secs_f64(),
);
update_indexer_proof_count("StateV1", total_states as i64, proofs_received as i64);

response.value.items
}
Err(e) => {
let duration = start_time.elapsed();
warn!(
"Failed to get state proofs after {:.3}s: {}",
duration.as_secs_f64(),
e
);
return Err(anyhow::anyhow!("Failed to get state proofs: {}", e));
}
}
} else {
Vec::new()
};

🧹 Nitpick | 🔵 Trivial

State proofs path lacks retry logic—intentional asymmetry?

Address proofs have retry logic, but state proofs use a single call without retries. If transient network errors affect both paths equally, state proofs could fail where address proofs would recover.

Consider whether state proofs should also have retry logic for consistency, or document why the asymmetry is acceptable (e.g., state proofs are typically smaller/faster/more reliable).

🤖 Prompt for AI Agents
In @forester/src/processor/v1/helpers.rs around lines 209 - 258, The state
proofs path currently calls rpc.indexer().get_multiple_compressed_account_proofs
once (inside the state_proofs block) without retries; add the same retry/backoff
logic used for address proofs to this call so transient network errors are
retried before returning Err. Wrap
rpc.indexer()?.get_multiple_compressed_account_proofs(states, None).await in the
existing retry mechanism (same max attempts, backoff, and logging), ensure
update_indexer_response_time and update_indexer_proof_count are still called on
success, emit per-attempt warnings on failure, and only return Err after retries
are exhausted to keep behavior consistent with the address proofs code path.
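
A sketch of mirroring the address-proof retry shape around the existing call (the attempt budget, backoff values, and the states.clone() are assumptions):

let mut attempt: u32 = 0;
let max_attempts: u32 = 3; // assumed to match the address-proof budget
let mut delay = std::time::Duration::from_millis(500); // assumed initial backoff

let response = loop {
    attempt += 1;
    match rpc
        .indexer()?
        .get_multiple_compressed_account_proofs(states.clone(), None)
        .await
    {
        Ok(resp) => break resp,
        Err(e) if attempt < max_attempts => {
            warn!(
                "State proof attempt {}/{} failed: {}; retrying in {:?}",
                attempt, max_attempts, e, delay
            );
            tokio::time::sleep(delay).await;
            delay *= 2;
        }
        Err(e) => {
            return Err(anyhow::anyhow!(
                "Failed to get state proofs after {} attempts: {}",
                max_attempts, e
            ));
        }
    }
};
// The existing duration, metrics, and count-mismatch logging then runs on `response`.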

Comment on lines +29 to +31
/// Maximum batches to process per tree per iteration.
/// This is stored in BatchContext and configured via CLI.
const MAX_BATCHES_PER_TREE: usize = 4;

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, check the actual constant definition in processor.rs
cat -n forester/src/processor/v2/processor.rs | head -40 | tail -15

Repository: Lightprotocol/light-protocol

Length of output: 600


🏁 Script executed:

#!/bin/bash
# Search for BatchContext definition
rg -n "struct BatchContext" --type rust -A 10

Repository: Lightprotocol/light-protocol

Length of output: 883


🏁 Script executed:

#!/bin/bash
# Check for CLI configuration related to batch sizes
rg -n "max_batches|batch.*size" forester/ --type rust | head -30

Repository: Lightprotocol/light-protocol

Length of output: 3204


🏁 Script executed:

#!/bin/bash
# Look for historical context or comments explaining the choice of 4
rg -n "MAX_BATCHES_PER_TREE" forester/ --type rust -B 2 -A 2

Repository: Lightprotocol/light-protocol

Length of output: 4172


🏁 Script executed:

#!/bin/bash
# Check docs and CLAUDE.md for batch size guidance
find . -name "CLAUDE.md" -o -name "*batched*merkle*" -type f | head -20

Repository: Lightprotocol/light-protocol

Length of output: 927


🏁 Script executed:

#!/bin/bash
# Check the batched-merkle-tree docs directory
ls -la program-libs/batched-merkle-tree/docs/

Repository: Lightprotocol/light-protocol

Length of output: 905


🏁 Script executed:

#!/bin/bash
# Read the main CLAUDE.md file from batched-merkle-tree docs
cat program-libs/batched-merkle-tree/docs/CLAUDE.md

Repository: Lightprotocol/light-protocol

Length of output: 9434


🏁 Script executed:

#!/bin/bash
# Check for any comments or documentation in processor.rs about why 4 was chosen
rg -n "4|batches|iterations" forester/src/processor/v2/processor.rs | grep -i "comment\|//\|max\|batch" | head -20

Repository: Lightprotocol/light-protocol

Length of output: 1288


🏁 Script executed:

#!/bin/bash
# Look for git history or version control info about the change from 20 to 4
git log --oneline --all -n 50 | head -20

Repository: Lightprotocol/light-protocol

Length of output: 1339


🏁 Script executed:

#!/bin/bash
# Check the full processor.rs file for any related configuration or comments
head -100 forester/src/processor/v2/processor.rs

Repository: Lightprotocol/light-protocol

Length of output: 3318


Fix the misleading documentation—this constant is not configurable and doesn't live in BatchContext.

The comment claims the constant is "stored in BatchContext and configured via CLI," but neither is true:

  1. BatchContext doesn't contain max_batches_per_tree — The struct (in forester/src/processor/v2/common.rs) has no such field.
  2. No CLI configuration exists — This is a hardcoded const, not a runtime parameter.

The constant limits batch processing per iteration across 9 locations (lines 119, 154, 173, 175, 202, 212, 233, 508, 537), which affects:

  • Queue processing throughput at line 119 (queue_size = zkp_batch_size * MAX_BATCHES_PER_TREE)
  • Cached state processing at line 154
  • Fetch batches calculation at lines 173, 175
  • Batch processing decisions at lines 202, 212, 233
  • Warmup operations at lines 508, 537

Update the comment to either:

  • Accurately document this as a hardcoded optimization constant with the rationale for the limit, or
  • If runtime configurability is intended, implement it by reading from BatchContext or CLI and update the struct accordingly.

What's the reason for this particular limit value?

🤖 Prompt for AI Agents
In @forester/src/processor/v2/processor.rs around lines 29 - 31, The comment is
incorrect: MAX_BATCHES_PER_TREE is a hardcoded optimization constant, not stored
in BatchContext or configurable via CLI; update the doc comment above const
MAX_BATCHES_PER_TREE to state it is a compile-time limit (value 4) used to bound
per-tree queue size and processing concurrency (keeps queue_size =
zkp_batch_size * MAX_BATCHES_PER_TREE and limits memory/throughput), and include
the rationale that the value was chosen to limit resource usage and was tuned
empirically. If you instead want runtime configurability, add a
max_batches_per_tree field to BatchContext, add a CLI flag to populate it,
replace uses of the const with that field (and remove the const or provide a
default), and propagate the value into all places that currently reference
MAX_BATCHES_PER_TREE (functions/methods that compute queue_size, fetch batches,
and warmup logic).
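
A corrected doc comment along the lines suggested above (the empirical-tuning rationale is an assumption):

/// Compile-time cap on how many batches are processed per tree per iteration.
/// Bounds per-tree work (queue_size = zkp_batch_size * MAX_BATCHES_PER_TREE) and
/// limits memory and per-iteration latency; the value of 4 was tuned empirically.
/// Not stored in BatchContext and not configurable via CLI.
const MAX_BATCHES_PER_TREE: usize = 4;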

Comment on lines +319 to +344
pub async fn fetch_protocol_group_authority<R: Rpc>(rpc: &R) -> Result<Pubkey> {
let registered_program_pda =
light_registry::account_compression_cpi::sdk::get_registered_program_pda(
&light_registry::ID,
);

let account = rpc
.get_account(registered_program_pda)
.await?
.ok_or_else(|| {
anyhow::anyhow!(
"RegisteredProgram PDA not found for light_registry at {}",
registered_program_pda
)
})?;

let registered_program = RegisteredProgram::deserialize(&mut &account.data[8..])
.map_err(|e| anyhow::anyhow!("Failed to deserialize RegisteredProgram: {}", e))?;

info!(
"Fetched protocol group authority: {}",
registered_program.group_authority_pda
);

Ok(registered_program.group_authority_pda)
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find RegisteredProgram definition and usage patterns
rg -n "RegisteredProgram" --type rust forester/src/tree_data_sync.rs -B 2 -A 5 | head -80

Repository: Lightprotocol/light-protocol

Length of output: 935


🏁 Script executed:

#!/bin/bash
# Check the light_registry module for RegisteredProgram and discriminator handling
rg -n "struct RegisteredProgram\|impl RegisteredProgram" --type rust | head -40

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Look for discriminator validation patterns in the codebase
rg -n "check_discriminator\|check_account_info" --type rust | head -40

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Examine how other code deserializes accounts with discriminators
rg -n "deserialize.*8\.\." --type rust | head -40

Repository: Lightprotocol/light-protocol

Length of output: 1708


🏁 Script executed:

#!/bin/bash
# Find RegisteredProgram definition in account_compression crate
rg -n "struct RegisteredProgram" --type rust | head -20

Repository: Lightprotocol/light-protocol

Length of output: 290


🏁 Script executed:

#!/bin/bash
# Check how check_discriminator is used in the codebase
rg -n "check_discriminator" --type rust -A 3 | head -60

Repository: Lightprotocol/light-protocol

Length of output: 4790


🏁 Script executed:

#!/bin/bash
# Look for usage of check_discriminator in forester specifically
rg -n "check_discriminator" forester/ --type rust -A 3

Repository: Lightprotocol/light-protocol

Length of output: 1813


🏁 Script executed:

#!/bin/bash
# Check RegisteredProgram definition to see if it implements needed traits
rg -n "struct RegisteredProgram" --type rust -A 10 programs/account-compression/src/instructions/register_program.rs

Repository: Lightprotocol/light-protocol

Length of output: 564


🏁 Script executed:

#!/bin/bash
# Check if RegisteredProgram is derived with necessary traits for check_discriminator
rg -n "RegisteredProgram" programs/account-compression/src/instructions/register_program.rs -B 5 | head -30

Repository: Lightprotocol/light-protocol

Length of output: 488


🏁 Script executed:

#!/bin/bash
# Find where RegisteredProgram is serialized/stored to understand its discriminator
rg -n "DISCRIMINATOR\|ANCHOR_DISCRIMINATOR" programs/account-compression/src/ --type rust -B 2 -A 2 | grep -A 5 -B 5 RegisteredProgram | head -40

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Check account_compression mod.rs to see what RegisteredProgram exports
rg -n "pub use.*RegisteredProgram\|RegisteredProgram" programs/account-compression/src/lib.rs -A 2 -B 2

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Check Anchor account implementations to see if RegisteredProgram has DISCRIMINATOR constant
rg -n "impl.*Owner.*RegisteredProgram\|DISCRIMINATOR" programs/account-compression/src/ --type rust | head -40

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Look at how Anchor's #[account] macro provides DISCRIMINATOR
rg -n "const DISCRIMINATOR\|DISCRIMINATOR.*=" programs/account-compression/src/ --type rust -B 2 -A 2 | head -60

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Check if RegisteredProgram can actually be used with check_discriminator by looking at similar usage
rg -n "check_discriminator.*RegisteredProgram\|RegisteredProgram.*check_discriminator" --type rust

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Check what the Anchor #[account] macro provides by looking at other accounts
rg -n "#\[account\]" programs/account-compression/src/ --type rust -A 3 | head -40

Repository: Lightprotocol/light-protocol

Length of output: 1232


🏁 Script executed:

#!/bin/bash
# Look for Owner trait in Anchor-derived accounts and verify RegisteredProgram pattern
rg -n "impl Owner for\|impl.*Owner.*for.*Program" --type rust | head -20

Repository: Lightprotocol/light-protocol

Length of output: 54


🏁 Script executed:

#!/bin/bash
# Search for how RegisteredProgram is actually deserialized in the codebase
rg -n "RegisteredProgram::deserialize" --type rust -B 5 -A 5

Repository: Lightprotocol/light-protocol

Length of output: 881


Add discriminator validation for RegisteredProgram deserialization.

The function deserializes RegisteredProgram without validating the account's discriminator. While the codebase imports check_discriminator, it's not used here—unlike every other account deserialization in this same file (StateMerkleTreeAccount, AddressMerkleTreeAccount, BatchedMerkleTreeAccount), which all validate discriminators before deserializing.

Add the check before line 335:

Suggested change
let account = rpc
    .get_account(registered_program_pda)
    .await?
    .ok_or_else(|| {
        anyhow::anyhow!(
            "RegisteredProgram PDA not found for light_registry at {}",
            registered_program_pda
        )
    })?;

check_discriminator::<RegisteredProgram>(&account.data)?;
let registered_program = RegisteredProgram::deserialize(&mut &account.data[8..])
    .map_err(|e| anyhow::anyhow!("Failed to deserialize RegisteredProgram: {}", e))?;
🤖 Prompt for AI Agents
In @forester/src/tree_data_sync.rs around lines 319 - 344, In
fetch_protocol_group_authority, validate the account discriminator before
deserializing RegisteredProgram: after obtaining account (variable account) call
check_discriminator::<RegisteredProgram>(&account.data)? and only then call
RegisteredProgram::deserialize(&mut &account.data[8..]); this mirrors other
account handlers
(StateMerkleTreeAccount/AddressMerkleTreeAccount/BatchedMerkleTreeAccount) and
prevents deserializing the wrong account type.
