15 changes: 14 additions & 1 deletion src/node-control/README.md
@@ -2113,7 +2113,7 @@ In service mode, nodectl runs as a daemon, automatically executing tasks on sche
2. **Get election parameters**
- Configuration `#15` — election time parameters
- Configuration `#34` — current validators
- Query `past_elections` to the Elector contract
- Query `past_elections` to the Elector contract (cached per election round; see [Cache refresh](#cache-refresh) below)

3. **For each enabled binding:**
- **Stake recovery** — check and request return of frozen stake
@@ -2145,6 +2145,19 @@ Each binding resolves its effective stake policy by checking for a per-node over

> **TONCore nominator: process pending withdraws before staking.** Every tick, the elections runner probes the active TONCore pool's `has_withdraw_requests` getter. When the queue is non-empty it sends `process_withdraw_requests` (op = 2, limit = 10, message value = 1 TON) between `recover_stake` and `participate`, then skips this tick's stake submission to let the pool drain; the next tick re-probes and either resends op = 2 (new requests appeared) or proceeds to stake. This frees up locked liquidity from nominators who already requested withdrawal so it does not get re-staked. The corresponding participant status surfaced in the snapshot is `processing_withdraw_requests`. The step is a no-op for SNP and direct staking.
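The per-tick decision described above can be sketched as a small pure function. This is only an illustration under stated assumptions: `TickAction`, `next_action`, and `NANOTONS_PER_TON` are hypothetical names, not part of nodectl. The constants (op = 2, limit = 10, 1 TON) are taken from the paragraph above.

```rust
/// Hypothetical type: what the runner does for a TONCore pool between
/// `recover_stake` and `participate` on a given tick.
#[derive(Debug, PartialEq, Eq)]
pub enum TickAction {
    /// Send `process_withdraw_requests` and skip this tick's stake submission.
    ProcessWithdrawRequests { op: u32, limit: u32, value_nanotons: u64 },
    /// Queue is empty: proceed to `participate`.
    Participate,
}

/// 1 TON expressed in nanotons.
pub const NANOTONS_PER_TON: u64 = 1_000_000_000;

/// Hypothetical helper: maps the result of the `has_withdraw_requests`
/// getter to the action for this tick.
pub fn next_action(has_withdraw_requests: bool) -> TickAction {
    if has_withdraw_requests {
        TickAction::ProcessWithdrawRequests {
            op: 2,
            limit: 10,
            value_nanotons: NANOTONS_PER_TON,
        }
    } else {
        TickAction::Participate
    }
}
```

Because the getter is probed again on every tick, calling `next_action` with the fresh result each time reproduces the "resend op = 2 or proceed to stake" behaviour without any extra state.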

#### Cache refresh

`past_elections` and pool addresses are cached per election round, because a round can run for hours and refetching on every tick would be wasteful. The cache is invalidated when:

- `election_id` changes (new round), **or**
- `elections.cache_refresh_secs` seconds have elapsed since the last refresh.

The TTL refresh limits how long a stale snapshot can persist when the initial fetch hits a lagging RPC endpoint; without it, the snapshot stays cached for the entire round (a real incident on mainnet inflated `frozen_stake` until the next round). On each refresh the runner logs `past_elections cache refreshed (reason=election_id|ttl, entries=N, election_id=...)`.

| Field | Default | Notes |
|-------|---------|-------|
| `elections.cache_refresh_secs` | `300` (5 min) | Set to `0` to disable the time-based refresh (only `election_id` changes invalidate). Config-file only; not exposed via REST. Picked up on next file reload (≤10s). |
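
The invalidation rule can be sketched as a pure function. This is a minimal illustration, assuming all timestamps are Unix seconds; `RefreshReason` and `refresh_reason` are hypothetical names that do not exist in nodectl. The two variants mirror the `reason=election_id|ttl` values in the log line.

```rust
/// Hypothetical type: why the cache is refreshed on this tick.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum RefreshReason {
    ElectionId,
    Ttl,
}

/// Hypothetical helper: returns `Some(reason)` when the cache must be
/// refreshed and `None` when the cached data can be reused.
pub fn refresh_reason(
    cached_election_id: u64,
    election_id: u64,
    refreshed_at: u64,
    now: u64,
    cache_refresh_secs: u64,
) -> Option<RefreshReason> {
    // A new round always invalidates, and takes precedence over the TTL.
    if cached_election_id != election_id {
        return Some(RefreshReason::ElectionId);
    }
    // `0` disables the time-based refresh. `saturating_sub` keeps the check
    // safe if the wall clock moves backwards.
    let ttl_expired = cache_refresh_secs > 0
        && now.saturating_sub(refreshed_at) >= cache_refresh_secs;
    ttl_expired.then_some(RefreshReason::Ttl)
}
```

With the default of `300`, a cache refreshed at `t = 100` is reused through `t = 399` and refreshed again at `t = 400`, as long as `election_id` stays the same.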

### Logging

Configure logging output and level in the config file (`log` section). Override the log level temporarily via environment variable:
10 changes: 10 additions & 0 deletions src/node-control/common/src/app_config.rs
@@ -652,6 +652,10 @@ fn default_sleep_pct() -> f64 {
0.2
}

fn default_cache_refresh_secs() -> u64 {
300
}

#[derive(serde::Serialize, serde::Deserialize, Clone)]
pub struct ElectionsConfig {
#[serde(default)]
@@ -684,6 +688,11 @@ pub struct ElectionsConfig {
/// ephemeral ADNL address every cycle for them (pre-v0.5 behavior).
#[serde(default, skip_serializing_if = "HashSet::is_empty")]
pub static_adnl_disabled: HashSet<String>,
/// TTL (seconds) for `past_elections` and pool-address caches. `0` disables the
/// time-based refresh (only election_id changes invalidate). Defends against
/// stale snapshots cached for the whole round after a bad initial fetch.
#[serde(default = "default_cache_refresh_secs")]
pub cache_refresh_secs: u64,
}

impl ElectionsConfig {
@@ -735,6 +744,7 @@ impl Default for ElectionsConfig {
waiting_period_pct: default_waiting_pct(),
static_adnls: HashMap::new(),
static_adnl_disabled: HashSet::new(),
cache_refresh_secs: default_cache_refresh_secs(),
}
}
}
75 changes: 75 additions & 0 deletions src/node-control/common/src/clock.rs
@@ -0,0 +1,75 @@
/*
* Copyright (C) 2025-2026 RSquad Blockchain Lab.
*
* Licensed under the GNU General Public License v3.0.
* See the LICENSE file in the root of this repository.
*
* This software is provided "AS IS", WITHOUT WARRANTY OF ANY KIND.
*/
use std::sync::{
Arc,
atomic::{AtomicU64, Ordering},
};

/// Wall-clock abstraction. Production uses [`SystemClock`]; tests use [`MockClock`].
pub trait Clock: Send + Sync {
/// Unix timestamp in seconds.
fn now(&self) -> u64;
}

pub struct SystemClock;

impl Clock for SystemClock {
fn now(&self) -> u64 {
crate::time_format::now()
}
}

/// Clones share state, so a test can advance time after handing the clock to the SUT.
#[derive(Clone, Default)]
pub struct MockClock {
now: Arc<AtomicU64>,
}

impl MockClock {
pub fn new(initial: u64) -> Self {
Self { now: Arc::new(AtomicU64::new(initial)) }
}

pub fn set(&self, t: u64) {
self.now.store(t, Ordering::Relaxed);
}

pub fn advance(&self, secs: u64) {
self.now.fetch_add(secs, Ordering::Relaxed);
}
}

impl Clock for MockClock {
fn now(&self) -> u64 {
self.now.load(Ordering::Relaxed)
}
}

#[cfg(test)]
mod tests {
use super::*;

#[test]
fn mock_clock_set_and_advance() {
let clock = MockClock::new(100);
assert_eq!(clock.now(), 100);
clock.advance(50);
assert_eq!(clock.now(), 150);
clock.set(1000);
assert_eq!(clock.now(), 1000);
}

#[test]
fn mock_clock_clones_share_state() {
let clock = MockClock::new(0);
let clone = clock.clone();
clock.set(42);
assert_eq!(clone.now(), 42);
}
}
1 change: 1 addition & 0 deletions src/node-control/common/src/lib.rs
@@ -8,6 +8,7 @@
*/
pub mod app_config;
pub mod clap_utils;
pub mod clock;
pub mod log;
pub mod os_signals;
pub mod password;
49 changes: 35 additions & 14 deletions src/node-control/service/src/elections/runner.rs
@@ -14,6 +14,7 @@ use super::{
use anyhow::Context as _;
use common::{
app_config::{BindingStatus, ElectionsConfig, NodeBinding, StakePolicy},
clock::{Clock, SystemClock},
snapshot::{
ElectionsParticipantSnapshot, ElectionsSnapshot, ElectionsStatus, OurElectionParticipant,
ParticipationStatus, SnapshotStore, StakeSubmission, TimeRange, ValidatorNodeSnapshot,
@@ -251,9 +252,12 @@ pub(crate) struct ElectionRunner {
default_max_factor: f32,
default_stake_policy: StakePolicy,
past_elections: Vec<PastElections>,
/// Election ID for which `past_elections` and `cached_prev_min_eff` were fetched.
/// Used to avoid redundant RPC calls within the same election round.
/// Election ID the cache (past_elections + pool addresses) is keyed on.
past_elections_cache_id: u64,
/// Wall-clock time (Unix seconds) of the last cache refresh; `0` until the first refresh.
cache_refreshed_at: u64,
/// Cache TTL; `0` disables the time-based refresh (only election_id changes invalidate).
cache_refresh_secs: u64,
/// Cached prev_min_eff_stake computed from past_elections.
cached_prev_min_eff: Option<u64>,
// Snapshot cache updated during tick execution and published to SnapshotStore in run_loop().
@@ -265,6 +269,7 @@ pub(crate) struct ElectionRunner {
/// Callback to persist freshly generated static ADNL addresses into runtime config.
/// `None` in tests that don't care about persistence.
persist_static_adnls: Option<PersistStaticAdnls>,
clock: Arc<dyn Clock>,
}

#[derive(Default)]
@@ -417,13 +422,21 @@ impl ElectionRunner {
snapshot_cache: SnapshotCache::default(),
past_elections: vec![],
past_elections_cache_id: 0,
cache_refreshed_at: 0,
cache_refresh_secs: elections_config.cache_refresh_secs,
cached_prev_min_eff: None,
sleep_pct: elections_config.sleep_period_pct,
waiting_pct: elections_config.waiting_period_pct,
persist_static_adnls,
clock: Arc::new(SystemClock),
}
}

#[cfg(test)]
pub(crate) fn set_clock(&mut self, clock: Arc<dyn Clock>) {
self.clock = clock;
}

pub async fn run_loop(
&mut self,
tick_interval: Duration,
@@ -523,21 +536,22 @@ impl ElectionRunner {

let mut skip_tick_nodes = vec![];

// Pool address cache must be valid before any branch that uses `stake_addr`/`pool_target`
// (the finished branch below included). TONCore router pool address changes per election
// cycle (the router alternates between two pools), so invalidate the cache on election_id
// transition. SNP pool addresses are stable but invalidating uniformly is cheap.
// Also covers elections-task restart: `past_elections_cache_id` is 0 after start, so the
// first tick lands here and re-resolves.
if self.past_elections_cache_id != election_id {
// TTL refresh defends against stale snapshots cached for the whole round when the
// initial fetch hit a lagging RPC endpoint. TONCore pool address alternates per
// cycle, so uniform invalidation is required.
let now_ts = self.clock.now();
let cache_expired = self.cache_refresh_secs > 0
&& now_ts.saturating_sub(self.cache_refreshed_at) >= self.cache_refresh_secs;
let should_refresh_cache = self.past_elections_cache_id != election_id || cache_expired;
if should_refresh_cache {
for node in self.nodes.values_mut() {
if node.pool.is_some() {
node.pool_addr_cache = None;
}
}
}
// Resolve pool address for any node where it isn't cached yet. On election_id transition
// the cache was just invalidated above; on other ticks this recovers from a transient
// Resolve pool address for any node where it isn't cached yet. On cache invalidation
// the cache was just cleared above; on other ticks this recovers from a transient
// `pool.address()` failure (e.g. a `get_pool_data` parse error on TONCore).
for (node_id, node) in self.nodes.iter_mut() {
Self::resolve_pool_addr(node_id, node, &mut skip_tick_nodes).await;
@@ -578,21 +592,28 @@ impl ElectionRunner {
self.snapshot_cache.last_elections_status = ElectionsStatus::Postponed;
}

// Fetch past_elections only when election_id changes (cache across ticks).
if self.past_elections_cache_id != election_id {
if should_refresh_cache {
let reason =
if self.past_elections_cache_id != election_id { "election_id" } else { "ttl" };
self.past_elections = self.elector.past_elections().await.context("past_elections")?;
self.cached_prev_min_eff = self
.past_elections
.first()
.and_then(|pe| pe.frozen_map.values().min_by_key(|f| f.stake).map(|f| f.stake));

tracing::info!(
"past_elections cache refreshed (reason={}, entries={}, election_id={})",
reason,
self.past_elections.len(),
election_id
);
if let Some(prev) = self.cached_prev_min_eff {
tracing::info!(
"prev_min_eff_stake from past elections: {} TON",
nanotons_to_tons_f64(prev)
);
}
self.past_elections_cache_id = election_id;
self.cache_refreshed_at = now_ts;
}

// walk through the nodes and try to participate in the elections