
Conversation

@squadgazzz squadgazzz commented Jan 12, 2026

Description

Once we started forwarding native price estimates from the Orderbook to Autopilot, CoinGecko API usage went up. This happened because the estimator was disabled in the Orderbook and moved to Autopilot, which now handles all requests and also relies on CoinGecko.

This PR refactors native price estimation by splitting the price estimator into primary and secondary and introducing a shared cache between them, with source-aware maintenance.

Changes

Shared cache with source tracking

  • Both estimators share a single NativePriceCache, reducing memory usage and avoiding duplicate price fetches for tokens requested by both
  • Each cached entry is tagged with an EstimatorSource (Primary or Secondary), indicating which estimator originally fetched it
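The shared, source-tagged cache could be pictured roughly like this (a minimal sketch; apart from `NativePriceCache` and `EstimatorSource`, the field and method names are assumptions, not the actual implementation):

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Which estimator originally fetched a cached price.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Default)]
enum EstimatorSource {
    #[default]
    Primary,
    Secondary,
}

/// A cached native price tagged with its source (illustrative fields).
struct CachedPrice {
    price: f64,
    updated_at: Instant,
    source: EstimatorSource,
}

/// Minimal shared cache; token addresses are simplified to Strings here.
#[derive(Default)]
struct NativePriceCache {
    entries: HashMap<String, CachedPrice>,
}

impl NativePriceCache {
    fn insert(&mut self, token: String, price: f64, source: EstimatorSource) {
        self.entries.insert(
            token,
            CachedPrice { price, updated_at: Instant::now(), source },
        );
    }

    /// Maintenance later reads this tag to pick the right estimator.
    fn source_of(&self, token: &str) -> Option<EstimatorSource> {
        self.entries.get(token).map(|e| e.source)
    }
}
```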

Source-aware maintenance

  • The background maintenance task dispatches to the correct estimator based on each token's source tag
  • This ensures tokens fetched via secondary-specific sources (e.g., CoinGecko) are maintained by the secondary estimator, not the primary one
  • Avoids calling all sources for all tokens during maintenance cycles
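The dispatch step could be sketched as follows (illustrative names throughout; `partition_for_maintenance` and the refresh calls in the comment are hypothetical, not the actual API):

```rust
#[derive(Clone, Copy, PartialEq)]
enum EstimatorSource {
    Primary,
    Secondary,
}

/// Split the expiring cache entries by their source tag so each batch can
/// be refreshed by the estimator that originally fetched it. This way
/// secondary-only sources (e.g. CoinGecko) are never hit for primary tokens.
fn partition_for_maintenance(
    expiring: &[(String, EstimatorSource)],
) -> (Vec<String>, Vec<String>) {
    let mut primary = Vec::new();
    let mut secondary = Vec::new();
    for (token, source) in expiring {
        match source {
            EstimatorSource::Primary => primary.push(token.clone()),
            EstimatorSource::Secondary => secondary.push(token.clone()),
        }
    }
    // Each batch would then go to its own estimator, e.g.:
    // primary_estimator.refresh(primary); secondary_estimator.refresh(secondary);
    (primary, secondary)
}
```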

Optional secondary estimator

  • An optional secondary native price estimation config can be specified for API requests
  • If absent, the primary estimator handles all requests
  • This allows configuring different price sources for the Autopilot API vs auction competition
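The fallback behavior can be sketched as follows (a toy model; `Estimators`, its fields, and `for_api` are hypothetical stand-ins, with Strings in place of the real estimator types):

```rust
/// If no secondary estimator is configured, API requests fall back to the
/// primary estimator.
struct Estimators {
    primary: String,           // stands in for the primary estimator
    secondary: Option<String>, // absent unless explicitly configured
}

impl Estimators {
    /// Estimator used to serve API requests.
    fn for_api(&self) -> &String {
        self.secondary.as_ref().unwrap_or(&self.primary)
    }
}
```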

Design considerations

The source-tracking approach addresses a subtle issue with shared cache maintenance. With two estimators (primary using CoinGecko, secondary without), a naive approach of picking one estimator for all maintenance would fail:

  • If primary maintains everything: tokens initially fetched via secondary would start hitting CoinGecko during maintenance, defeating the purpose
  • If secondary maintains everything: tokens originally fetched via CoinGecko couldn't be properly refreshed

By tagging each cached entry with its source and dispatching maintenance accordingly, tokens stay with their original estimator throughout their cache lifetime.

How to test

Existing tests.

@squadgazzz squadgazzz marked this pull request as ready for review January 12, 2026 17:41
@squadgazzz squadgazzz requested a review from a team as a code owner January 12, 2026 17:41
@jmg-duarte jmg-duarte left a comment


I find the way the caching mechanism is set up confusing. Better docs might help, but at the same time I feel it is a bit too complex and could probably be made simpler (at the cost of a bigger refactor).

Comment on lines 391 to 395
/// Creates a new CachingNativePriceEstimator with a background maintenance
/// task.
///
/// The maintenance task periodically refreshes cached prices before they
/// expire. Use this for the primary estimator in a shared-cache setup.
Contributor

Shouldn't cache maintenance be coupled with the cache itself?

In my head it makes more sense to create a single cache that comes with maintenance by default; when you want more structs to have access to it, hand out references (which you already do by cloning the Arc-wrapped struct).

I'm a bit confused by the separation

Contributor Author

I don't clearly see your idea. Which estimators would the cache maintenance task be using then?

/// Main estimator, which runs the background cache maintenance task.
pub main: Arc<CachingNativePriceEstimator>,
/// API estimator that shares the cache with the main estimator but doesn't
/// run background task and might use a different set of sources.
Contributor

How come it can use a different set of sources but not run the background task to update entries according to them?

What am I missing?

Contributor Author

The idea is that the API estimators are a subset of the main estimators. But after thinking more about it, we can probably run a maintenance task with the combined API and main estimators to ensure everything is covered.

Contributor Author

I reworked the approach. It would require starting the review from scratch. Sorry about that.

Comment on lines 379 to 382
/// The main estimator includes all sources and runs the background
/// maintenance task. The API estimator (if `api_native` is provided)
/// uses its own set of sources but shares the cache, so it can return
/// cached prices without triggering requests to excluded sources.
Contributor

The different set of sources is either poorly documented and needs a better explanation or it feels like a footgun / something that may cause a lot of confusion during investigations

Contributor Author

@squadgazzz squadgazzz Jan 13, 2026

What exactly confuses you? Different sets of estimators are used for the orderbook and auction competition.

@squadgazzz squadgazzz marked this pull request as draft January 13, 2026 17:22
@squadgazzz squadgazzz marked this pull request as ready for review January 14, 2026 14:00
@squadgazzz squadgazzz requested a review from jmg-duarte January 14, 2026 14:01
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant architectural improvement by separating the native price estimator for the Autopilot API and introducing a shared cache with source-aware maintenance. This refactoring effectively reduces memory usage and prevents redundant price fetches. The new NativePriceCache encapsulates caching logic cleanly, simplifying the CachingNativePriceEstimator and improving the overall design. The changes are well-structured and align with the goals outlined in the description. I've found one potential issue that could lead to a panic, which I've detailed in a specific comment.

// will fetch the price during the next maintenance cycle.
// This should happen only for prices missing while building the auction.
// Otherwise malicious actors could easily cause the cache size to blow up.
let outdated_timestamp = now.checked_sub(self.inner.max_age).unwrap();
Contributor

high

The use of .unwrap() on checked_sub could cause a panic if self.inner.max_age is larger than the system's uptime. This might occur if max_age is configured to a large value (e.g., hours or days) and the service has recently restarted. To prevent this, it's crucial to handle the None case from checked_sub gracefully instead of unwrapping it directly.

                    let outdated_timestamp = match now.checked_sub(self.inner.max_age) {
                        Some(timestamp) => timestamp,
                        None => {
                            tracing::warn!(
                                max_age = ?self.inner.max_age,
                                ?token,
                                "max_age is larger than system uptime, cannot create outdated placeholder entry for token"
                            );
                            return None;
                        }
                    };

Contributor Author

That is the old code. No changes here.

/// Creates a new CachingNativePriceEstimator.
///
/// The estimator will use the provided cache for lookups and will fetch
/// prices on-demand for cache misses. Background maintenance (keeping the
Contributor

Q: should we keep metrics on cache hits / misses?

Contributor Author

Wdym? I think I didn't touch metrics.

Contributor

By "keep" I meant "keep an eye on", but I wasn't clear. Rephrasing: should we start tracking cache hits/misses?

Contributor Author

Probably makes sense to build some metrics in a separate PR. Instead of hits/misses, I personally would be interested in cached values that are never accessed later.

updated_at,
now,
Default::default(),
EstimatorSource::default(),
Contributor

I found it a bit hard to reason about what primary and secondary are supposed to be. Given that this logic is already extremely specific, it would probably make sense to just give them specific names?

Contributor Author

Yeah, I was also thinking about that. Which names exactly, for example? They are located in the shared crate, which, I assume, doesn't know anything about autopilot or orderbook.


/// Returns true if the cache is empty.
pub fn is_empty(&self) -> bool {
self.len() == 0
Contributor

nit: usually if there is an is_empty() function it's best to use that instead of len() == 0. It likely doesn't matter in this case, but, for example, the simplest linked list has an O(n) len() but an O(1) is_empty().

/// originally fetched it. The cache's background maintenance task uses
/// this information to dispatch updates to the appropriate estimator,
/// ensuring each token is refreshed using the same source that
/// originally fetched it.
Contributor

This seems to relate to the design considerations you mention in the description:

The source-tracking approach addresses a subtle issue with shared cache maintenance. With two estimators (primary using CoinGecko, secondary without), a naive approach of picking one estimator for all maintenance would fail:
If primary maintains everything: tokens initially fetched via secondary would start hitting CoinGecko during maintenance, defeating the purpose
If secondary maintains everything: tokens originally fetched via CoinGecko couldn't be properly refreshed
By tagging each cached entry with its source and dispatching maintenance accordingly, tokens stay with their original estimator throughout their cache lifetime.

However, I don't think this is thought through completely. With the exception of onchain placed orders (😬), all new native prices originally have to be fetched by an API request. Unless I'm missing something, once a user places an order for a completely new token, the autopilot will continue to update the cache without coingecko (since that is not part of the estimator that originally fetched the price). I suspect this token could only ever be upgraded to be fetched by coingecko after a restart puts it into the cache and marks it as a primary token.

What I think would be closer to what we try to achieve is that we only keep tokens warm that are actually used in the auction. So what I would imagine is this:

  1. API request for a completely new native price
  2. autopilot caches it but it's not marked for maintenance yet
  3. a. user never places an order => maintenance never refetches the token, eventually gets evicted from the cache
    b. user places an order => when autopilot fetches the price for building the auction it marks it as "worthy of maintenance"

In that approach only the main estimator (with coingecko) would run the maintenance, and only for the tokens that were explicitly marked by the autopilot. That deviates from what we currently do (where the API is configured to run the maintenance as well). But since we now have a single cache that's shared by everyone and keeps the prices for all tokens in the auction up to date, I think the API can probably stop updating its cache entirely.
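The mark-on-auction-read flow described above could be sketched like this (all names hypothetical; Strings stand in for token addresses):

```rust
use std::collections::{HashMap, HashSet};

/// Cache entries start unmarked; only tokens actually read while building
/// an auction get marked "worthy of maintenance", and the background task
/// refreshes only marked tokens.
#[derive(Default)]
struct Cache {
    prices: HashMap<String, f64>,
    maintained: HashSet<String>,
}

impl Cache {
    /// API request path: cache the price but do not mark it.
    fn insert_from_api(&mut self, token: &str, price: f64) {
        self.prices.insert(token.to_string(), price);
    }

    /// Auction-building path: reading a price marks the token.
    fn get_for_auction(&mut self, token: &str) -> Option<f64> {
        let price = self.prices.get(token).copied();
        if price.is_some() {
            self.maintained.insert(token.to_string());
        }
        price
    }

    /// Maintenance refreshes only the marked tokens; unmarked entries
    /// eventually expire and get evicted.
    fn tokens_to_maintain(&self) -> Vec<&String> {
        self.maintained.iter().collect()
    }
}
```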

And regarding the issue you brought up elsewhere, that the estimator handling the orderbook's native price requests may be "more powerful" (i.e. would find native prices that the estimator with coingecko is not able to): I'd like to point out that we could perhaps stick to one required native price estimator argument and an optional coingecko argument. Only if coingecko is enabled would the need for a second estimator arise, and the autopilot could just add coingecko to the regular estimators.

That way the estimator used for maintaining the cache is always at least as capable as the estimator used to handle API requests.

All that being said I agree with @jmg-duarte that this is very complicated but unfortunately I was also not able to come up with a cleaner idea. 😞

Contributor Author

a. user never places an order => maintenance never refetches the token, eventually gets evicted from the cache
b. user places an order => when autopilot fetches the price for building the auction it marks it as "worthy of maintenance"

My worry here is that if something is placed into the cache, it has to be maintained. Otherwise, our orderbook will be sending stale results until the item expires in the cache. That is valid for the cases where no order was placed, but still, I suppose this affects the quote competition. Am I missing something, or can we ignore this problem for the quote competition?

