
feat: optimize rpc calls#5394

Open
sbackend123 wants to merge 7 commits into master from feat/rpc-calls-optimisation

Conversation

@sbackend123 (Contributor) commented Mar 12, 2026

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

Removed redundant RPC call patterns in the transaction flow and added a cache layer for the BlockNumber RPC call.
Fixed lint issues in the new cache package and cleaned up minor lint findings in adjacent test/helper code.

Open API Spec Version Changes (if applicable)

Motivation and Context (Optional)

Related Issue (Optional)

#5388

Screenshots (if appropriate):

[Screenshot from 2026-03-16 13-29-22]

The hit ratio is most likely low because of high-load errors (EOF), which are fixed by PR.

sbackend123 changed the title from "RPC calls optimisation" to "Feat: RPC calls optimisation" on Mar 12, 2026
sbackend123 force-pushed the feat/rpc-calls-optimisation branch from f8ca4e2 to e55ac11 on March 13, 2026 07:54
sbackend123 force-pushed the feat/rpc-calls-optimisation branch from e55ac11 to 04f6520 on March 13, 2026 08:00
sbackend123 changed the title from "Feat: RPC calls optimisation" to "feat: optimize rpc calls" on Mar 13, 2026
sbackend123 marked this pull request as ready for review on March 16, 2026 12:31
blocktime,
true,
c.config.GetUint64(optionNameMinimumGasTipCap),
blocktime-2,
Member:

Just to note that the current block time on our CI is 1s, so caching is not used there... How could we achieve using it there?

Contributor (author):

In deploy.go, blockTime is hardcoded. But if we are talking about CI: normally cacheTTL is about 85% of the block time, so even if it is 1 second, we will get 850 ms.

Member:

This is very strange, as block time is defined as const blocktime = 15 on line 16, which is actually 15 nanoseconds. Since that is passed as pollingInterval, it looks to me like a very short time; I suppose the block time should be in minutes, not nanoseconds. Luckily, this is only for the deploy command; Bee has its own block time passed from options. I believe line 16 in this file should read const blocktime = 15 * time.Minute, which is not related to this PR.

Also, what happens if the blocktime value is 1 (if someone changes the constant) and we end up with a negative duration? And if the constant is changed to a much longer value like 15*time.Minute, the literal 2 here would subtract only two nanoseconds from that large number.

t.Helper()

-	var expSegments [][]byte
+	expSegments := make([][]byte, 0, len(exp))
Member:

These are fine but feel like they belong in a separate PR.

Contributor (author):

Yes, but the linter was failing, so I decided to fix everything. Next time I will create a separate PR.

pollingInterval time.Duration,
chainEnabled bool,
minimumGasTipCap uint64,
blockCacheTTl time.Duration,
Member:

Typo blockCacheTTl -> blockCacheTTL

}
}

func NewMetrics() Metrics {
Member:

Seems not used (dead code).

c.metrics.LoadErrors.Inc()
return val, err
}
c.Set(val, time.Now())
Member:

Should we use the now argument here instead of time.Now()?

Member:

What could happen if loader takes a while?

Contributor (author):

I do not think it matters, because there is a difference between the time when we request the value and the time when we set the new value, if we had to load it.

@gacevicljubisa (Member):

Suggestion: Instead of a fixed TTL (blocktime*85/100, blocktime-2) + BlockNumber RPC, use a single HeaderByNumber(nil) call periodically to get both the block number and timestamp, then extrapolate between syncs with zero RPC calls.

How it works:

  • Every ~20 blocks, call HeaderByNumber(nil) (latest) — returns block number + timestamp in one RPC call. Store as anchor point.
  • Between syncs, estimate the current block from pure math: currentBlock = anchorBlock + (now - anchorTimestamp) / blockTime. No RPC calls at all.
  • Cache TTL: anchorTimestamp + (currentBlock - anchorBlock + 1) * blockTime - now.

@@ -0,0 +1,107 @@
// Copyright 2025 The Swarm Authors. All rights reserved.
Member:

2026

@@ -0,0 +1,189 @@
// Copyright 2025 The Swarm Authors. All rights reserved.
Member:

2026

@@ -0,0 +1,10 @@
// Copyright 2025 The Swarm Authors. All rights reserved.
Member:

2026

@@ -0,0 +1,84 @@
// Copyright 2025 The Swarm Authors. All rights reserved.
Member:

2026

expiresAt time.Time

group singleflight.Group
key Key
Member:

The key here is only used in metrics and as the key in c.group.Do, but that key is always the same, since the singleflight group and the key are both fixed for a given instance of the ExpiringSingleFlightCache type. I would suggest removing the Key type and just providing the metrics prefix string in the constructor NewExpiringSingleFlightCache.

}(i)
}

time.Sleep(50 * time.Millisecond)
Member:

Could synctest be used here to avoid sleeping and a potentially flaky test, in case synchronization does not happen within 50ms?


c.metrics.Misses.Inc()

result, err, shared := c.group.Do(string(c.key), func() (any, error) {
Member:

This group always calls Do with the same key, so it can even be a static string.

b.metrics.TotalRPCCalls.Inc()
b.metrics.BlockNumberCalls.Inc()

blockNumber, err := b.backend.BlockNumber(ctx)
Member:

The problem with golang.org/x/sync/singleflight is that it is not context.Context aware. The context passed by the first caller influences the result for all other callers of this function: if the first caller cancels or times out, all other callers receive that error even if they did not cancel and their own timeouts did not fire. This is why I created resenje.org/singleflight, which is context aware and will not terminate the execution of the function until all callers terminate their contexts. There are a few places where resenje.org/singleflight is used in bee; if you like, you can look at it and consider using it if you see it as important for this case. Basically, GetOrLoad would need to accept a context, pass it to singleflight.Do, and the context from the callback would be used in b.backend.BlockNumber.


c.value = value
c.valid = true
c.expiresAt = now.Add(c.ttl)
Member:

Ideally, for block number caching, the expiration time should be the time when the next block number is expected, which is around the block time interval. This keeps the window in which the cache reports an older block number, compared to the current one, as short as possible.

In this generalized caching implementation, it is hard to encode such a requirement, as it is very specific to how the block number increases (at some roughly constant, configurable period). The ideal approach would be to calculate the expected time of the next block, set expiresAt to that time or a bit later, and check whether the block number has actually increased.

In the current implementation, the worst case is that the cache keeps the old block number for up to the whole block time (15 minutes, for example, on some networks) after the actual block number has increased, if the value was set a moment before the new block. On average, over a large sample, that delay would be half the block time (7 minutes).

I am not sure such precision is needed, or whether a delay of up to a whole block time for the new block number is acceptable. This is just my observation.
