Skip to content

Conversation

@yzang2019
Copy link
Collaborator

@yzang2019 yzang2019 commented Oct 1, 2025

Describe your changes and provide context

This PR add couple of new metrics using otel for seidb

Testing performed to validate your change

Example:
image

db.mtx.Lock()
defer db.mtx.Unlock()

startTime := time.Now()

Check warning

Code scanning / CodeQL

Calling the system time Warning

Calling the system time may be a possible source of non-determinism
@yzang2019 yzang2019 requested a review from blindchaser October 6, 2025 15:46
Copy link
Collaborator

@masih masih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One blocker about measuring latency in relation to errors. Left a bunch of suggestions.

sc/memiavl/db.go Outdated

startTime := time.Now()
defer func() {
metrics.SeiDBMetrics.RestartLatency.Record(context.Background(), time.Since(startTime).Seconds())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tag by "status" in case there is an error. Otherwise the times measured could be missleading.

Ditto for all such defer statements for functions that may also return error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, will add

sc/memiavl/db.go Outdated
startTime := time.Now()
defer func() {

metrics.SeiDBMetrics.CommitLatency.Record(context.Background(), time.Since(startTime).Milliseconds())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

declare context once and reuse it?

left Node,
right Node) *MemNode {
TotalNumOfMemNode.Add(1)
TotalMemNodeSize.Add(int64(120 + len(key)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move these increments to when the branch node is being added to its parent struct? Instantiation alone does not mean increase in size, right?

Ditto re newLeafNode.

Copy link
Collaborator Author

@yzang2019 yzang2019 Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each time we call newBranchNode or newLeafNode, it's always gonna mean the size would be increase, so I feel this is the best place to add size and count increase

Copy link
Collaborator Author

@yzang2019 yzang2019 Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually ignore my previous comment, let me check again, the setRecursive function might be a bit confusing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, just double checked, there are only 2 places where new memNodes are created:

  1. setRecursive (
    func setRecursive(node Node, key, value []byte, version, cowVersion uint32) (*MemNode, bool) {
    )
  2. mutate (
    func (node PersistedNode) Mutate(version, _ uint32) *MemNode {
    )

So both places are creating memnodes and making it as part of the tree

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could either choose to adding the bump logic to all these places or I can choose to just add the bump size logic in initialization, which is less code duplicates. Feel like that latter is easier?

@codecov
Copy link

codecov bot commented Oct 6, 2025

Codecov Report

❌ Patch coverage is 94.59459% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.00%. Comparing base (f369e78) to head (0b567e7).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
common/metrics/metrics.go 0.00% 4 Missing ⚠️

❌ Your project status has failed because the head coverage (40.00%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #114      +/-   ##
==========================================
+ Coverage   39.80%   40.00%   +0.19%     
==========================================
  Files          58       59       +1     
  Lines        7477     7510      +33     
==========================================
+ Hits         2976     3004      +28     
- Misses       4156     4161       +5     
  Partials      345      345              
Files with missing lines Coverage Δ
sc/memiavl/db.go 61.81% <100.00%> (+1.37%) ⬆️
sc/memiavl/mem_node.go 93.37% <100.00%> (+1.06%) ⬆️
sc/memiavl/multitree.go 71.60% <100.00%> (+0.08%) ⬆️
sc/memiavl/node.go 71.66% <100.00%> (-1.78%) ⬇️
sc/memiavl/persisted_node.go 91.12% <100.00%> (-0.61%) ⬇️
common/metrics/metrics.go 0.00% <0.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

fileLock FileLock
)

startTime := time.Now()

Check warning

Code scanning / CodeQL

Calling the system time Warning

Calling the system time may be a possible source of non-determinism
db.mtx.Lock()
defer db.mtx.Unlock()

startTime := time.Now()

Check warning

Code scanning / CodeQL

Calling the system time Warning

Calling the system time may be a possible source of non-determinism
go func() {
defer close(ch)

startTime := time.Now()

Check warning

Code scanning / CodeQL

Calling the system time Warning

Calling the system time may be a possible source of non-determinism
@yzang2019 yzang2019 requested a review from masih October 6, 2025 20:52

func newLeafNode(key, value []byte, version uint32) *MemNode {
TotalNumOfMemNode.Add(1)
TotalMemNodeSize.Add(int64(120 + len(key) + len(value)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add comments about why it's 120?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, added

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alignment & offsets (64-bit)

height = 1

padding after height = 3 (to align uint32)

version = 4 → offset now 8

size = 8 → offset 16

key = 24 → offset 40

value = 24 → offset 64

left = 16 → offset 80

right = 16 → offset 96

hash = 24 → offset 120

Total struct size = 120 bytes (already a multiple of 8, so no trailing padding).

SnapshotCreationLatency metric.Float64Histogram
CommitLatency metric.Int64Histogram
ApplyChangesetLatency metric.Int64Histogram
MemNodeTotalSize metric.Int64Gauge
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from chatGPT: MemNodeSizeObs metric.Int64ObservableGauge type is good for “now” value , same for MemNodeCount, for your consideration

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's what we want, please check the graph here:
image

@yzang2019 yzang2019 requested a review from blindchaser October 7, 2025 05:10
"io"
"math"
"sync/atomic"
"unsafe"

Check notice

Code scanning / CodeQL

Sensitive package import Note

Certain system packages contain functions which may be a possible source of non-determinism
Copy link
Collaborator

@masih masih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of gosec issues to fix, otherwise LGTM 👍

* main:
  Revert "Merge pull request #105 from sei-protocol/PebbleSurfaceErrors" (#118)
@yzang2019 yzang2019 enabled auto-merge (squash) October 8, 2025 14:29
@yzang2019 yzang2019 merged commit c3ee918 into main Oct 8, 2025
5 checks passed
@yzang2019 yzang2019 deleted the yzang/add-metrics branch October 8, 2025 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants