-
Notifications
You must be signed in to change notification settings - Fork 23
OCTRL-1041: Report DCS call duration per detector not only global #749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3a2d1aa to
460d2d7
Compare
core/integration/dcs/plugin.go
Outdated
| metric := newMetric("EOR") | ||
| defer monitoring.TimerSendSingle(&metric, monitoring.Millisecond)() | ||
| eor := "EOR" | ||
| detectorDurations := map[dcspb.Detector]time.Duration{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| detectorDurations := map[dcspb.Detector]time.Duration{} | |
| durationsPerDetector map[dcspb.Detector]time.Duration{} |
maybe?
core/integration/dcs/plugin.go
Outdated
| detectorDurations := map[dcspb.Detector]time.Duration{} | ||
| start := time.Now() | ||
|
|
||
| wholeMetric := newMetric(eor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| wholeMetric := newMetric(eor) | |
| totalCallDurationMetric := newMetric(eor) |
I am afraid the meaning of "wholeMetric" might be less clear outside of the context of this PR. How about this?
| ) (error, []byte) { | ||
| metric := newMetric("SOR") | ||
| defer monitoring.TimerSendSingle(&metric, monitoring.Millisecond)() | ||
| sor := "SOR" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| sor := "SOR" | |
| callName := "SOR" |
It's nitpicking, but perhaps this would be better?
After all, if I wanted to declare e.g. a constant defining a number of FLPs, we would not call it twoHundred but numberOfFLPs.
The same comment applies for other calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree generally with you, but I prefer calling it sor, eor, pfr in this case. It looks more readable when passing it around knowing it's value and inherently it's meaning. The reason is that wile coding you can see hints inside the editors , so I see something like method: sor making it better than method: callName
I also don't think that your example is quite applicable as twoHundred is really generic, but sor is quite specific thing in our tech stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it's a matter of personal preference and I would not judge one as better than other in our case. It's OK.
| } | ||
|
|
||
| detectorStatusMap[dcsEvent.GetDetector()] = dcsEvent.GetState() | ||
| detectorDurations[dcsEvent.GetDetector()] = time.Since(start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, the duration will be incorrect if the detector times out or there is some weirder error, am I wrong? In such case, the loop will break before reaching this line.
Not sure what to propose for the timeout case... after all, some detectors might complete an operation and some not, so we would want to set the timeout value for those who did not complete, but we would have to do some guess work instead of relying on dcsEvent.GetDetector().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, I overlooked the break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we both agree that we are not sure how to fix this, I tried to resolve the problem by adding result tag to the totalCallDurationMetric. It won't fix this problem with incorrect duration with some detectors, but we might at least know that we cannot truest results in a case, when we break outside of the loop early.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. We can then observe what we see in prod and adapt it if needed.
460d2d7 to
ae046cd
Compare
d72db49 to
ee97aa6
Compare
results of dcs calls renaming
ee97aa6 to
ab06e39
Compare
Added DCS results and durations per detector after given DCS call. Added states of all detectors to global (All) metric after DCS call.