fix log parser offset reset on file switch, closes #1089 #1128
base: master
Conversation
Pull Request Test Coverage Report for Build 21244648072

Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls
// Restore offset if we've seen this file before and it hasn't been truncated
if savedOffset, ok := lp.fileOffsets[fileName]; ok && savedOffset <= int64(latestSize) {
	offset = int32(savedOffset)
	logger.Infof("Switching to logfile: '%s' (resuming from offset %d)", fileName, offset)
What if the file has been truncated and its new size is bigger than or equal to our saved offset?
I want to discuss this.
We don't have an accurate way to check whether the file was truncated in this case. We are assuming that log_truncate_on_rotation = off, so the file will only grow.
One idea is to store some of the latest bytes of the file and compare them with the file after it is updated.
What do you think?
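For illustration, that idea could look roughly like this in Go; captureTail, tailMatches, and fingerprintLen are made-up names for the sketch, not anything from this PR:

```go
package reaper

import (
	"bytes"
	"io"
	"os"
)

// fingerprintLen is an assumed size for the remembered tail of the file.
const fingerprintLen = 256

// captureTail remembers the bytes just before the saved offset so they can be
// compared later when we return to the file.
func captureTail(f *os.File, offset int64) ([]byte, error) {
	start := offset - fingerprintLen
	if start < 0 {
		start = 0
	}
	buf := make([]byte, offset-start)
	if _, err := f.ReadAt(buf, start); err != nil && err != io.EOF {
		return nil, err
	}
	return buf, nil
}

// tailMatches re-reads the same region on the next switch; a mismatch (or a
// file that is now shorter than that region) suggests truncation or rewrite.
func tailMatches(f *os.File, offset int64, fingerprint []byte) (bool, error) {
	start := offset - int64(len(fingerprint))
	if start < 0 {
		start = 0
	}
	buf := make([]byte, len(fingerprint))
	n, err := f.ReadAt(buf, start)
	if err != nil && err != io.EOF {
		return false, err
	}
	if n < len(fingerprint) {
		return false, nil // file is now shorter than the region we remembered
	}
	return bytes.Equal(buf, fingerprint), nil
}
```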
you can use select current_setting('log_truncate_on_rotation'); on parser instantiation to know whether it's on or off
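A minimal sketch of that suggestion, assuming a *sql.DB connection to the monitored database is already available at parser instantiation (detectTruncateOnRotation is an illustrative name, not part of this PR):

```go
package reaper

import (
	"context"
	"database/sql"
)

// detectTruncateOnRotation asks the server for the log_truncate_on_rotation
// setting so the parser can decide whether saved offsets are safe to reuse.
func detectTruncateOnRotation(ctx context.Context, db *sql.DB) (bool, error) {
	var setting string
	if err := db.QueryRowContext(ctx,
		"select current_setting('log_truncate_on_rotation')").Scan(&setting); err != nil {
		return false, err
	}
	// PostgreSQL reports boolean GUCs as "on" / "off".
	return setting == "on", nil
}
```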
Ohh, I missed that.
If it's on, we are assuming that the file will only grow and there are no external delete or truncate operations, right?
> If it's on, we are assuming that the file will only grow and there are no external delete or truncate operations, right?

If it's "off".
Yes, off. My bad
Thanks for the guidance, working on it
internal/reaper/logparser_local.go
Outdated
// Restore line count if we've seen this file before
if savedLines, ok := lp.fileOffsets[file]; ok {
	linesRead = int(savedLines)
	logger.Infof("Switching to logfile: %s (resuming from line %d)", file, linesRead)
} else {
Same as above. Also, why isn't file truncation considered at all here?
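For illustration only, a truncation guard for the line-count path could look roughly like this, assuming the parser also remembered the last-seen file size next to the saved line count (savedSize is an assumed value, not something this PR tracks):

```go
package reaper

import "os"

// resumeLineCount is a minimal sketch, not code from this PR: it only reuses
// the saved line count if the file has not shrunk since it was last read.
func resumeLineCount(savedLines, savedSize int64, file string) int {
	fi, err := os.Stat(file)
	if err != nil || fi.Size() < savedSize {
		// New, unreadable, or truncated file: start from the beginning.
		return 0
	}
	// The file has only grown since we last read it: resume where we left off.
	return int(savedLines)
}
```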
internal/reaper/logparser_test.go
Outdated
| t.Run("save and restore offset", func(t *testing.T) { | ||
| lp := &LogParser{ | ||
| fileOffsets: make(map[string]int64), | ||
| } | ||
|
|
||
| // Simulate saving offset for file1 | ||
| lp.fileOffsets["postgresql-10.csv"] = 50000 | ||
|
|
||
| // Verify offset is saved | ||
| assert.Equal(t, int64(50000), lp.fileOffsets["postgresql-10.csv"]) | ||
|
|
||
| // Simulate saving offset for file2 | ||
| lp.fileOffsets["postgresql-11.csv"] = 75000 | ||
|
|
||
| // Verify both offsets are saved | ||
| assert.Equal(t, int64(50000), lp.fileOffsets["postgresql-10.csv"]) | ||
| assert.Equal(t, int64(75000), lp.fileOffsets["postgresql-11.csv"]) | ||
|
|
||
| // Simulate restoring offset when switching back to file1 | ||
| restoredOffset, ok := lp.fileOffsets["postgresql-10.csv"] | ||
| assert.True(t, ok, "should find saved offset") | ||
| assert.Equal(t, int64(50000), restoredOffset) | ||
| }) |
I don't see anything useful that this tests.
internal/reaper/logparser_test.go
Outdated
| t.Run("offset restoration with truncation check", func(t *testing.T) { | ||
| lp := &LogParser{ | ||
| fileOffsets: make(map[string]int64), | ||
| } | ||
|
|
||
| // Simulate a file that was previously read up to offset 50000 | ||
| lp.fileOffsets["postgresql-10.csv"] = 50000 | ||
|
|
||
| // Simulate file was truncated (new size is smaller than saved offset) | ||
| newFileSize := int64(10000) | ||
| savedOffset, ok := lp.fileOffsets["postgresql-10.csv"] | ||
|
|
||
| if ok && savedOffset <= newFileSize { | ||
| // File not truncated, use saved offset | ||
| assert.Fail(t, "should not use saved offset when file is truncated") | ||
| } else { | ||
| // File was truncated or new, start from 0 | ||
| // This is the expected behavior | ||
| assert.True(t, true, "correctly detected truncation") | ||
| } | ||
|
|
||
| // Simulate file grew (new size is larger than saved offset) | ||
| newFileSize = int64(60000) | ||
| savedOffset, ok = lp.fileOffsets["postgresql-10.csv"] | ||
|
|
||
| if ok && savedOffset <= newFileSize { | ||
| // File not truncated, use saved offset | ||
| assert.Equal(t, int64(50000), savedOffset) | ||
| } else { | ||
| assert.Fail(t, "should use saved offset when file has grown") | ||
| } | ||
| }) | ||
| } |
The assertions assert.True(t, true, "correctly detected truncation") and assert.Fail(t, "should use saved offset when file has grown") don't indicate anything.
Also, you shouldn't re-implement the ok && savedOffset <= newFileSize check inside the test; you should invoke the real function and see how it behaves. No function from the actual code is called here.
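For illustration, a test along these lines would exercise real logic instead; resolveStartOffset is a hypothetical helper name used only in this sketch, not a function that exists in the code:

```go
package reaper

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

// resolveStartOffset stands in for an extracted production helper: the point
// is that the test calls a real function rather than re-implementing the
// `ok && savedOffset <= size` check inline.
func resolveStartOffset(saved map[string]int64, file string, size int64) int64 {
	if off, ok := saved[file]; ok && off <= size {
		return off
	}
	return 0
}

func TestResolveStartOffset(t *testing.T) {
	saved := map[string]int64{"postgresql-10.csv": 50000}

	// Truncated: current size is smaller than the saved offset.
	assert.Equal(t, int64(0), resolveStartOffset(saved, "postgresql-10.csv", 10000))

	// Grown: current size is larger than the saved offset.
	assert.Equal(t, int64(50000), resolveStartOffset(saved, "postgresql-10.csv", 60000))

	// Unknown file: no saved offset at all.
	assert.Equal(t, int64(0), resolveStartOffset(saved, "postgresql-11.csv", 60000))
}
```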
Force-pushed from 274d37a to 39e5638
…resql#1089

When using hourly log rotation with log_truncate_on_rotation=off, the log parser was resetting the read offset to 0 when switching back to previously-seen files, causing duplicate log entries to be counted.

This fix adds in-memory offset tracking per log file:
- Added fileOffsets map to LogParser struct
- Save offset before switching files
- Restore offset when returning to a known file
- Prune map when exceeding 100 entries to prevent memory growth
- Added unit tests for offset tracking and pruning logic
Force-pushed from 39e5638 to 52c8eac
Fix log parser offset reset on file switch
Fixes #1089
Problem
When using PostgreSQL with hourly log rotation (log_filename = postgresql-%H.csv) and log_truncate_on_rotation = off, the log parser was resetting its read offset to 0 when switching back to previously-seen files. This caused duplicate log entries to be counted every time a file was reused (e.g., 24 hours later).

Root Cause
In both logparser_remote.go (line 87) and logparser_local.go (line 111), the offset/line count was unconditionally reset to 0 when switching files:
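Paraphrased for illustration (startOffsetOld and startOffsetNew are not real function names; the new-behaviour check mirrors the diff shown earlier in this conversation):

```go
package reaper

// Old behaviour: every file switch started over at offset 0, so entries that
// had already been counted were read and counted again.
func startOffsetOld() int64 {
	return 0
}

// Behaviour after this PR (paraphrased): reuse the saved offset as long as
// the file has not shrunk below it.
func startOffsetNew(fileOffsets map[string]int64, fileName string, latestSize int64) int64 {
	if saved, ok := fileOffsets[fileName]; ok && saved <= latestSize {
		return saved
	}
	return 0
}
```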
Solution

Added in-memory offset tracking per log file:
- Added fileOffsets map[string]int64 to the LogParser struct
- Added pruneFileOffsets() to clear the map when exceeding 100 entries (prevents unbounded memory growth)
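A minimal sketch of the pruning idea, assuming the 100-entry limit described above; the actual pruneFileOffsets() implementation in the PR may differ, and the struct is trimmed down to the one relevant field:

```go
package reaper

// maxTrackedFiles is an assumed name for the 100-entry limit.
const maxTrackedFiles = 100

type LogParser struct {
	fileOffsets map[string]int64
}

// pruneFileOffsets clears the map once it grows past the limit, trading a
// one-off re-read of some files for bounded memory use.
func (lp *LogParser) pruneFileOffsets() {
	if len(lp.fileOffsets) > maxTrackedFiles {
		lp.fileOffsets = make(map[string]int64, maxTrackedFiles)
	}
}
```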
Changes

- logparser.go: fileOffsets field and pruneFileOffsets() method
- logparser_remote.go
- logparser_local.go
- logparser_test.go: TestFileOffsetsTrackingAndPruning

Testing