Conversation

@ijaidev ijaidev commented Jan 17, 2026

Fix log parser offset reset on file switch

Fixes #1089

Problem

When using PostgreSQL with hourly log rotation (log_filename = postgresql-%H.csv) and log_truncate_on_rotation = off, the log parser was resetting its read offset to 0 when switching back to previously-seen files. This caused duplicate log entries to be counted every time a file was reused (e.g., 24 hours later).
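
For reference, a minimal postgresql.conf excerpt matching this scenario (the csvlog destination and the 1h rotation age are assumptions inferred from the issue, not stated in this PR):

log_destination = 'csvlog'          # assumed: CSV format implied by the .csv log_filename
logging_collector = on
log_filename = 'postgresql-%H.csv'  # one file per hour; names repeat every 24 hours
log_rotation_age = 1h               # assumed: hourly rotation
log_truncate_on_rotation = off      # reused files are appended to, not truncated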

Root Cause

In both logparser_remote.go (line 87) and logparser_local.go (line 111), the offset/line count was unconditionally reset to 0 when switching files:

offset = 0  // BUG: Always resets, even for previously-seen files

Solution

Added in-memory offset tracking per log file:

  • Added fileOffsets map[string]int64 to LogParser struct (see the sketch after this list)
  • Save current file's offset before switching
  • Restore offset when returning to a previously-seen file
  • Added pruneFileOffsets() to clear map when exceeding 100 entries (prevents unbounded memory growth)
  • Added unit tests for offset tracking and pruning logic
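
A condensed Go sketch of the approach (fileOffsets and pruneFileOffsets match the PR description; switchFile is a hypothetical helper that compresses the save/restore flow into one place and is not the literal PR code):

type LogParser struct {
	// ... existing fields elided ...
	fileOffsets map[string]int64 // last byte offset (remote) or line count (local) per file
}

// pruneFileOffsets clears the map once it exceeds 100 entries so a
// long-running parser that sees many distinct file names cannot grow it unboundedly.
func (lp *LogParser) pruneFileOffsets() {
	if len(lp.fileOffsets) > 100 {
		lp.fileOffsets = make(map[string]int64)
	}
}

// switchFile remembers where the old file stopped, then resumes from the
// saved position if the new file has been seen before; otherwise starts at 0.
func (lp *LogParser) switchFile(oldFile, newFile string, oldOffset int64) int64 {
	lp.fileOffsets[oldFile] = oldOffset
	lp.pruneFileOffsets()
	if saved, ok := lp.fileOffsets[newFile]; ok {
		return saved
	}
	return 0
}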

Changes

| File | Change |
| --- | --- |
| logparser.go | Added fileOffsets field and pruneFileOffsets() method |
| logparser_remote.go | Save/restore byte offsets on file switch |
| logparser_local.go | Save/restore line counts on file switch |
| logparser_test.go | Added TestFileOffsetsTrackingAndPruning |

Testing

  • Docker build successful
  • pgwatch runs and parses logs correctly
  • Unit tests added for offset tracking and pruning

coveralls commented Jan 17, 2026

Pull Request Test Coverage Report for Build 21244648072

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 48 of 94 (51.06%) changed or added relevant lines in 3 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.2%) to 75.49%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| internal/reaper/logparser_remote.go | 7 | 25 | 28.0% |
| internal/reaper/logparser_local.go | 11 | 39 | 28.21% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| internal/reaper/logparser_local.go | 1 | 60.53% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 21244180723: | 0.2% |
| Covered Lines: | 4081 |
| Relevant Lines: | 5406 |

💛 - Coveralls

@0xgouda 0xgouda self-assigned this Jan 18, 2026
Comment on lines 90 to 104
// Restore offset if we've seen this file before and it hasn't been truncated
if savedOffset, ok := lp.fileOffsets[fileName]; ok && savedOffset <= int64(latestSize) {
offset = int32(savedOffset)
logger.Infof("Switching to logfile: '%s' (resuming from offset %d)", fileName, offset)
Collaborator
What if the file has been truncated and its new size is bigger than or equal to our saved offset?

Author
I'd like to discuss this.

We don't have an accurate way to check whether the file has been truncated in this case. We are assuming that log_truncate_on_rotation = off, so the file will only grow.

One idea I have in mind is to store some of the latest bytes of the file and compare them with the newly updated file.

What do you think?
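
For example, such a tail check might look like this (sameTail is a hypothetical helper, assuming an *os.File and a tail captured at the saved offset before switching away; it relies only on bytes.Equal and (*os.File).ReadAt):

// sameTail reports whether the bytes just before savedOffset still match
// the tail we recorded earlier. A mismatch, or a read error such as io.EOF
// (file now shorter than the saved tail), suggests truncation or replacement.
func sameTail(f *os.File, savedOffset int64, savedTail []byte) (bool, error) {
	start := savedOffset - int64(len(savedTail))
	if start < 0 {
		start = 0
	}
	buf := make([]byte, len(savedTail))
	if _, err := f.ReadAt(buf, start); err != nil {
		return false, err
	}
	return bytes.Equal(buf, savedTail), nil
}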

Collaborator
You can use select current_setting('log_truncate_on_rotation'); on parser instantiation to know whether it's on or off.
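
A minimal sketch of that probe over database/sql (the helper name is illustrative; current_setting is the real PostgreSQL function):

// logTruncateOnRotation queries the server once, at parser instantiation,
// to learn whether reused log files get truncated or appended to.
func logTruncateOnRotation(ctx context.Context, db *sql.DB) (bool, error) {
	var setting string
	err := db.QueryRowContext(ctx,
		"select current_setting('log_truncate_on_rotation')").Scan(&setting)
	if err != nil {
		return false, err
	}
	return setting == "on", nil
}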

Author
Ohh, I missed that.

If it's on, we are assuming that the file will only grow and there are no external delete or truncate operations, right?

Collaborator
> If it's on, we are assuming that the file will only grow and there are no external delete or truncate operations, right?

If it's "off".

Author
Yes, off. My bad

Thanks for the guidance, working on it

Comment on lines 113 to 117
// Restore line count if we've seen this file before
if savedLines, ok := lp.fileOffsets[file]; ok {
linesRead = int(savedLines)
logger.Infof("Switching to logfile: %s (resuming from line %d)", file, linesRead)
} else {
Collaborator
Same as above. Also, why isn't file truncation considered at all here?
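
One possible shape for such a guard in the local parser (the fileSizes map is hypothetical and not part of this PR; it only illustrates detecting a shrunken file via os.Stat before trusting a saved line count):

// If the file is now smaller than when we last read it, it was truncated
// or replaced, so the saved line count can no longer be trusted.
if st, err := os.Stat(file); err == nil {
	if savedSize, ok := lp.fileSizes[file]; ok && st.Size() < savedSize {
		delete(lp.fileOffsets, file) // force a restart from line 0
	}
}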

Comment on lines 353 to 375
t.Run("save and restore offset", func(t *testing.T) {
lp := &LogParser{
fileOffsets: make(map[string]int64),
}

// Simulate saving offset for file1
lp.fileOffsets["postgresql-10.csv"] = 50000

// Verify offset is saved
assert.Equal(t, int64(50000), lp.fileOffsets["postgresql-10.csv"])

// Simulate saving offset for file2
lp.fileOffsets["postgresql-11.csv"] = 75000

// Verify both offsets are saved
assert.Equal(t, int64(50000), lp.fileOffsets["postgresql-10.csv"])
assert.Equal(t, int64(75000), lp.fileOffsets["postgresql-11.csv"])

// Simulate restoring offset when switching back to file1
restoredOffset, ok := lp.fileOffsets["postgresql-10.csv"]
assert.True(t, ok, "should find saved offset")
assert.Equal(t, int64(50000), restoredOffset)
})
Collaborator
I don't see anything useful being tested here.

Comment on lines 416 to 448
t.Run("offset restoration with truncation check", func(t *testing.T) {
lp := &LogParser{
fileOffsets: make(map[string]int64),
}

// Simulate a file that was previously read up to offset 50000
lp.fileOffsets["postgresql-10.csv"] = 50000

// Simulate file was truncated (new size is smaller than saved offset)
newFileSize := int64(10000)
savedOffset, ok := lp.fileOffsets["postgresql-10.csv"]

if ok && savedOffset <= newFileSize {
// File not truncated, use saved offset
assert.Fail(t, "should not use saved offset when file is truncated")
} else {
// File was truncated or new, start from 0
// This is the expected behavior
assert.True(t, true, "correctly detected truncation")
}

// Simulate file grew (new size is larger than saved offset)
newFileSize = int64(60000)
savedOffset, ok = lp.fileOffsets["postgresql-10.csv"]

if ok && savedOffset <= newFileSize {
// File not truncated, use saved offset
assert.Equal(t, int64(50000), savedOffset)
} else {
assert.Fail(t, "should use saved offset when file has grown")
}
})
}
Collaborator
@0xgouda 0xgouda Jan 18, 2026
assert.True(t, true, "correctly detected truncation")
and
assert.Fail(t, "should use saved offset when file has grown")
don't indicate anything.

Also, you shouldn't reimplement the ok && savedOffset <= newFileSize check inside the test itself; you should invoke the real function and see how it behaves. No function from the actual code is called here.
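
Something along these lines would exercise the production path instead (resolveOffset is a hypothetical wrapper around the restore-or-reset logic; the real code would need such a seam to be testable this way):

t.Run("truncation handled by production code", func(t *testing.T) {
	lp := &LogParser{fileOffsets: map[string]int64{"postgresql-10.csv": 50000}}

	// File shrank below the saved offset: the real logic should restart at 0.
	assert.Equal(t, int64(0), lp.resolveOffset("postgresql-10.csv", 10000))

	// File grew past the saved offset: the real logic should resume at 50000.
	assert.Equal(t, int64(50000), lp.resolveOffset("postgresql-10.csv", 60000))
})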

@ijaidev ijaidev force-pushed the fix/log-parser-offset-tracking-1089 branch from 274d37a to 39e5638 Compare January 21, 2026 14:25
…resql#1089

When using hourly log rotation with log_truncate_on_rotation=off,
the log parser was resetting the read offset to 0 when switching
back to previously-seen files, causing duplicate log entries to
be counted.

This fix adds in-memory offset tracking per log file:
- Added fileOffsets map to LogParser struct
- Save offset before switching files
- Restore offset when returning to a known file
- Prune map when exceeding 100 entries to prevent memory growth
- Added unit tests for offset tracking and pruning logic
@ijaidev ijaidev force-pushed the fix/log-parser-offset-tracking-1089 branch from 39e5638 to 52c8eac Compare January 22, 2026 10:17

Development

Successfully merging this pull request may close these issues.

Problem in log parser logic
