Skip to content

fix: Conflict during Batch Upsert - BED-7483#50

Open
kpowderly wants to merge 3 commits intomainfrom
BED-7483
Open

fix: Conflict during Batch Upsert - BED-7483#50
kpowderly wants to merge 3 commits intomainfrom
BED-7483

Conversation

@kpowderly
Copy link
Copy Markdown

@kpowderly kpowderly commented Mar 26, 2026

Bloodhound is experiencing errors during ingest as such: {"time":"2026-02-20T00:36:36.016146471Z","level":"ERROR","message":"Error reading ingest file /opt/bhe/work/tmp/bhe641820275: ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time (SQLSTATE 21000)"}

More error logs / details can be found in the ticket: https://specterops.atlassian.net/browse/BED-7483

Steps I took to come up with this cause/fix

What I believe to be the root cause of the error:

  • A node or edge is created and added to the batch buffer
  • Before the buffer flushes to Postgres, something changes the unique identifier (objectid) in memory
  • Meanwhile, another node or edge is added to the buffer when the same unique identifier (objectid)
  • The buffer then flushes and postgres sees a single statement with two rows sharing the same ID, returning the error

Proposed fix

  • Shallow-clone the properties map at add time so the batch holds an independent copy. Without this, later mutations to the original node's properties can change the stored identity key, causing two entries to serialize with the same key and conflict at flush time.

I would appreciate feedback on this fix. Being that it's flaky, it's pretty difficult to reproduce. I am torn on returning rows.Err() because I'm not sure if these errors are being swallowed for a specific reason.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Resolved an issue in the PostgreSQL driver where node data could be inadvertently shared between batch operations.
  • Tests

    • Added test coverage for node batch update behavior, including identity property handling.

@kpowderly kpowderly self-assigned this Mar 26, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 26, 2026

Walkthrough

The changes add defensive copying to NodeUpdateBatch.Add to prevent mutations of caller-provided nodes and introduce comprehensive tests validating batch behavior when node identity properties change at different timing points during batch operations.

Changes

Cohort / File(s) Summary
Defensive Copying Implementation
drivers/pg/query/format.go
Modified NodeUpdateBatch.Add to create a defensive copy of incoming update.Node before applying kind mutations, preventing aliasing with the caller's node object.
Batch Behavior Validation
drivers/pg/query/format_test.go
Added TestNodeUpdateBatch with two subtests exercising batch collapsing behavior when node identity properties (objectid) change at different timing points relative to batch operations and validation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A copy made, defensive and bright,
Protects the caller's nodes from plight—
No sharing, no sneaky mutation's blight,
Tests verify the batch collects right,
When identity shifts, all batches unites! 🎉

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: fixing a batch upsert conflict issue (BED-7483) by implementing defensive copying of node properties to prevent aliasing issues.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch BED-7483

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@kpowderly kpowderly changed the title fix: Properties BED-7483 fix: Conflict during Batch Upsert BED-7483 Mar 26, 2026
@kpowderly kpowderly changed the title fix: Conflict during Batch Upsert BED-7483 fix: Conflict during Batch Upsert - BED-7483 Mar 26, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
drivers/pg/query/format.go (1)

216-216: Consider clarifying the self-referential AddKinds call.

This pattern—calling batchNode.AddKinds(batchNode.Kinds...)—is semantically unusual: it re-adds existing kinds to populate AddedKinds. The intent (snapshotting kinds into the delta-tracking fields) isn't immediately obvious. A brief inline comment would help future readers.

 batchNode := graph.NewNode(update.Node.ID, update.Node.Properties.Clone(), update.Node.Kinds...)
+// Snapshot existing kinds into AddedKinds so they're tracked for the batch operation.
 batchNode.AddKinds(batchNode.Kinds...)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@drivers/pg/query/format.go` at line 216, The call
batchNode.AddKinds(batchNode.Kinds...) is re-appending the node's existing Kind
list to populate its delta/added-tracking fields and should be clarified; add a
concise inline comment next to the call (near the invocation of AddKinds in
format.go) stating that this self-referential call is intentional and used to
snapshot/copy the current batchNode.Kinds into the node's AddedKinds (or
delta-tracking) so readers understand it isn’t a bug but a deliberate population
step.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@drivers/pg/query/format.go`:
- Line 216: The call batchNode.AddKinds(batchNode.Kinds...) is re-appending the
node's existing Kind list to populate its delta/added-tracking fields and should
be clarified; add a concise inline comment next to the call (near the invocation
of AddKinds in format.go) stating that this self-referential call is intentional
and used to snapshot/copy the current batchNode.Kinds into the node's AddedKinds
(or delta-tracking) so readers understand it isn’t a bug but a deliberate
population step.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9b51d05f-d0c0-4261-9ac9-e36905c45bab

📥 Commits

Reviewing files that changed from the base of the PR and between ec6ffe0 and 6b920d7.

📒 Files selected for processing (2)
  • drivers/pg/query/format.go
  • drivers/pg/query/format_test.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant