Skip to content

sql: fix partial index data loss / phantom rows during update#166123

Merged
trunk-io[bot] merged 1 commit intocockroachdb:masterfrom
fqazi:partialIndexDataLoss
Mar 20, 2026
Merged

sql: fix partial index data loss / phantom rows during update#166123
trunk-io[bot] merged 1 commit intocockroachdb:masterfrom
fqazi:partialIndexDataLoss

Conversation

@fqazi
Copy link
Copy Markdown
Collaborator

@fqazi fqazi commented Mar 19, 2026

sql: fix partial index data loss / phantom rows during update

This commit fixes a bug on tables with multiple column families where a
concurrent update that does not overlap with a partial index's column
family could cause the partial index to write a NULL instead of the
actual data, or incorrectly add phantom rows to a temporary index
during a schema change backfill.

This bug was previously masked on default (single-column-family) tables
because an update to any column causes the optimizer to conservatively
fetch all columns in that family. However, with multiple column families,
two normalization rules in the optimizer caused issues:

  1. PruneMutationFetchCols: If an update does not change any column
    associated with an index, the optimizer avoids fetching those
    columns. This causes the execution layer to see NULLs for the
    unfetched columns.
  2. SimplifyPartialIndexProjections: If an update does not change
    any column associated with a partial index, the optimizer
    simplifies the partial index predicate evaluation to FALSE.
    This causes the execution layer to skip writes to the index.

During a schema change backfill, the execution layer's updater must
unconditionally write complete index entries to temporary (mutating)
indexes for any concurrent update, even if the index's columns are
unchanged. This ensures the backfill merger has a complete snapshot
to correctly reconcile the final index. If columns are pruned (Rule 1)
or writes are simplified away (Rule 2), the temporary index receives
incomplete entries (NULLs) or misses the update entirely.

Furthermore, missing columns can lead to phantom rows if the partial
index predicate evaluates to TRUE when given a NULL value (e.g.,
WHERE val IS NULL).

This change ensures the optimizer always fetches the required
columns and avoids simplifying predicate evaluation if the index
is a mutation index, correctly propagating the full row state to the
execution layer.

Fixes: #166122

Release note (bug fix): Fixed a bug where concurrent updates to a table
using multiple column families during a partial index creation could
result in data loss, incorrect NULL values, or validation failures in
the resulting index.

@trunk-io
Copy link
Copy Markdown
Contributor

trunk-io Bot commented Mar 19, 2026

😎 Merged successfully - details.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@fqazi fqazi force-pushed the partialIndexDataLoss branch from 244d225 to a4ed061 Compare March 19, 2026 12:31
@fqazi fqazi marked this pull request as ready for review March 19, 2026 12:33
@fqazi fqazi requested a review from a team as a code owner March 19, 2026 12:33
@fqazi fqazi requested review from yuzefovich and removed request for a team March 19, 2026 12:33
@fqazi
Copy link
Copy Markdown
Collaborator Author

fqazi commented Mar 19, 2026

@themavik This seems unrelated and your commenting on the wrong PR.

@fqazi fqazi requested a review from a team March 19, 2026 12:35
@fqazi fqazi force-pushed the partialIndexDataLoss branch 2 times, most recently from b511415 to da7c234 Compare March 19, 2026 13:13
@rafiss rafiss requested a review from a team March 19, 2026 14:32
@michae2 michae2 self-requested a review March 19, 2026 16:58
Copy link
Copy Markdown
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice find!!

@michae2 reviewed 4 files and all commit messages, and made 4 comments.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on fqazi and yuzefovich).


pkg/sql/opt/cat/index.go line 214 at r1 (raw file):

	// IsTemporaryIndexForBackfill returns true iff the index is an index being
	// used as the temporary index being used by an in-progress index backfill.

I just noticed that this comment is a little strange...


pkg/sql/opt/cat/index.go line 243 at r1 (raw file):

	}
	// Return if this is a temporary index.
	return table.Index(ord).IsTemporaryIndexForBackfill()

nit: I think this could be simplified to return IsMutationIndex(table, ord) && table.Index(ord).IsTemporaryIndexForBackfill()


pkg/sql/opt/norm/mutation_funcs.go line 100 at r1 (raw file):

		// required. Therefore, the partial index PUT and DEL columns cannot be
		// simplified.
		// If a secondary index is being built, then there will be points where

nit: insert a blank // line

This commit fixes a bug on tables with multiple column families where a
concurrent update that does not overlap with a partial index's column
family could cause the partial index to write a NULL instead of the
actual data, or incorrectly add phantom rows to a temporary index
during a schema change backfill.

This bug was previously masked on default (single-column-family) tables
because an update to any column causes the optimizer to conservatively
fetch all columns in that family. However, with multiple column families,
two normalization rules in the optimizer caused issues:

1. PruneMutationFetchCols: If an update does not change any column
   associated with an index, the optimizer avoids fetching those
   columns. This causes the execution layer to see NULLs for the
   unfetched columns.
2. SimplifyPartialIndexProjections: If an update does not change
   any column associated with a partial index, the optimizer
   simplifies the partial index predicate evaluation to FALSE.
   This causes the execution layer to skip writes to the index.

During a schema change backfill, the execution layer's updater must
unconditionally write complete index entries to temporary (mutating)
indexes for any concurrent update, even if the index's columns are
unchanged. This ensures the backfill merger has a complete snapshot
to correctly reconcile the final index. If columns are pruned (Rule 1)
or writes are simplified away (Rule 2), the temporary index receives
incomplete entries (NULLs) or misses the update entirely.

Furthermore, missing columns can lead to phantom rows if the partial
index predicate evaluates to TRUE when given a NULL value (e.g.,
WHERE val IS NULL).

This change ensures the optimizer always fetches the required
columns and avoids simplifying predicate evaluation if the index
is a mutation index, correctly propagating the full row state to the
execution layer.

Fixes: cockroachdb#166122

Release note (bug fix): Fixed a bug where concurrent updates to a table
using multiple column families during a partial index creation could
result in data loss, incorrect NULL values, or validation failures in
the resulting index.
@fqazi fqazi force-pushed the partialIndexDataLoss branch from da7c234 to e79071a Compare March 20, 2026 14:46
Copy link
Copy Markdown
Collaborator Author

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fqazi made 1 comment and resolved 3 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on michae2 and yuzefovich).


pkg/sql/opt/cat/index.go line 214 at r1 (raw file):

Previously, michae2 (Michael Erickson) wrote…

I just noticed that this comment is a little strange...

Done.

Let me tweak the comment here:

// IsTemporaryIndexForBackfill returns true if the index is a temporary index// for an in-progress index backfill.

@fqazi
Copy link
Copy Markdown
Collaborator Author

fqazi commented Mar 20, 2026

@michae2 TFTR!

/trunk merge

@blathers-crl
Copy link
Copy Markdown

blathers-crl Bot commented Mar 20, 2026

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


merge conflict cherry-picking e79071a to blathers/backport-release-24.3-166123

Backport to branch release-24.3 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Copy link
Copy Markdown
Contributor

@mw5h mw5h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mw5h made 1 comment.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale).


pkg/sql/opt/cat/index.go line 214 at r1 (raw file):

Previously, fqazi (Faizan Qazi) wrote…

Done.

Let me tweak the comment here:

// IsTemporaryIndexForBackfill returns true if the index is a temporary index// for an in-progress index backfill.

The 'iff' was intentional. It means "if and only if" to old people, but may have fallen out of popular use.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuzefovich made 1 comment.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale).


pkg/sql/opt/cat/index.go line 214 at r1 (raw file):

Previously, mw5h (Matt White) wrote…

The 'iff' was intentional. It means "if and only if" to old people, but may have fallen out of popular use.

I do know and use 'iff' sometimes, I guess I'm old 😆

Copy link
Copy Markdown
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michae2 made 1 comment.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale).


pkg/sql/opt/cat/index.go line 214 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I do know and use 'iff' sometimes, I guess I'm old 😆

Old if "iff" or "iff" if old? Or "iff" iff old?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-all Flags PRs that need to be backported to all supported release branches backport-failed target-release-26.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: partial index data loss and phantom rows during concurrent update

5 participants