Skip to content

Fix: MemTable LIMIT ignored with reordered projections#21177

Open
RamakrishnaChilaka wants to merge 1 commit intoapache:mainfrom
RamakrishnaChilaka:fix_limit_column_ordering_bug
Open

Fix: MemTable LIMIT ignored with reordered projections#21177
RamakrishnaChilaka wants to merge 1 commit intoapache:mainfrom
RamakrishnaChilaka:fix_limit_column_ordering_bug

Conversation

@RamakrishnaChilaka
Copy link

Which issue does this PR close?

Rationale for this change

When SELECT projects columns in a different order than the table schema (e.g., SELECT col_c, col_b vs schema order col_a, col_b, col_c), the LIMIT clause is silently ignored — all rows are returned instead of the requested limit.

The physical optimizer runs LimitPushdown (which correctly sets fetch=N on MemorySourceConfig) before ProjectionPushdown. When the projection is pushed down, try_swapping_with_projection() creates a new MemorySourceConfig via try_new(), which resets fetch to None, silently dropping the limit.

What changes are included in this PR?

One-line fix in MemorySourceConfig::try_swapping_with_projection to preserve self.fetch on the newly created config:

// Before:
.map(|s| Arc::new(s) as Arc<dyn DataSource>)
// After:
.map(|s| Arc::new(s.with_limit(self.fetch)) as Arc<dyn DataSource>)

Are these changes tested?

Yes.

  • Unit test (try_swapping_with_projection_preserves_fetch): directly verifies fetch is preserved after projection pushdown. Confirmed it fails without the fix (None vs Some(5)).
  • SQL logic test (limit.slt, table t21176): end-to-end regression test for SELECT col_c, col_b FROM t LIMIT 5 with reverse column order.

Are there any user-facing changes?

No API changes. This is a correctness fix — LIMIT now works correctly regardless of SELECT column order against in-memory tables.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) datasource Changes to the datasource crate labels Mar 26, 2026
Copy link
Contributor

@neilconway neilconway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix!

This fix is fine, although I find the overall code structure here pretty fragile. At present, we will lose sort_information and show_sizes in this scenario (not sure offhand if that is problematic today), and we could potentially lose more metadata in the future.

@neilconway
Copy link
Contributor

@alamb @comphead Candidate for backporting, I think.

@RamakrishnaChilaka RamakrishnaChilaka force-pushed the fix_limit_column_ordering_bug branch from 699092a to 95613af Compare March 27, 2026 05:17
@RamakrishnaChilaka RamakrishnaChilaka force-pushed the fix_limit_column_ordering_bug branch from 95613af to c63fe9d Compare March 27, 2026 05:27
@RamakrishnaChilaka
Copy link
Author

Thanks for this fix!

This fix is fine, although I find the overall code structure here pretty fragile. At present, we will lose sort_information and show_sizes in this scenario (not sure offhand if that is problematic today), and we could potentially lose more metadata in the future.

Thank you for the review @neilconway!

I've updated the approach to clone self and only mutate the projection-related fields, so all metadata (fetch, sort_information, show_sizes, and any future fields) is preserved automatically. Sort information is re-projected via project_orderings() to remap column indices to the new schema (the old code was dropping it entirely via try_new()). This should make it future-proof against new fields being added to MemorySourceConfig. Please let me know your thoughts on this approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

datasource Changes to the datasource crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MemTable LIMIT silently ignored when SELECT projects columns in different order than table schema

3 participants