[spark] Add scan.max.records.per.partition config to split log table input partitions#3260

Open
Yohahaha wants to merge 1 commit into apache:main from Yohahaha:spark-split-partition

Conversation

Contributor

@Yohahaha Yohahaha commented May 7, 2026

Purpose

Linked issue: close #3215

Brief change log

  • Introduce the scan.max.records.per.partition config option for Spark log table reads. When set, each
    Fluss bucket whose offset range exceeds this value is split into multiple Spark input partitions,
    improving read parallelism for large offset ranges.
  • Update BucketOffsetsRetrieverImpl to support fetching real earliest offsets when needed.
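The splitting behavior described above can be sketched as follows. This is a minimal, hypothetical illustration of the idea, not the actual Fluss implementation: the class and method names (`PartitionSplitter`, `split`, `OffsetRange`) are invented for this example, and the real connector works in terms of Spark's DataSource V2 input partitions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split one bucket's offset range [startOffset, stopOffset)
// into chunks of at most maxRecordsPerPartition records each, so that each chunk
// can back its own Spark input partition.
public class PartitionSplitter {

    // One resulting Spark input partition, covering [startOffset, stopOffset).
    public record OffsetRange(long startOffset, long stopOffset) {}

    public static List<OffsetRange> split(
            long startOffset, long stopOffset, long maxRecordsPerPartition) {
        List<OffsetRange> ranges = new ArrayList<>();
        for (long s = startOffset; s < stopOffset; s += maxRecordsPerPartition) {
            // The last chunk may be shorter than maxRecordsPerPartition.
            ranges.add(new OffsetRange(s, Math.min(s + maxRecordsPerPartition, stopOffset)));
        }
        return ranges;
    }
}
```

For example, a bucket with offsets 0..10 and a limit of 4 records per partition would yield three input partitions: [0, 4), [4, 8), and [8, 10). When the option is unset, the bucket maps to a single partition as before.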

Tests

SparkLogTableReadTest: "Spark Read: split partition by config"

API and Format

Documentation

@Yohahaha Yohahaha marked this pull request as ready for review May 7, 2026 02:52
Contributor Author

Yohahaha commented May 7, 2026

@YannByron

Contributor Author

Yohahaha commented May 7, 2026

@luoyuxia @fresh-borzoni PTAL!



Development

Successfully merging this pull request may close these issues.

[spark] Add config to split input partition by input size
