
Use Scan API #6391

Draft
gatesn wants to merge 47 commits into develop from ngates/scan-api


@gatesn
Contributor

@gatesn gatesn commented Feb 10, 2026

Experiment to use the Scan API from DuckDB and DataFusion integrations.

Signed-off-by: Nicholas Gates <nick@nickgates.com>
@gatesn gatesn added the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Feb 10, 2026
@github-actions github-actions bot removed the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Feb 10, 2026
@github-actions
Contributor

github-actions bot commented Feb 10, 2026

Benchmarks: PolarSignals Profiling

Summary

  • Overall: 11.394x ❌
  • Vortex: 11.394x ❌
  • datafusion:vortex: 11.394x ❌
  • Best: No improvements
  • Worst: polarsignals_q02/datafusion:vortex-file-compressed (247.639x)
  • Significant (>10%): 0↑ 10↓
Detailed Results Table

| name | PR 68ec238 | base 6f10740 | ratio (PR/base) | unit | remark |
|---|---:|---:|---:|---|---|
| polarsignals_q00/datafusion:vortex-file-compressed | 7206036834 | 1.89675e+08 | 37.9914 | ns | 🚨 |
| polarsignals_q01/datafusion:vortex-file-compressed | 7078789543 | 4.04512e+08 | 17.4996 | ns | 🚨 |
| polarsignals_q02/datafusion:vortex-file-compressed | 7169614491 | 2.89518e+07 | 247.639 | ns | 🚨 |
| polarsignals_q03/datafusion:vortex-file-compressed | 7395335053 | 4.21262e+08 | 17.5552 | ns | 🚨 |
| polarsignals_q04/datafusion:vortex-file-compressed | 22911601 | 1.40448e+07 | 1.63132 | ns | 🚨 |
| polarsignals_q05/datafusion:vortex-file-compressed | 115767005 | 1.83461e+07 | 6.31018 | ns | 🚨 |
| polarsignals_q06/datafusion:vortex-file-compressed | 123063659 | 2.33451e+07 | 5.27149 | ns | 🚨 |
| polarsignals_q07/datafusion:vortex-file-compressed | 113426101 | 1.69626e+07 | 6.68683 | ns | 🚨 |
| polarsignals_q08/datafusion:vortex-file-compressed | 8282037599 | 4.98161e+08 | 16.6252 | ns | 🚨 |
| polarsignals_q09/datafusion:vortex-file-compressed | 34199619 | 1.61671e+07 | 2.11539 | ns | 🚨 |
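The summary lines can be cross-checked against the table. The "Overall" value of 11.394x is consistent with the geometric mean of the ten PR/base ratios, and every ratio exceeds 1.1, which matches "0↑ 10↓". The sketch below recomputes both from the raw nanosecond timings. It assumes the bot aggregates with a geometric mean and flags changes beyond ±10%; that is inferred from the numbers, not taken from the benchmark tooling, and the function names are illustrative.

```rust
/// (PR time, base time) in nanoseconds for polarsignals_q00..q09,
/// copied from the detailed results table.
const TIMINGS_NS: [(f64, f64); 10] = [
    (7206036834.0, 1.89675e8),
    (7078789543.0, 4.04512e8),
    (7169614491.0, 2.89518e7),
    (7395335053.0, 4.21262e8),
    (22911601.0, 1.40448e7),
    (115767005.0, 1.83461e7),
    (123063659.0, 2.33451e7),
    (113426101.0, 1.69626e7),
    (8282037599.0, 4.98161e8),
    (34199619.0, 1.61671e7),
];

/// Geometric mean of the PR/base ratios (assumed aggregation method).
fn geometric_mean_ratio(timings: &[(f64, f64)]) -> f64 {
    let log_sum: f64 = timings.iter().map(|(pr, base)| (pr / base).ln()).sum();
    (log_sum / timings.len() as f64).exp()
}

/// Counts (improvements, regressions) whose ratio moves more than `threshold`
/// away from 1.0 (assumed definition of "significant").
fn significant_changes(timings: &[(f64, f64)], threshold: f64) -> (usize, usize) {
    let mut improved = 0;
    let mut regressed = 0;
    for (pr, base) in timings {
        let ratio = pr / base;
        if ratio < 1.0 - threshold {
            improved += 1;
        } else if ratio > 1.0 + threshold {
            regressed += 1;
        }
    }
    (improved, regressed)
}

fn main() {
    let overall = geometric_mean_ratio(&TIMINGS_NS);
    let (up, down) = significant_changes(&TIMINGS_NS, 0.10);
    println!("Overall: {overall:.3}x, significant: {up}↑ {down}↓");
}
```

A geometric mean keeps one extreme query (q02 at roughly 248x) from dominating the summary the way an arithmetic mean would. Applied to the two Random Access rows reported later in the thread, the same function gives approximately 1.020, matching that comment's "Overall" figure.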

@github-actions
Contributor

github-actions bot commented Feb 10, 2026

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark TPC-H SF=1 on NVME failed! Check the workflow run for details.

@github-actions
Contributor

github-actions bot commented Feb 10, 2026

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark FineWeb NVMe failed! Check the workflow run for details.

@github-actions
Contributor

github-actions bot commented Feb 10, 2026

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark TPC-H SF=1 on S3 failed! Check the workflow run for details.

@github-actions
Contributor

github-actions bot commented Feb 10, 2026

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark TPC-DS SF=1 on NVME failed! Check the workflow run for details.

@github-actions
Contributor

github-actions bot commented Feb 10, 2026

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark TPC-H SF=10 on NVME failed! Check the workflow run for details.

@github-actions
Contributor

github-actions bot commented Feb 10, 2026

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark FineWeb S3 failed! Check the workflow run for details.

@github-actions
Contributor

github-actions bot commented Feb 10, 2026

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark Statistical and Population Genetics failed! Check the workflow run for details.

@github-actions
Contributor

github-actions bot commented Feb 10, 2026

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark TPC-H SF=10 on S3 failed! Check the workflow run for details.

@github-actions
Contributor

github-actions bot commented Feb 10, 2026

🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨

Benchmark Clickbench on NVME failed! Check the workflow run for details.

@gatesn gatesn added the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Feb 10, 2026
@github-actions github-actions bot removed the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Feb 10, 2026
@gatesn gatesn added the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Feb 10, 2026
@github-actions github-actions bot removed the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Feb 10, 2026
@joseph-isaacs joseph-isaacs added the action/benchmark Trigger full benchmarks to run on this PR label Feb 11, 2026
@github-actions github-actions bot removed the action/benchmark Trigger full benchmarks to run on this PR label Feb 11, 2026
@github-actions
Contributor

github-actions bot commented Feb 11, 2026

Polar Signals Profiling Results

Latest Run

| Status | Commit | Job Attempt | Link |
|---|---|---:|---|
| 🟢 Done | 68ec238 | 1 | Explore Profiling Data |

Previous Runs (2)

| Status | Commit | Job Attempt | Link |
|---|---|---:|---|
| 🟢 Done | 7aaf8b6 | 1 | Explore Profiling Data |
| 🟢 Done | d05997f | 1 | Explore Profiling Data |

Powered by Polar Signals Cloud

@github-actions
Contributor

Benchmarks: Random Access

Summary

  • Overall: 1.020x ➖
  • Vortex: 1.029x ➖
  • Best: No improvements
  • Worst: random-access/vortex-tokio-local-disk (1.029x)
  • Significant (>10%): 0↑ 0↓
Detailed Results Table

| name | PR d05997f | base 8e92de5 | ratio (PR/base) | unit | remark |
|---|---:|---:|---:|---|---|
| random-access/parquet-tokio-local-disk | 188727762 | 1.86617e+08 | 1.01131 | ns | |
| random-access/vortex-tokio-local-disk | 1370907 | 1.33276e+06 | 1.02862 | ns | |

```yaml
if: matrix.remote_storage == null || github.event.pull_request.head.repo.fork == true
shell: bash
env:
  VORTEX_USE_SCAN_API: "1"
```
Contributor


Should this be `0`, or should the old one be removed?
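Whether `"0"` and removing the variable behave the same depends on how the flag is read. If the code only checks that `VORTEX_USE_SCAN_API` is set, then `"0"` would still enable the Scan API path. The sketch below contrasts a value-based check with a presence-only check; the helper names and the accepted truthy values are hypothetical and are not taken from the Vortex codebase.

```rust
use std::env;

/// Hypothetical value-based parser: "1", "true", or "yes" (case-insensitive)
/// enable the flag; anything else, including "0" or an unset variable, disables it.
fn parse_flag(value: Option<&str>) -> bool {
    matches!(
        value.map(|v| v.trim().to_ascii_lowercase()).as_deref(),
        Some("1") | Some("true") | Some("yes")
    )
}

/// Hypothetical helper reading the flag from the process environment.
fn use_scan_api() -> bool {
    parse_flag(env::var("VORTEX_USE_SCAN_API").ok().as_deref())
}

/// Presence-only check, shown for contrast: here "0" still counts as enabled.
fn use_scan_api_presence_only(value: Option<&str>) -> bool {
    value.is_some()
}

fn main() {
    let raw = env::var("VORTEX_USE_SCAN_API").ok();
    println!("value-based check: {}", use_scan_api());
    println!("presence-only check: {}", use_scan_api_presence_only(raw.as_deref()));
}
```

With a value-based check, setting the variable to `0` in the workflow and deleting the line are equivalent; with a presence-only check, only deleting the line turns the path off.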

@gatesn gatesn added action/benchmark-sql Trigger SQL benchmarks to run on this PR and removed action/benchmark-sql Trigger SQL benchmarks to run on this PR labels Feb 13, 2026
@github-actions github-actions bot removed the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Feb 13, 2026
Comment on lines 326 to 381
```rust
loop {
    // Try to pull from the current child's split stream.
    if let Some(ref mut child_stream) = current_stream {
        match child_stream.next().await {
            Some(Ok(split)) => {
                if let Some(ref mut s) = state
                    && let Some(ref mut limit) = s.remaining_limit
                {
                    let est = split.row_count_estimate();
                    *limit = limit.saturating_sub(est.upper.unwrap_or(est.lower));
                }
                return Some((Ok(split), (state, current_stream)));
            }
            Some(Err(e)) => {
                return Some((Err(e), (None, None)));
            }
            None => {
                // Current child exhausted, move to next.
                drop(current_stream.take());
            }
        }
    }

    let s = state.as_mut()?;

    if s.remaining_limit.is_some_and(|l| l == 0) {
        return None;
    }

    // Get the next data source.
    let source = match s.next_source().await {
        Ok(Some(source)) => source,
        Ok(None) => return None,
        Err(e) => return Some((Err(e), (None, None))),
    };

    if source.dtype() != &s.dtype {
        return Some((
            Err(vortex_err!(
                "MultiDataSource dtype mismatch: expected {}, got {}",
                s.dtype,
                source.dtype()
            )),
            (None, None),
        ));
    }

    let mut child_request = s.request.clone();
    child_request.limit = s.remaining_limit;
    let child_scan = match source.scan(child_request) {
        Ok(scan) => scan,
        Err(e) => return Some((Err(e), (None, None))),
    };

    current_stream = Some(child_scan.splits());
}
```
Contributor


This can be extracted into a helper method that takes `&mut current_stream`; then you can do:

```rust
impl DataSourceScan for MultiDataSourceScan {
    fn splits(self: Box<Self>) -> SplitStream {
        stream::unfold(
            (*self, None::<SplitStream>),
            |(mut scan, mut current_stream)| async move {
                let result = scan.next_split(&mut current_stream).await?;
                Some((result, (scan, current_stream)))
            },
        )
        .boxed()
    }
}
```

It doesn't seem like you need it to be `Some(*self)`.
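The loop and the suggested refactor can be modelled synchronously, with `std::iter::from_fn` standing in for `stream::unfold`. This is a simplified, hypothetical sketch: `Split`, `Source`, `MultiScan`, and `next_split` are illustrative stand-ins for the PR's types, a split's row estimate is reduced to a single upper bound, errors are plain `String`s, and the per-child limit pushdown (`child_request.limit`) is not modelled. It keeps the same control flow: drain the current child, stop once the remaining limit reaches zero, reject a child with a mismatched dtype, then move on to the next child.

```rust
use std::collections::VecDeque;
use std::vec::IntoIter;

/// Hypothetical split carrying only an upper bound on its row count.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Split {
    rows_upper: u64,
}

/// Hypothetical child data source: a dtype name plus its splits.
struct Source {
    dtype: &'static str,
    splits: Vec<Split>,
}

/// Simplified stand-in for a multi-source scan.
struct MultiScan {
    dtype: &'static str,
    sources: VecDeque<Source>,
    remaining_limit: Option<u64>,
}

impl MultiScan {
    /// Helper taking `&mut current`, in the spirit of the review suggestion.
    /// Returns the next split, an error, or `None` once the scan is finished.
    fn next_split(&mut self, current: &mut Option<IntoIter<Split>>) -> Option<Result<Split, String>> {
        loop {
            if let Some(child) = current.as_mut() {
                match child.next() {
                    Some(split) => {
                        // Mirror the reviewed code: subtract the upper row estimate.
                        if let Some(limit) = self.remaining_limit.as_mut() {
                            *limit = limit.saturating_sub(split.rows_upper);
                        }
                        return Some(Ok(split));
                    }
                    None => *current = None, // current child exhausted, move on
                }
            }
            if self.remaining_limit == Some(0) {
                return None;
            }
            let source = self.sources.pop_front()?;
            if source.dtype != self.dtype {
                // Drop remaining sources so the iterator ends after the error.
                self.sources.clear();
                return Some(Err(format!(
                    "MultiDataSource dtype mismatch: expected {}, got {}",
                    self.dtype, source.dtype
                )));
            }
            *current = Some(source.splits.into_iter());
        }
    }

    /// Analog of `splits()`: the state is (scan, current child) with no outer `Option`.
    fn splits(mut self) -> impl Iterator<Item = Result<Split, String>> {
        let mut current = None;
        std::iter::from_fn(move || self.next_split(&mut current))
    }
}

fn main() {
    let scan = MultiScan {
        dtype: "i64",
        sources: VecDeque::from(vec![
            Source { dtype: "i64", splits: vec![Split { rows_upper: 40 }, Split { rows_upper: 40 }] },
            Source { dtype: "i64", splits: vec![Split { rows_upper: 40 }] },
            Source { dtype: "i64", splits: vec![Split { rows_upper: 40 }] },
        ]),
        remaining_limit: Some(100),
    };
    for split in scan.splits() {
        println!("{split:?}");
    }
}
```

With a limit of 100 and splits estimated at 40 rows each, the third split drives the remaining limit to zero, so the third source is never opened. Because the helper signals exhaustion by returning `None`, the unfold state never needs to be wrapped in an `Option`.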

@gatesn gatesn added action/benchmark-sql Trigger SQL benchmarks to run on this PR and removed action/benchmark-sql Trigger SQL benchmarks to run on this PR labels Feb 15, 2026
@gatesn gatesn added the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Feb 15, 2026
@github-actions github-actions bot removed the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Feb 15, 2026
@gatesn gatesn added the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Feb 15, 2026
@github-actions github-actions bot removed the action/benchmark-sql Trigger SQL benchmarks to run on this PR label Feb 15, 2026

3 participants