Skip to content

Add early culling for lines fully left of the viewport.#1368

Open
b0nes164 wants to merge 7 commits intomainfrom
thomas/AAearlyCull
Open

Add early culling for lines fully left of the viewport.#1368
b0nes164 wants to merge 7 commits intomainfrom
thomas/AAearlyCull

Conversation

@b0nes164
Copy link
Copy Markdown
Member

@b0nes164 b0nes164 commented Jan 20, 2026

Apologies this took an embarrassingly long time due to a funny bug. Adds changes as discussed.

UPDATED:
Lines left of the viewport are not visible but still contribute winding, creating a left to right dependency for rendering. This patch simplifies rendering in this scenario by culling the left-of-viewport portion of lines, and recording their equivalent winding contribution inside a histogram. If a line is entirely left of the viewport, it is completely culled and produces no tiles.

Correspondingly strip.rs has been updated to consume this histogram. Notably, strip.rs now handles cases where no tiles are present, but culling occured, and cases where winding is "captive" either between left edge of the viewport and the first tile in the row, or between two non-consecutive rows.

Removes the previous handling of left of edge lines in strip.rs.

No culling, Ghostscript_Tiger shifted 50% off screen:

tile_aaa_shift50/Ghostscript_Tiger
                        time:   [51.574 µs 51.650 µs 51.732 µs]

With culling:

tile_aaa_shift50_cull/Ghostscript_Tiger
                        time:   [46.319 µs 46.380 µs 46.436 µs]

@b0nes164 b0nes164 requested a review from tomcur January 20, 2026 01:28
Copy link
Copy Markdown
Member

@tomcur tomcur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool, and I think this is roughly what it should look like. I do have a few observations.

Early culling is conditionally enabled based on whether a culling opportunity was detected in the previous strip generation pass.

I'm not sure culling_opportunity are the best semantics. A path having had a culling opportunity may not be a good predictor of culling being possible the next time generate_with_clip is called, and same if the path didn't have a culling opportunity, the next path may in actuality well have one. These semantics will of course mean that we're always going to miss the performance benefit of culling at least during the first path in a sequence of cullable paths.

Perhaps, instead we should be conditional only on whether we want culling at all -- for example, if we're caching strips, perhaps all types of culling should be disabled (including culling where we don't account for winding, i.e., above, to the right, and below the viewport). In that case it may still make sense sense to have the dispatch be const-generic over the culling, as you have now, but perhaps the difference in performance can be so minimal that we don't have to bother.

E.g., for tiling, with the current implementation it doesn't seem to matter if we always cull or not:

tile/Ghostscript_Tiger  time:       [184.92 µs 185.14 µs 185.41 µs]
tile/Ghostscript_Tiger-cull time:   [181.96 µs 182.34 µs 182.84 µs]

That of course doesn't preclude dispatching to a strip rendering implementation that does or does not use culling, depending on whether we've culled something. That may well be important for performance.

Should also benchmark strip rendering, but that requires some bigger changes to the benchmark code...

It does seem both tiling and strip rendering are about 6% slower with these changes, but there is a little bit of noise so that has to be measured a bit more carefully (and this can perhaps be optimized more).

Currently this does not use the bitvec, and instead traverses the histogram.

We have to carefully check what the performance impact is of this where rendering a bunch of small geometry (like glyphs) which will usually just occupy a few rows.

Comment thread sparse_strips/vello_common/src/strip_generator.rs Outdated
Comment thread sparse_strips/vello_common/src/tile.rs Outdated
@b0nes164
Copy link
Copy Markdown
Member Author

b0nes164 commented Jan 28, 2026

A path having had a culling opportunity may not be a good predictor of culling being possible the next time generate_with_clip is called, and same if the path didn't have a culling opportunity, the next path may in actuality well have one.

These are good points! For more context, we made the decision to make the culling conditional on the previous make_tiles call because we were concerned about the cost of clearing the histograms. However, now that we have the data, the histogram cost seems to be negligible. But, I suspect the cost of traversing the rows inside strip.rs is not negligible.

So here's a thought: WDYT about culling always enabled in make_tile and then conditionally dispatching the culling logic in strips. I think that gives us the best of both? (and also only clearing the histograms if the previous make_tile culled?).

Bitvec to come!

@tomcur
Copy link
Copy Markdown
Member

tomcur commented Jan 28, 2026

So here's a thought: WDYT about culling always enabled in make_tile and then conditionally dispatching the culling logic in strips. I think that gives us the best of both? (and also only clearing the histograms if the previous make_tile culled?).

Yes! In fact, I edited my message (probably as you were typing) to say this:

That of course doesn't preclude dispatching to a strip rendering implementation that does or does not use culling, depending on whether we've culled something. That may well be important for performance.

@b0nes164
Copy link
Copy Markdown
Member Author

b0nes164 commented Feb 4, 2026

@LaurenzV do you know if it's safe to early cull when the storage generation mode is not GenerationMode::Replace ?

For safety, I make the assumption no, but after thinking about it, I think it's fine? Once the strips and alpha are generated (and cached), there's no need to hold onto the culled winding contributions (similarly to how tiles are also cleared regardless of the storage mode)?

@b0nes164
Copy link
Copy Markdown
Member Author

b0nes164 commented Feb 4, 2026

Otherwise, ready for review.

@LaurenzV
Copy link
Copy Markdown
Collaborator

LaurenzV commented Feb 4, 2026

I don't believe it should make a difference for the generation mode, as long as the strips are there it should be fine!

@b0nes164
Copy link
Copy Markdown
Member Author

b0nes164 commented Feb 4, 2026

Early culling is now always enabled. I've retained the USE_EARLY_CULL generic, as it's useful to manually enable/disable culling for tests and benchmarking.

@b0nes164 b0nes164 requested a review from tomcur February 5, 2026 22:07
@tomcur
Copy link
Copy Markdown
Member

tomcur commented Feb 11, 2026

I have not forgotten about this, but haven't had a chance for a proper look yet!

@b0nes164 b0nes164 force-pushed the thomas/AAearlyCull branch from 0333485 to eb1a069 Compare March 28, 2026 23:38
@b0nes164
Copy link
Copy Markdown
Member Author

PTAL!

Note, the behavior is different than what I had before. Now, left edge crossing lines are culled (if possible) inside tile.rs, and the corresponding logic in strip.rs has been removed.

Re moving culling earlier into flatten.rs: while possible, it would be extremely awkward, as the code has no concept of the tile grid at that point. From both a maintainability and a performance perspective, I think (potentially) pushing a couple extra lines is better than dragging a ton of logic to handle the histogramming in flatten.rs.

Re using clip instead of viewport rect: I will add that as a follow on CL.

@LaurenzV LaurenzV self-requested a review April 10, 2026 17:59
@b0nes164
Copy link
Copy Markdown
Member Author

Added more testing, some benchmarking, and fixed the issue identified with hayro.

@b0nes164 b0nes164 force-pushed the thomas/AAearlyCull branch from 1f4c083 to 17e6d54 Compare April 19, 2026 21:50
@LaurenzV
Copy link
Copy Markdown
Collaborator

PDF tests seem to pass now!

Copy link
Copy Markdown
Collaborator

@LaurenzV LaurenzV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to review strip.rs, but here some first comments!

Comment thread sparse_strips/vello_common/src/tile.rs Outdated
Comment thread sparse_strips/vello_common/src/tile.rs
Comment thread sparse_strips/vello_common/src/tile.rs Outdated
Comment thread sparse_strips/vello_common/src/strip_generator.rs Outdated
Comment thread sparse_strips/vello_common/src/strip_generator.rs Outdated
Comment thread sparse_strips/vello_common/src/tile.rs
Comment thread sparse_strips/vello_common/src/tile.rs
Comment thread sparse_strips/vello_sparse_tests/tests/basic.rs
Comment thread sparse_strips/vello_bench/src/tile.rs Outdated
Comment thread sparse_strips/vello_common/src/tile.rs Outdated
Comment thread sparse_strips/vello_common/src/strip.rs Outdated
Comment thread sparse_strips/vello_sparse_tests/tests/basic.rs
Comment thread sparse_strips/vello_common/src/strip.rs Outdated
Comment thread sparse_strips/vello_common/src/strip.rs Outdated
Comment thread sparse_strips/vello_common/src/strip.rs
Comment thread sparse_strips/vello_common/src/strip.rs Outdated
Comment thread sparse_strips/vello_common/src/strip.rs Outdated
Comment thread sparse_strips/vello_common/src/strip.rs Outdated
alpha_buf.len() as u32,
false,
));
alpha_buf.extend([255_u8; Tile::HEIGHT as usize * Tile::WIDTH as usize]);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if those are necessary (same for above), but not for you to figure out. For now, that's definitely the better choice, unsure if there is code that assumes strips have at least a width of 4.

Copy link
Copy Markdown
Member Author

@b0nes164 b0nes164 May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about other downstream affects, but they are necessary for at least clipping. Without injecting in the alphas, this:

#[inline(always)]
fn cur_strip_width(&self) -> u16 {
    let cur = self.cur_strip();
    let next = self.next_strip();
    ((next.alpha_idx() - cur.alpha_idx()) / Tile::HEIGHT as u32) as u16
}

returns 0, resulting in a 0 width fill region, so downstream in:

fn overlap_relationship(&self, other: &Region<'_>) -> OverlapRelationship {
        if self.end() <= other.start() {
            OverlapRelationship::Advance(Advance::Left)
        } else if self.start() >= other.end() {
            OverlapRelationship::Advance(Advance::Right)
        } else {
           . . .
        }
    }

if you have a clip that also starts at the left edge (also possibly culled?), you get:

if self.end() <= other.start() {  // 0 <= 0 (Evaluates to TRUE)
    OverlapRelationship::Advance(Advance::Left)
}

Resulting in no overlap

Comment thread sparse_strips/vello_common/src/strip.rs Outdated
Comment thread sparse_strips/vello_common/src/strip.rs
@b0nes164
Copy link
Copy Markdown
Member Author

b0nes164 commented May 4, 2026

PTAL!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants