[GH-2926] Add ST_BoxIntersects and ST_BoxContains for Box2D#2932
[GH-2926] Add ST_BoxIntersects and ST_BoxContains for Box2D#2932jiayuasu merged 2 commits intoapache:masterfrom
Conversation
Two planar bbox predicates on Box2D arguments: ST_BoxIntersects(a: Box2D, b: Box2D) -> Boolean ST_BoxContains(a: Box2D, b: Box2D) -> Boolean Both use closed intervals, matching PostGIS && and ~ operators on box2d. NULL on null input. JVM, Python, Flink wrappers all updated. New case classes in spark-common Predicates.scala registered in Catalog. Scala DataFrame API wrappers in st_predicates. Flink ScalarFunction subclasses registered in the Flink Catalog. Closes apache#2926.
There was a problem hiding this comment.
Pull request overview
This PR adds two Box2D-based planar bounding-box predicates (ST_BoxIntersects, ST_BoxContains) across Sedona’s common layer and exposes them consistently via Spark SQL/DataFrame APIs, PySpark wrappers, and Flink Table API, with corresponding cross-language tests.
Changes:
- Introduces
Predicates.boxIntersects(Box2D, Box2D)andPredicates.boxContains(Box2D, Box2D)incommon. - Adds Spark SQL expressions + Catalog registration + Scala DataFrame API wrappers and PySpark wrappers.
- Adds Flink ScalarFunctions + Catalog registration and test coverage across common/Spark/Flink/Python.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| common/src/main/java/org/apache/sedona/common/Predicates.java | Adds core Box2D predicate implementations used by higher-level APIs. |
| common/src/test/java/org/apache/sedona/common/PredicatesTest.java | Adds unit tests for the new common-layer Box2D predicates. |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/Predicates.scala | Adds Spark Catalyst expressions for the new Box2D predicates. |
| spark/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala | Registers the new predicates as Spark SQL functions. |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/st_predicates.scala | Adds Scala DataFrame API wrappers for the new predicates. |
| spark/common/src/test/scala/org/apache/sedona/sql/predicateTestScala.scala | Adds Spark SQL-level tests (including NULL propagation) for the new predicates. |
| python/sedona/spark/sql/st_predicates.py | Adds PySpark wrapper functions for ST_BoxIntersects / ST_BoxContains. |
| python/tests/sql/test_predicate.py | Adds PySpark SQL test coverage for the new predicates. |
| flink/src/main/java/org/apache/sedona/flink/expressions/Predicates.java | Adds Flink ScalarFunctions for the new Box2D predicates. |
| flink/src/main/java/org/apache/sedona/flink/Catalog.java | Registers the new Flink predicate functions. |
| flink/src/test/java/org/apache/sedona/flink/PredicateTest.java | Adds Flink Table API tests for the new predicates. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| public static boolean boxIntersects(Box2D a, Box2D b) { | ||
| return !(a.getXMax() < b.getXMin() | ||
| || a.getXMin() > b.getXMax() | ||
| || a.getYMax() < b.getYMin() | ||
| || a.getYMin() > b.getYMax()); |
There was a problem hiding this comment.
Done in 6726b1b — went with option (a). boxIntersects/boxContains now call requireOrderedPlanarBox and throw IllegalArgumentException with a clear message on inverted bounds. Defining wraparound semantics on a planar predicate without the geography work in #2929 would be premature; failing fast is the honest contract for now.
| public static boolean boxContains(Box2D a, Box2D b) { | ||
| return a.getXMin() <= b.getXMin() | ||
| && a.getYMin() <= b.getYMin() | ||
| && a.getXMax() >= b.getXMax() | ||
| && a.getYMax() >= b.getYMax(); |
There was a problem hiding this comment.
Same fix in 6726b1b — boxContains now validates ordered bounds before evaluating containment.
| * Closed-interval bbox intersection: true if {@code a} and {@code b} share any point on either | ||
| * axis (matches PostGIS {@code &&} on box2d). Either argument being null returns null at the SQL | ||
| * layer; this Java entry point throws {@link NullPointerException} on null input. |
There was a problem hiding this comment.
Reworded in 6726b1b: now says overlap on both the X and Y axes.
| @Test | ||
| public void testBoxIntersects() { | ||
| Box2D a = new Box2D(0.0, 0.0, 5.0, 5.0); | ||
|
|
||
| // Full overlap | ||
| assertTrue(Predicates.boxIntersects(a, new Box2D(1.0, 1.0, 2.0, 2.0))); | ||
| // Partial overlap | ||
| assertTrue(Predicates.boxIntersects(a, new Box2D(3.0, 3.0, 7.0, 7.0))); | ||
| // Edge-touching (closed intervals) | ||
| assertTrue(Predicates.boxIntersects(a, new Box2D(5.0, 0.0, 10.0, 5.0))); | ||
| // Corner-touching (closed intervals) | ||
| assertTrue(Predicates.boxIntersects(a, new Box2D(5.0, 5.0, 10.0, 10.0))); | ||
| // Disjoint on X | ||
| assertFalse(Predicates.boxIntersects(a, new Box2D(6.0, 0.0, 10.0, 5.0))); | ||
| // Disjoint on Y | ||
| assertFalse(Predicates.boxIntersects(a, new Box2D(0.0, 6.0, 5.0, 10.0))); | ||
| } |
There was a problem hiding this comment.
Added testBoxPredicatesRejectInvertedBounds in 6726b1b covering both inverted-X (the antimeridian-style case) and inverted-Y, asserting the documented IllegalArgumentException is thrown.
| /** | ||
| * Test if leftGeometry full intersects rightGeometry. Supports both Geometry (JTS) and Geography | ||
| * (S2) inputs via InferredExpression dual dispatch. | ||
| * | ||
| * @param inputExpressions | ||
| */ | ||
| private[apache] case class ST_BoxIntersects(inputExpressions: Seq[Expression]) | ||
| extends InferredExpression(Predicates.boxIntersects _) { | ||
|
|
There was a problem hiding this comment.
Fixed in 6726b1b — replaced the bleed-through Scaladoc above ST_BoxIntersects with proper Box2D semantics, added a parallel doc on ST_BoxContains, and restored the original ST_Intersects Scaladoc below.
| package org.apache.spark.sql.sedona_sql.expressions | ||
|
|
||
| import org.apache.sedona.common.Predicates | ||
| import org.apache.sedona.common.geometryObjects.Box2D |
There was a problem hiding this comment.
Removed in 6726b1b — Box2D was used implicitly via Predicates.boxIntersects _ eta-expansion, no explicit reference needed in this file.
- Validate ordered planar bounds in boxIntersects/boxContains and throw IllegalArgumentException on inverted (xmin>xmax / ymin>ymax) input. Box2D allows inverted bounds (reserved for future antimeridian wraparound semantics), but planar predicates have no defined meaning for inverted intervals; failing fast beats silently misleading output. - Add testBoxPredicatesRejectInvertedBounds covering both inverted X and inverted Y, asserting the documented exception. - Tighten Javadocs to say overlap on "both" axes (not "either"). - Replace the stale ST_Intersects-bleed Scaladoc above ST_BoxIntersects with proper Box2D semantics; add a parallel doc block on ST_BoxContains; restore the Scaladoc on ST_Intersects. - Remove unused Box2D import from Predicates.scala.
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-XXX] my subject. Closes Add Box2D predicates: ST_BoxIntersects, ST_BoxContains #2926What changes were proposed in this PR?
Adds two planar bbox predicates that operate on
Box2Darguments:ST_BoxIntersects(a: Box2D, b: Box2D) -> Boolean— true if the two bboxes share any point on either axis. Mirrors PostGIS&&onbox2d.ST_BoxContains(a: Box2D, b: Box2D) -> Boolean— true ifafully containsb. Mirrors PostGIS~onbox2d.Both use closed intervals, matching PostGIS semantics — edge-touching and corner-touching count as intersection; equal boxes contain each other. NULL on null input.
Where the changes land
common/.../Predicates.javaboxIntersects(Box2D, Box2D)andboxContains(Box2D, Box2D)spark/common/.../expressions/Predicates.scalaST_BoxIntersects/ST_BoxContainscase classesspark/common/.../UDF/Catalog.scalapredicateExprsspark/common/.../expressions/st_predicates.scalapython/sedona/spark/sql/st_predicates.pyflink/.../expressions/Predicates.javaflink/.../Catalog.javaHow was this patch tested?
common/.../PredicatesTest.java—testBoxIntersects(full overlap, partial overlap, edge-touching, corner-touching, disjoint X, disjoint Y) andtestBoxContains(inside, equal, smaller-inside, outside, crosses-boundary).spark/common/.../predicateTestScala.scala— "Passed ST_BoxIntersects and ST_BoxContains" — full SQL-level coverage of both predicates including NULL propagation.flink/.../PredicateTest.java—testBoxIntersectsandtestBoxContainsthrough Flink Table API.python/tests/sql/test_predicate.py—test_st_box_intersects_and_containsthrough PySpark SQL.What's not in scope
ST_Box2D(geom)first.Box3D(Add Box3D type and 3D bbox SQL surface #2928).Did this PR include necessary documentation updates?