Description
Join planning uses the StatisticsTable::RowCount method to compute row estimates for tables. This doesn't work with keyless tables, because Dolt represents keyless tables with a single internal row for each distinct value, and RowCount looks at the underlying index. This means that join planning will see the number of distinct rows in a keyless table, not the total number of rows.
GMS's in-memory implementation for tables doesn't have this problem, because it stores a separate entry for each row. But this creates a problem for testing: we run the same plan tests against both Dolt and GMS, and since the two implementations return different row estimates, it's not possible to write expected estimates that satisfy both.
So our options are:
1. Not write plan tests with an `ExpectedEstimate` field if the expected plan contains a join on a keyless table.
2. Require all plan tests on keyless tables to execute `analyze table ... update histogram on (...) using data '{"row_count": ...}'` so that the estimates match on both GMS and Dolt.
3. Compute the actual row count of keyless tables on demand in Dolt, which requires iterating over the table during planning.
4. Have keyless tables precompute and cache their total row count.
Option 4 seems like the best solution, except that it requires a storage format change that would prevent older Dolt clients from accessing tables that carry this row count information. (If an old client could update these tables, it wouldn't know to update the row count, and the row count would become inaccurate.)
For now I've removed `ExpectedEstimate` from the only test currently affected by this, and I'm documenting the issue here.