Commit a26fca7

Remove sortedcontainers dependency (#2947)
Closes #2945

# Rationale for this change

This PR removes the `sortedcontainers` dependency. Looking at how the sorted container was used, we can simplify the logic for merging manifests and collecting the results while maintaining identical behavior.

**What the logic was doing before:**

1. Submit all manifest merge tasks to the thread pool (parallelism starts with the executor)
2. Collect futures as they complete using `as_completed()`, which yields them out of order
3. Store completed futures in a `SortedList` to restore submission order
4. Extract all results from the sorted futures
5. Flatten and return

**What we do now:**

1. Submit all manifest merge tasks to the thread pool (parallelism starts with the executor)
2. Iterate through futures in submission order, calling `.result()` on each
3. Flatten and return

The old flow shows that all results must be collected before the next step anyway. So we can iterate the futures directly and call `.result()` in order. This blocks the main thread until each future completes, but it doesn't block the worker threads, which all continue running in parallel.

## Are these changes tested?

All existing tests pass.

## Are there any user-facing changes?

No
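The equivalence argued above can be checked with a small standalone sketch. It is not the PyIceberg code: `merge_bin` here is a hypothetical stand-in that sleeps and upper-cases names, the string "manifests" and delays are made up, and the old path uses the built-in `sorted()` in place of `SortedList` so the example needs only the standard library.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor, as_completed


def merge_bin(delay: float, manifest_bin: list[str]) -> list[str]:
    """Hypothetical stand-in for a manifest merge; the delay skews completion order."""
    time.sleep(delay)
    return [name.upper() for name in manifest_bin]


def collect_old_style(futures: list[Future]) -> list[list[str]]:
    # Previous approach: gather futures as they complete, then restore
    # submission order (sorted() stands in for SortedList here).
    index = {f: i for i, f in enumerate(futures)}
    completed = sorted(as_completed(futures), key=lambda f: index[f])
    return [f.result() for f in completed if f.result()]


def collect_in_submission_order(futures: list[Future]) -> list[list[str]]:
    # New approach: block on each future in submission order.
    # Workers keep running while the main thread waits; empty results are dropped.
    return [r for f in futures if (r := f.result())]


def run_demo() -> tuple[list[list[str]], list[list[str]]]:
    # The first task is the slowest, so completion order differs from submission order.
    tasks = [(0.06, ["m1", "m2"]), (0.03, []), (0.0, ["m3"])]
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(merge_bin, delay, b) for delay, b in tasks]
        new = collect_in_submission_order(futures)
        old = collect_old_style(futures)
    return old, new


if __name__ == "__main__":
    old, new = run_demo()
    print(old == new, new)
```

Both collectors return the non-empty results in submission order, which is why the sort step adds nothing once every result has to be awaited anyway.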
1 parent 11a2281 commit a26fca7

3 files changed: +1 -24 lines changed

pyiceberg/table/update/snapshot.py

Lines changed: 1 addition & 12 deletions
@@ -16,19 +16,15 @@
 # under the License.
 from __future__ import annotations
 
-import concurrent.futures
 import itertools
 import uuid
 from abc import abstractmethod
 from collections import defaultdict
 from collections.abc import Callable
-from concurrent.futures import Future
 from datetime import datetime
 from functools import cached_property
 from typing import TYPE_CHECKING, Generic
 
-from sortedcontainers import SortedList
-
 from pyiceberg.avro.codecs import AvroCompressionCodec
 from pyiceberg.expressions import (
     AlwaysFalse,
@@ -792,14 +788,7 @@ def merge_bin(manifest_bin: list[ManifestFile]) -> list[ManifestFile]:
 
         executor = ExecutorFactory.get_or_create()
         futures = [executor.submit(merge_bin, b) for b in bins]
-
-        # for consistent ordering, we need to maintain future order
-        futures_index = {f: i for i, f in enumerate(futures)}
-        completed_futures: SortedList[Future[list[ManifestFile]]] = SortedList(iterable=[], key=lambda f: futures_index[f])
-        for future in concurrent.futures.as_completed(futures):
-            completed_futures.add(future)
-
-        bin_results: list[list[ManifestFile]] = [f.result() for f in completed_futures if f.result()]
+        bin_results: list[list[ManifestFile]] = [r for f in futures if (r := f.result())]
 
         return [manifest for bin_result in bin_results for manifest in bin_result]
 

pyproject.toml

Lines changed: 0 additions & 1 deletion
@@ -37,7 +37,6 @@ dependencies = [
     "rich>=10.11.0,<15.0.0",
     "strictyaml>=1.7.0,<2.0.0", # CVE-2020-14343 was fixed in 5.4.
     "pydantic>=2.0,<3.0,!=2.4.0,!=2.4.1,!=2.12.0,!=2.12.1", # 2.4.0, 2.4.1, 2.12.0, 2.12.1 has a critical bug
-    "sortedcontainers==2.4.0",
     "fsspec>=2023.1.0",
     "pyparsing>=3.1.0,<4.0.0",
     "tenacity>=8.2.3,<10.0.0",

uv.lock

Lines changed: 0 additions & 11 deletions
Generated file; diff not rendered by default.
