[python] Add caching infrastructure and utilities #7347
Open
tub wants to merge 7 commits into apache:master
- Add backtick quoting to Identifier for SQL-safe formatting
- Add ChangelogProducer enum to core_options
- Add exists_batch() for bulk file existence checks
- Add LRU caching to ManifestFileManager and ManifestListManager
- Add snapshot caching and traversal helpers to SnapshotManager
- Add cachetools dependency
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
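Two of the utilities listed in this commit, backtick quoting and the bulk existence check, might look roughly like the sketch below. This is not the PR's code: the names `quote_identifier`, the `exists` callable, and `max_workers` are hypothetical, and escaping an embedded backtick by doubling it is an assumed convention borrowed from common SQL dialects.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def quote_identifier(*parts: str) -> str:
    """Backtick-quote each part; escape embedded backticks by doubling them."""
    return ".".join("`" + p.replace("`", "``") + "`" for p in parts)


def exists_batch(exists: Callable[[str], bool], paths: Iterable[str],
                 max_workers: int = 8) -> Dict[str, bool]:
    """Check many paths concurrently and map each path to its result.

    `exists` is any single-path existence check (hypothetical here).
    """
    unique = list(dict.fromkeys(paths))  # de-duplicate, keep first-seen order
    if not unique:
        return {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as its input
        return dict(zip(unique, pool.map(exists, unique)))


# Usage with an in-memory set standing in for a file system.
present = {"a/manifest-1", "a/manifest-3"}
found = exists_batch(present.__contains__, ["a/manifest-1", "a/manifest-2"])
quoted = quote_identifier("my_db", "my`table")
```

Running the checks on a thread pool only pays off when each call is dominated by I/O latency, as with object stores; for a local file system a plain loop is likely just as fast.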
This was referenced Mar 5, 2026
…tEquals
Extract shared base class for ManifestFileCacheTest and ManifestListCacheTest, add _make_snapshot() helper, and fix deprecated assertEquals (removed in 3.12).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
tub commented on Mar 6, 2026
…rim docs, remove ChangelogProducer
- Upgrade cachetools to >=7,<8 for cachedmethod(info=True) support
- Remove ChangelogProducer enum (belongs in apache#7348 scanners branch)
- Replace manual cache hit/miss counters with @cachedmethod(info=True) decorator on ManifestFileManager, ManifestListManager, SnapshotManager
- Trim verbose docstrings across identifier, file_io, pyarrow_file_io, manifest_list_manager, and snapshot_manager
- Update cache tests to use cache_info() instead of manual counters
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…efault-size tests
- Move shared cache-behaviour tests (second_read, disabled_when_zero) into _CacheBehaviourMixin so they run for both manager types without duplication
- Extract _EMPTY_ROW / _EMPTY_STATS module constants to reduce DataFileMeta boilerplate
- Remove test_default_cache_size tests (just assert constructor defaults)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
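The mixin arrangement described in this commit can be sketched as follows. It is a minimal illustration, not the PR's test code: `_FakeManager`, its `loads` counter, and the `make_manager` hook are hypothetical stand-ins, and only the class and test names come from the commit messages.

```python
import unittest


class _FakeManager:
    """Hypothetical stand-in for a manager with an optional read cache."""

    def __init__(self, cache_size):
        # A cache size of zero disables caching entirely.
        self._cache = {} if cache_size > 0 else None
        self.loads = 0  # counts simulated file reads

    def read(self, name):
        if self._cache is not None and name in self._cache:
            return self._cache[name]
        self.loads += 1
        value = "contents of " + name
        if self._cache is not None:
            self._cache[name] = value
        return value


class _CacheBehaviourMixin:
    """Shared tests; each subclass supplies make_manager(cache_size)."""

    def test_second_read(self):
        manager = self.make_manager(cache_size=4)
        manager.read("f1")
        manager.read("f1")
        self.assertEqual(manager.loads, 1)

    def test_disabled_when_zero(self):
        manager = self.make_manager(cache_size=0)
        manager.read("f1")
        manager.read("f1")
        self.assertEqual(manager.loads, 2)


class ManifestFileCacheTest(_CacheBehaviourMixin, unittest.TestCase):
    def make_manager(self, cache_size):
        return _FakeManager(cache_size)


class ManifestListCacheTest(_CacheBehaviourMixin, unittest.TestCase):
    def make_manager(self, cache_size):
        return _FakeManager(cache_size)
```

Because the mixin does not inherit from `unittest.TestCase`, test runners do not collect it on its own; its tests run once per concrete subclass.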
tub added a commit to tub/paimon that referenced this pull request on Mar 6, 2026
…rim docs, remove ChangelogProducer
- Upgrade cachetools to >=7,<8 for cachedmethod(info=True) support
- Remove ChangelogProducer enum (belongs in apache#7348 scanners branch)
- Replace manual cache hit/miss counters with @cachedmethod(info=True) decorator on ManifestFileManager, ManifestListManager, SnapshotManager
- Trim verbose docstrings across identifier, file_io, pyarrow_file_io, manifest_list_manager, and snapshot_manager
- Update cache tests to use cache_info() instead of manual counters
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cachetools 7.x requires Python >=3.10 but the project supports 3.6+. Drop info=True and explicit key= from @cachedmethod (both 7.x-only features) while keeping the decorator itself (available since 4.x). Replace cache_info()-based test assertions with unittest.mock spies on file_io.new_input_stream, testing the actual caching effect without any production code counters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
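The spy-based testing approach described in this commit can be sketched as below. To stay stdlib-only, the sketch substitutes a small hand-rolled `OrderedDict` LRU for the `cachetools` decorator the PR actually uses; `TinyManifestManager` and its `read` method are hypothetical, and the assumption that `file_io.new_input_stream` takes a path and returns a context-managed stream is made for illustration.

```python
import io
from collections import OrderedDict
from unittest import mock


class TinyManifestManager:
    """Hypothetical manager that caches file contents by file name."""

    def __init__(self, file_io, cache_size=2):
        self.file_io = file_io
        self._max = cache_size
        self._cache = OrderedDict()

    def read(self, name):
        if self._max > 0 and name in self._cache:
            self._cache.move_to_end(name)  # mark as most recently used
            return self._cache[name]
        with self.file_io.new_input_stream(name) as stream:
            data = stream.read()
        if self._max > 0:
            self._cache[name] = data
            if len(self._cache) > self._max:
                self._cache.popitem(last=False)  # evict least recently used
        return data


# The mock records every call, so no counters are needed in production code.
file_io = mock.Mock()
file_io.new_input_stream.side_effect = lambda name: io.BytesIO(name.encode())

manager = TinyManifestManager(file_io, cache_size=2)
first = manager.read("manifest-1")
second = manager.read("manifest-1")

# The second read is served from the cache, so the stream was opened once.
file_io.new_input_stream.assert_called_once_with("manifest-1")
```

Asserting on the spy's call count checks the observable effect of caching (fewer stream opens) rather than the cache's internal bookkeeping, so the test keeps working if the cache implementation changes.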
JingsongLi reviewed on Mar 7, 2026
return results

def find_next_scannable(
Contributor
Do we really need this? It only reads the snapshot file to decide; does that really need to be optimized?
Contributor
Can you explain what the cache is specifically for? It seems that streaming reads do not read the same files repeatedly.
JingsongLi reviewed on Mar 7, 2026
pyarrow>=6,<7; python_version < "3.8"
pyarrow>=16,<20; python_version >= "3.8"
pylance>=0.20,<1; python_version>="3.9"
pylance>=0.10,<1; python_version>="3.8" and python_version<"3.9"
Contributor
I pushed a small commit that removes these deps. Please rebase onto master.
Summary
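- Add backtick quoting to Identifier for SQL-safe formatting
- Add ChangelogProducer enum to core_options
- Add exists_batch() for bulk file existence checks
- Add LRU caching to ManifestFileManager and ManifestListManager
- Add snapshot caching and traversal helpers to SnapshotManager
- Add cachetools dependency
Stacked PR series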
This is PR 1a/5 in the Python streaming read series:
- [python] Add caching infrastructure and utilities #7347: caching infrastructure + utilities
- [python] Add scanners, sharding, and row kind support #7348: scanners, sharding, row kind
- [python] Add consumer management for streaming progress #7349: consumer management
- [python] Add StreamReadBuilder and AsyncStreamingTableScan #7350: StreamReadBuilder + AsyncStreamingTableScan
- [python] Add paimon tail CLI for streaming table reads #7351: paimon tail CLI
Test plan
- flake8 passes on all changed files
- python -m pytest passes (630/630, 9 pre-existing lance skips)
- identifier_test.py, manifest_cache_test.py