Switch internal hashmap layout to be more efficient#2709
Open
t-b wants to merge 7 commits into
Open
Conversation
Created with diff --git a/Packages/tests/Basic/UTF_Hashmap.ipf b/Packages/tests/Basic/UTF_Hashmap.ipf index b4be5bc..fbe056292c 100644 --- a/Packages/tests/Basic/UTF_Hashmap.ipf +++ b/Packages/tests/Basic/UTF_Hashmap.ipf @@ -340,6 +340,9 @@ static Function DeleteEntryWorks() value = "mnop" HM_AddEntry(hashmap, key, str = value) + WAVE hashmap_text_version0 = DeepCopyWaveRefWave(hashmap) + Duplicate/O hashmap_text_version0, root:hashmap_text_version0 + WAVE/Z results = GetFilledEntries(hashmap) CHECK_WAVE(results, NUMERIC_WAVE) CHECK_EQUAL_WAVES(results, {6100}, mode = WAVE_DATA) @@ -577,6 +580,9 @@ static Function WorksWithNumericValues([variable var]) CHECK(found) CHECK_EQUAL_VAR(value, 30) + WAVE hashmap_numeric_version0 = DeepCopyWaveRefWave(hashmap) + Duplicate/O hashmap_numeric_version0, root:hashmap_numeric_version0 + ret = HM_DeleteEntry(hashmap, key) CHECK_EQUAL_VAR(ret, 0) and moved into root:Hashmap.
This is more self explaining.
There was a problem hiding this comment.
Pull request overview
This PR updates MIES’ internal hashmap representation to reduce the number of per-bucket waves (a key factor in slow experiment loading with very large hashmaps) and introduces wave-layout versioning plus an explicit upgrade function to migrate existing stored hashmaps.
Changes:
- Switch hashmap storage to a single keys wave and a single values wave (2D, buckets as rows and entries as columns).
- Add wave layout versioning and an
HM_Upgrade()migration path from the old per-bucket-wave layout. - Update and extend hashmap unit tests to reflect the new internal layout and exercise upgrades.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| Packages/tests/Basic/UTF_Utils_Mies_Algorithm.ipf | Adds additional INFO output in an algorithm test. |
| Packages/tests/Basic/UTF_Hashmap.ipf | Updates hashmap tests for the new internal structure and adds upgrade tests. |
| Packages/MIES/MIES_WaveDataFolderGetters.ipf | Exposes SetWaveVersion and adds placeholders for upgrading persisted cache/logbook hashmaps. |
| Packages/MIES/MIES_Hashmap.ipf | Implements the new internal hashmap layout, adds HM_Upgrade, and adjusts core operations accordingly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
7afc66e to
d9e65e2
Compare
d9e65e2 to
0512d66
Compare
0512d66 to
28e2231
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
Packages/MIES/MIES_WaveDataFolderGetters.ipf:9037
- After upgrading the stored logbook key hashmap, MoveWaveWithOverwrite returns a new wave reference for the destination path. The current code ignores that return value and then returns the pre-upgrade
hashmapreference, which may be invalid because the destination wave was killed and recreated inside MoveWaveWithOverwrite. Assign the return value (or re-resolve the wave by name) and return the updated reference.
if(ExistsWithCorrectLayoutVersion(hashmap, HM_HASHMAP_WAVE_VERSION))
return hashmap
elseif(WaveExists(hashmap))
WAVE hashmap_new = HM_Upgrade(hashmap)
MoveWaveWithOverwrite(hashmap, hashmap_new)
return hashmap
28e2231 to
d198b28
Compare
d198b28 to
b66f9c0
Compare
b66f9c0 to
25427bb
Compare
Production data revealed that caches with ~130k entries result in pxp files which take a large amount of time to save and load. Digging revealed that this is due the sheer number of free waves we are using for a hashmap. We have 2 free waves per bucket, so for 130k entries we have 260k free waves. In addition the overhead for small waves (typically with single digit sizes) is quite large: •make/n=2/t txt •print waveinfo(txt, 0) NUMTYPE:0;DUNITS:;XUNITS:;MODIFIED:1;FULLSCALE:0,0,0;MODTIME:3860315657;PATH:;LOCK:0;HGETSTATE:0;SIZEINBYTES:664; •make/n=0/t/O txt •print waveinfo(txt, 0) NUMTYPE:0;DUNITS:;XUNITS:;MODIFIED:1;FULLSCALE:0,0,0;MODTIME:3860315683;PATH:;LOCK:0;HGETSTATE:0;SIZEINBYTES:648; •make/n=(2, 150e3)/t/O txt •print waveinfo(txt, 0) NUMTYPE:0;DUNITS:;XUNITS:;MODIFIED:1;FULLSCALE:0,0,0;MODTIME:3860315727;PATH:;LOCK:0;HGETSTATE:0;SIZEINBYTES:2400648; We therefore change the internal hashmap layout to consist of one keys wave and one values wave. Both are 2D and the rows are indexed by bucket index and the columns by key/value pairs. As there is no gain in using FindValue on a 2D wave when searching the columns with constant row index for a small number of entries we now always use the fast path in HM_GetKeyIndex.
25427bb to
8606fbc
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Close #2695