[CELEBORN-2268] Improve test coverage for MEMORY and S3 storage#3608
[CELEBORN-2268] Improve test coverage for MEMORY and S3 storage#3608eolivelli wants to merge 10 commits intoapache:mainfrom
Conversation
This commit adds only tests and some useful debug information about using MEMORY and S3 storage.
| .mapValues(locations => locations.size) | ||
| var offerSlotsMsg = s"Successfully offered slots for $numReducers reducers of $shuffleKey" + | ||
| s" on ${slots.size()} workers" | ||
| s" on ${slots.size()} workers, primary types: $primaryLocationsByType" |
There was a problem hiding this comment.
when you have different spark jobs that select different storageTypes this log line is very useful, to quickly check that the client is doing what it is expected to do
| } else { | ||
| } else if (StorageInfo.HDFSAvailable(availableStorageTypes)) { | ||
| storageInfo = new StorageInfo("", StorageInfo.Type.HDFS, availableStorageTypes); | ||
| } else if (StorageInfo.memoryAvailable(availableStorageTypes)) { |
There was a problem hiding this comment.
this change is not strictly needed by this patch
but in some error conditions (in case I force the test to fail), when using MEMORY storage, you see PartitionLocations assigned to HDFS, even if HDFS is not configured.
I can move this to another PR, but I think it's relevant to this patch.
There was a problem hiding this comment.
I added unit tests to cover this method.
this is indeed important because with the new unit tests we are sealing the behavior of the master choosing the storage types depending on the specs
| @@ -67,9 +68,9 @@ class BasicEndToEndTieredStorageTest extends AnyFunSuite | |||
|
|
|||
| val s3url = container.getS3URL | |||
| val augmentedConfiguration = Map( | |||
| CelebornConf.ACTIVE_STORAGE_TYPES.key -> "MEMORY,S3", | |||
There was a problem hiding this comment.
I changed this existing test that I contribute some time ago to test only "S3", MEMORY was not really meant to be tested, it is better to make the test simpler in order to prevent confusion to readers
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3608 +/- ##
==========================================
- Coverage 67.13% 66.86% -0.26%
==========================================
Files 357 357
Lines 21860 21967 +107
Branches 1943 1949 +6
==========================================
+ Hits 14674 14687 +13
- Misses 6166 6264 +98
+ Partials 1020 1016 -4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
What changes were proposed in this pull request?
This commit adds only tests and some useful debug information about using MEMORY and S3 storage.
Why are the changes needed?
Because there is not enough code coverage on some configurations that may happen in production, in particular about:
There is an interesting case to test: when you configure MEMORY to S3 eviction and the dataset is small.
It is important to ensure that no file is created in S3
Does this PR resolve a correctness bug?
No
Does this PR introduce any user-facing change?
No
How was this patch tested?
It adds new integration tests