Fix Jindo DataLoad without target for multi mount Datasets#5631
Fix Jindo DataLoad without target for multi mount Datasets#5631Nakshatra480 wants to merge 2 commits intofluid-cloudnative:masterfrom
Conversation
Summary of ChangesHello @Nakshatra480, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the JindoRuntime's data loading capabilities by introducing a more robust handling mechanism for datasets with multiple mount points when no explicit target is specified in the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request correctly fixes an issue where a Jindo DataLoad would fail if no target paths were specified for a dataset with multiple mounts. The logic now correctly falls back to using the dataset's mount points, with a default replica count of 1. The implementation is clear and the addition of a unit test for the multi-mount scenario is a great improvement. I've added one suggestion to further enhance test coverage by including an edge case.
Signed-off-by: Nakshatra Sharma <nakshatra.sharma3012@gmail.com>
6f7a6bf to
406ea8d
Compare
Signed-off-by: Nakshatra Sharma <nakshatrasharma@Nakshatras-MacBook-Air.local>
|
|
@cheyang pls review these changes and give me the feedback if anything comes up thanks |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #5631 +/- ##
=======================================
Coverage 59.10% 59.10%
=======================================
Files 444 444
Lines 30538 30538
=======================================
Hits 18049 18049
Misses 10992 10992
Partials 1497 1497 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug in JindoRuntime where DataLoad operations would fail when no explicit spec.target was provided for datasets with multiple mount points. The fix adds fallback logic to derive target paths from the dataset's mounts when no explicit targets are specified.
Changes:
- Added conditional logic to fallback to dataset mounts when
DataLoad.spec.targetis empty - Added unit test coverage for datasets with multiple mounts and no explicit target
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/ddc/jindo/load_data.go | Implements fallback logic to use dataset mounts as target paths when no explicit target is specified in DataLoad |
| pkg/ddc/jindo/load_data_test.go | Adds two new test cases covering datasets without mounts and datasets with multiple mounts when no explicit target is provided |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if m.Path == "" { | ||
| continue | ||
| } | ||
| fluidNative := utils.IsTargetPathUnderFluidNativeMounts(m.Path, *targetDataset) | ||
| targetPaths = append(targetPaths, cdataload.TargetPath{ | ||
| Path: m.Path, |
There was a problem hiding this comment.
When mount.Path is empty, the code skips that mount entirely. According to the UFSPathBuilder.GenUFSPathInUnifiedNamespace convention, if mount.Path is not set, the path should default to /{mount.Name}. Consider using GenUFSPathInUnifiedNamespace to get the correct path instead of skipping mounts with empty paths, to ensure consistency with how mount paths are handled elsewhere in the codebase.
| if m.Path == "" { | |
| continue | |
| } | |
| fluidNative := utils.IsTargetPathUnderFluidNativeMounts(m.Path, *targetDataset) | |
| targetPaths = append(targetPaths, cdataload.TargetPath{ | |
| Path: m.Path, | |
| mountPath := m.Path | |
| if mountPath == "" { | |
| mountPath = "/" + m.Name | |
| } | |
| fluidNative := utils.IsTargetPathUnderFluidNativeMounts(mountPath, *targetDataset) | |
| targetPaths = append(targetPaths, cdataload.TargetPath{ | |
| Path: mountPath, |
| } else { | ||
| for _, m := range targetDataset.Spec.Mounts { | ||
| if m.Path == "" { | ||
| continue | ||
| } | ||
| fluidNative := utils.IsTargetPathUnderFluidNativeMounts(m.Path, *targetDataset) | ||
| targetPaths = append(targetPaths, cdataload.TargetPath{ | ||
| Path: m.Path, | ||
| Replicas: 1, | ||
| FluidNative: fluidNative, | ||
| }) | ||
| } | ||
| } |
There was a problem hiding this comment.
The same issue likely exists in JindoCache and JindoFsx runtimes. Both pkg/ddc/jindocache/load_data.go and pkg/ddc/jindofsx/load_data.go iterate directly over dataload.Spec.Target without handling the empty target case. Consider applying this same fix to those runtimes for consistency and to prevent the same bug from affecting users of those runtime types.



Closes #4439
Summary
This PR fixes Jindo
DataLoadbehavior whenspec.targetis not set but the targetDatasethas multiple mounts.DataLoad.spec.targetis specified, behavior is unchanged.DataLoad.spec.targetis empty, the controller now derivestargetPathsfromDataset.spec.mounts(one entry per mount path, replicas default to 1).genDataLoadValue.This addresses the issue where a JindoRuntime
DataLoadcould fail when notargetwas provided for a dataset with multiple mount points (see issue #4439).Implementation details
Updated
JindoEngine.genDataLoadValueto:spec.target.targetDataset.Spec.Mountswhenspec.targetis empty, buildingcdataload.TargetPathentries with:Pathfrom the mountPathReplicasset to1FluidNativedecided viautils.IsTargetPathUnderFluidNativeMounts.Extended
Test_genDataLoadValueto include:"dataset with multiple mounts and no explicit target":/mnt0,/mnt1) are translated intoTargetPathswith correctReplicasandFluidNativevalues.loadMemorydataandhdfsConfigare still set as before.Why this change
For JindoRuntime, users reasonably expect a
DataLoadwithoutspec.targetto preload all dataset mounts. Previously, this scenario was not handled explicitly, leading to failures when the dataset had multiple mount points. By falling back to dataset mounts, we align the controller behavior with user expectations and the other runtimes’ patterns.Testing
go test ./pkg/ddc/jindo -run Test_genDataLoadValue$Test_genDataLoadValuecases continue to pass.