-
Notifications
You must be signed in to change notification settings - Fork 101
CMR-11024: Fix reshard logic to have granules save to both old and new index during reshard #2370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…mmented out, fix rebalance logic
…delete gran test.
📝 WalkthroughWalkthroughAdds index-set test fixtures and CRUD tests, updates index-set pruning and reshard-finalization to merge/generated concept mappings and record reshard targets, and adds system test utilities and elastic helpers plus new/updated resharding system tests. Changes
Sequence Diagram(s)sequenceDiagram
participant TestRunner as Test Runner
participant IndexerSvc as Indexer Service
participant IndexSetSvc as Index-Set Service
participant ES as Elasticsearch
TestRunner->>IndexerSvc: Start reshard for Index A (create target B)
IndexerSvc->>ES: Create index B and update aliases (pointing to both A and B)
TestRunner->>IndexerSvc: Upsert granule G during reshard
IndexerSvc->>ES: Index granule G into A and B (while resharding)
IndexerSvc->>IndexSetSvc: Request reshard finalize
IndexSetSvc->>ES: Verify alias/index states and reindexing
IndexSetSvc->>IndexSetSvc: Merge generated/requested concept mappings (finalized-concepts)
IndexSetSvc->>IndexerSvc: Update index-set concepts mapping canonical->reshard-target
IndexSetSvc->>TestRunner: Return finalized index-set
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
✏️ Tip: You can disable this entire section by setting Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @system-int-test/src/cmr/system_int_test/utils/elastic_util.clj:
- Around line 20-28: The get-doc function's docstring is placed after the
argument vector, so move the string to immediately follow the function name:
place the docstring between get-doc and [index-name doc-id elastic-name], keep
the existing implementation (client/get call, json/decode, and let body)
unchanged; verify the docstring text "Get a specific doc (granule) from a
specified elastic cluster" is used in the new position so (doc get-doc) and
other tooling pick it up.
In @system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj:
- Around line 176-184: The second check is querying the same index; change the
second es-util/get-doc call to use the resharded index name instead of
small-collections-index-name (e.g., use the resharded index variable if one
exists, or construct it as (str small-collections-index-name "_4_shards")), so
that gran-doc-in-new-index reads from the new/resharded index (keep
gran-elastic-name and (:concept-id gran1) the same) and then assert the
revision-id as before.
- Around line 191-199: The update verification erroneously queries the same
index twice: both es-util/get-doc calls use small-collections-index-name; change
the second call that builds gran-doc-in-new-index to query the reindexed/target
index variable (the new index name used elsewhere in the test) instead of
small-collections-index-name so gran-doc-in-orig-index and gran-doc-in-new-index
actually read from the old and new indexes respectively (update the
es-util/get-doc invocation that creates gran-doc-in-new-index to use the test's
new-index variable).
- Around line 151-154: The test reshard-index-with-granule-updates-test is
missing the s/only-with-real-database wrapper and may run without a real
DB/Elasticsearch; locate the deftest named
reshard-index-with-granule-updates-test and wrap its test body (the testing
"Creating and updating granule during reshard saves to both old and target
indexes" form and its contents) with (s/only-with-real-database ...) just like
reshard-index-error-test and reshard-index-success-test so the test only runs
when a real database is available.
🧹 Nitpick comments (6)
system-int-test/src/cmr/system_int_test/utils/url_helper.clj (1)
663-666: Docstring should be placed before the parameter list.In Clojure, the docstring conventionally comes before the argument vector for it to be properly associated with the function metadata.
♻️ Suggested fix
(defn indexer-get-index-sets-by-id-url + "Gets the index set by id from indexer" [id] - "Gets the index set by id from indexer" (format "http://localhost:%s/index-sets/%d" (transmit-config/indexer-port) id))system-int-test/src/cmr/system_int_test/utils/index_util.clj (1)
192-214: Docstring placement and minor request configuration issues.
- The docstring should be placed before the parameter list for proper Clojure metadata association.
- The
content-typeheader is redundant for GET requests since no body is being sent.♻️ Suggested fix
(defn get-index-set-by-id + "Gets index set by id in clojure map form. + Example of returned map: + {:index-set { + :granule { + :indexes [...] + } + :concepts { + :granule { + :small-collections '1_small_collections' + :C1234-PROV1 '1_c1234_prov1' + } + } + }" [id] - "Gets index set by id in clojure map form. - Example of returned map: - {:index-set { - :granule { - :indexes [...] - } - :concepts { - :granule { - :small-collections '1_small_collections' - :C1234-PROV1 '1_c1234_prov1' - } - } - }" (let [resp (client/get (url/indexer-get-index-sets-by-id-url id) {:query-params {:format "json"} - :headers {transmit-config/token-header (transmit-config/echo-system-token) - "content-type" "application/json"} + :headers {transmit-config/token-header (transmit-config/echo-system-token)} :connection-manager (s/conn-mgr) :throw-exceptions false}) index-sets (json/parse-string (:body resp) true)] index-sets))system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj (2)
8-8: Unused import:erroris imported but not used.Only
infois used in the new functions. Removeerrorfrom the referent list.♻️ Suggested fix
- [cmr.common.log :as log :refer [error info]] + [cmr.common.log :as log :refer [info]]
540-557: Consider failing fast on "FAILED" status.The function will continue polling until timeout even if the reshard status becomes "FAILED". Consider adding an early exit condition for failure states to provide faster feedback in tests.
♻️ Suggested improvement
(defn wait-for-reshard-complete [coll-index-name elastic-name {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}] (loop [attempt 1] (let [res (get-reshard-status coll-index-name {:elastic-name elastic-name}) status (get-in res [:reshard-status])] (cond (= status "COMPLETE") (info "Success: Resharding is COMPLETE.") + (= status "FAILED") + (throw (Exception. (str "Resharding failed with status: FAILED"))) + (>= attempt max-attempts) (throw (Exception. (str "Timeout: Resharding failed to complete after " attempt " attempts."))) :else (do (info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..." attempt max-attempts status sleep-ms)) (Thread/sleep sleep-ms) (recur (inc attempt)))))))system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj (1)
4-4: Unused import:clojure.stringis only used in commented-out code.The
clojure.stringimport is no longer needed since the test that used it is now commented out.Proposed fix
(:require - [clojure.string :as str] [clojure.test :refer [deftest is testing use-fixtures]]indexer-app/src/cmr/indexer/services/index_set_service.clj (1)
133-134: Consider simplifying the condition.The double negation with
notcan be replaced with more idiomatic Clojure.Proposed simplification
- finalized-concepts (if (and (not (= requested-updated-granule-concepts generated-updated-granule-concepts)) - (not (nil? generated-updated-granule-concepts))) + finalized-concepts (if (and (not= requested-updated-granule-concepts generated-updated-granule-concepts) + (some? generated-updated-granule-concepts))
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
indexer-app/int-test/cmr/indexer/test/utility.cljindexer-app/int-test/cmr/indexer/test/valid_data_crud_tests.cljindexer-app/src/cmr/indexer/data/index_set.cljindexer-app/src/cmr/indexer/services/index_set_service.cljsystem-int-test/src/cmr/system_int_test/utils/bootstrap_util.cljsystem-int-test/src/cmr/system_int_test/utils/elastic_util.cljsystem-int-test/src/cmr/system_int_test/utils/index_util.cljsystem-int-test/src/cmr/system_int_test/utils/url_helper.cljsystem-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.cljsystem-int-test/test/cmr/system_int_test/ingest/misc/deleted_granules_index_test.clj
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: indiejames
Repo: nasa/Common-Metadata-Repository PR: 2328
File: indexer-app/src/cmr/indexer/services/index_set_service.clj:337-349
Timestamp: 2025-10-20T16:26:11.404Z
Learning: In indexer-app/src/cmr/indexer/services/index_set_service.clj, the remove-resharding-index function intentionally returns an empty set when resharding-indexes is nil, rather than throwing an error. This differs from remove-rebalancing-collection and reflects a design preference to avoid nil returns.
📚 Learning: 2025-10-20T16:26:11.404Z
Learnt from: indiejames
Repo: nasa/Common-Metadata-Repository PR: 2328
File: indexer-app/src/cmr/indexer/services/index_set_service.clj:337-349
Timestamp: 2025-10-20T16:26:11.404Z
Learning: In indexer-app/src/cmr/indexer/services/index_set_service.clj, the remove-resharding-index function intentionally returns an empty set when resharding-indexes is nil, rather than throwing an error. This differs from remove-rebalancing-collection and reflects a design preference to avoid nil returns.
Applied to files:
system-int-test/src/cmr/system_int_test/utils/bootstrap_util.cljindexer-app/int-test/cmr/indexer/test/utility.cljindexer-app/src/cmr/indexer/data/index_set.cljindexer-app/src/cmr/indexer/services/index_set_service.cljindexer-app/int-test/cmr/indexer/test/valid_data_crud_tests.cljsystem-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.cljsystem-int-test/src/cmr/system_int_test/utils/index_util.clj
🔇 Additional comments (13)
indexer-app/src/cmr/indexer/data/index_set.clj (1)
1179-1186: LGTM! Core reshard dual-write logic looks correct.This change correctly implements the PR objective: when a collection has a single granule index that is being resharded, both the original and target indexes are returned. This ensures granule updates are written to both indexes during the reshard process, enabling safe rollback if the reshard fails.
The conditional structure is clear:
- Single index → check for resharding → if resharding, include target
- Multiple indexes (rebalancing) → return as-is (already handled by rebalancing logic)
system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj (1)
559-575: LGTM with same consideration for FAILED status.The implementation is consistent with
wait-for-reshard-complete. The same suggestion about failing fast on "FAILED" status applies here.indexer-app/int-test/cmr/indexer/test/valid_data_crud_tests.clj (2)
113-133: LGTM! Comprehensive test coverage for index-set update operations.The updated test structure properly validates:
- Original index-set creation and state verification
- Index-set update via PUT and state verification
- PUT-based creation (upsert semantics)
Good use of expected value fixtures (
util/expected-orig-index-set,util/expected-sample-index-set-after-update) for assertions.
134-150: Good edge case coverage for empty/minimal inputs.These tests verify that both POST and PUT operations handle minimal index-set definitions gracefully, defaulting to a known empty state. This is important for defensive programming and ensures the system behaves predictably with incomplete inputs.
indexer-app/int-test/cmr/indexer/test/utility.clj (3)
90-141: LGTM - Well-structured test fixture for original index set.The
expected-orig-index-setprovides clear documentation of expected state and covers all concept types including generics. The structure matches the index-set format used elsewhere in the codebase.
219-315: LGTM - Comprehensive expected state after update.The
expected-sample-index-set-after-updatecorrectly captures the expected granule concepts including the newC6-PROV3andC7-PROV4indexes, and properly shows the updated collection mapping toCOLL2-PROV1.
371-399: LGTM - Empty index-set fixture for edge case testing.Good addition for testing the update-index-sets behavior with minimal/empty input. The
:create-reason nilis intentional for this test case.system-int-test/test/cmr/system_int_test/ingest/misc/deleted_granules_index_test.clj (1)
17-17: LGTM - Properly migrated to new elastic-util helper.The refactoring correctly uses
es-util/doc-present?with the updated signature that removes the type name parameter (appropriate for Elasticsearch 7.x+ where mapping types were deprecated). The argument order(index-alias, concept-id, elastic-name)aligns with the new implementation inelastic_util.clj.Also applies to: 29-31
system-int-test/src/cmr/system_int_test/utils/elastic_util.clj (2)
9-18: LGTM - Robust document presence check.The implementation correctly uses the search API with ID query and validates both the hit count and the returned document ID. Using
:throw-exceptions falseis appropriate for this check operation.
30-37: LGTM - Simple and effective index existence check.Checking HTTP status 200 is the correct approach for verifying index existence in Elasticsearch.
indexer-app/src/cmr/indexer/services/index_set_service.clj (3)
123-143: LGTM - Core fix for reshard dual-write behavior.This logic correctly:
- Generates the default concept mappings based on index configurations
- Compares generated granule concepts with any explicitly requested granule concepts
- Merges them when they differ, preserving requested values (which point to the old index during reshard)
This ensures the index-set's concept mapping remains pointed to the old index until finalization, enabling dual-writes to both old and new indexes during reshard.
776-777: LGTM - Critical fix for reshard finalization.This addition updates the
:conceptsmap to associate the canonical index name with the new resharded target index. This ensures that after finalization, the index-set correctly points to the new index for future writes.
275-275: LGTM - Comment accurately reflects the rollback scope.The updated comment correctly indicates that rollback now covers alias creation failures as well as index creation.
system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj
Show resolved
Hide resolved
system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj
Show resolved
Hide resolved
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #2370 +/- ##
=======================================
Coverage 58.01% 58.02%
=======================================
Files 1063 1064 +1
Lines 72765 72827 +62
Branches 2113 2112 -1
=======================================
+ Hits 42215 42256 +41
- Misses 28580 28601 +21
Partials 1970 1970 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj
Outdated
Show resolved
Hide resolved
system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj
Outdated
Show resolved
Hide resolved
eereiter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the found issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj:
- Line 8: Remove the unused import by deleting `error` from the refer list in
the cmr.common.log import if you don't intend to log errors, or alternatively
invoke `error` inside the timeout branches of the new wait functions (e.g., the
wait-for-* functions added) before throwing an exception so timeouts are logged;
update the import to `[cmr.common.log :as log :refer [info]]` if removing, or
add a call like `(error (str "Timeout: Resharding failed to complete after "
attempt " attempts."))` in the timeout path(s) and keep the refer list as
`[error info]`.
In @system-int-test/src/cmr/system_int_test/utils/elastic_util.clj:
- Around line 9-18: In doc-present? fix the Elasticsearch path and encode the
doc id: change the client/get call’s path to use the search endpoint without the
deprecated _doc segment (i.e., use /{index}/_search) and URL-encode the doc-id
before interpolating into the q parameter (e.g., use
java.net.URLEncoder.encode(doc-id, "UTF-8") or a URI builder) so special
characters are safely escaped; update the client/get invocation in the
doc-present? function to use the corrected path and encoded id while preserving
the existing connection-manager and error handling.
In @system-int-test/src/cmr/system_int_test/utils/url_helper.clj:
- Around line 663-666: The function indexer-get-index-sets-by-id-url uses a %d
formatter which differs from the transmit-lib pattern; update the format string
to use %s and pass (str id) instead of id so it matches
rebalance-collection-url, reshard-index-url, and get-index-set patterns in
transmit-lib; modify the body of indexer-get-index-sets-by-id-url to call
(format "http://localhost:%s/index-sets/%s" (transmit-config/indexer-port) (str
id)).
In @system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj:
- Around line 151-154: Wrap the reshard-index-with-granule-updates-test with
s/only-with-real-database like the other tests (reshard-index-error-test and
reshard-index-success-test) because it calls es-util/get-doc,
es-util/index-exists?, and index/get-index-set-by-id; add the opening
s/only-with-real-database wrapper around the test body and ensure you add the
missing closing parenthesis at the end of the test (after the current end of the
test) so the form is properly balanced.
🧹 Nitpick comments (4)
system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj (2)
540-557: Consider returning the final status for test assertions.The implementation correctly polls for reshard completion. However, returning
nilon success means callers cannot assert on the final status. Consider returning the result map on success.♻️ Optional: Return status on success
(cond (= status "COMPLETE") - (info "Success: Resharding is COMPLETE.") + (do + (info "Success: Resharding is COMPLETE.") + res) (>= attempt max-attempts)
559-575: LGTM with optional DRY suggestion.The implementation correctly polls for rebalance completion. Both wait functions share nearly identical polling logic and could optionally be refactored to a common helper to reduce duplication.
♻️ Optional: Extract common polling logic
(defn- poll-until-complete "Generic polling helper that waits for a status function to return COMPLETE." [status-fn status-key operation-name {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}] (loop [attempt 1] (let [res (status-fn) status (get res status-key)] (cond (= status "COMPLETE") (do (info (str "Success: " operation-name " is COMPLETE.")) res) (>= attempt max-attempts) (throw (Exception. (str "Timeout: " operation-name " failed to complete after " attempt " attempts."))) :else (do (info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..." attempt max-attempts status sleep-ms)) (Thread/sleep sleep-ms) (recur (inc attempt))))))) (defn wait-for-reshard-complete [coll-index-name elastic-name opts] (poll-until-complete #(get-reshard-status coll-index-name {:elastic-name elastic-name}) :reshard-status "Resharding" opts)) (defn wait-for-rebalance-to-complete [collection opts] (poll-until-complete #(get-rebalance-status (:concept-id collection)) :rebalancing-status "Rebalance" opts))system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj (2)
4-4: Unused import:clojure.stringThis import is only referenced in commented-out code (line 236). Consider removing it to avoid dead code warnings.
Suggested fix
(:require - [clojure.string :as string] [clojure.test :refer [deftest is testing use-fixtures]]
218-223: Consider tracking disabled test differentlyThe explanation for why this test is commented out is helpful, but ~95 lines of commented code adds maintenance burden. Consider either:
- Converting to a
:pendingor:skiptest annotation if your test framework supports it- Removing the commented code and tracking it in a GitHub issue with the explanation
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
system-int-test/src/cmr/system_int_test/utils/bootstrap_util.cljsystem-int-test/src/cmr/system_int_test/utils/elastic_util.cljsystem-int-test/src/cmr/system_int_test/utils/url_helper.cljsystem-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.cljsystem-int-test/test/cmr/system_int_test/ingest/misc/deleted_granules_index_test.clj
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T16:26:11.404Z
Learnt from: indiejames
Repo: nasa/Common-Metadata-Repository PR: 2328
File: indexer-app/src/cmr/indexer/services/index_set_service.clj:337-349
Timestamp: 2025-10-20T16:26:11.404Z
Learning: In indexer-app/src/cmr/indexer/services/index_set_service.clj, the remove-resharding-index function intentionally returns an empty set when resharding-indexes is nil, rather than throwing an error. This differs from remove-rebalancing-collection and reflects a design preference to avoid nil returns.
Applied to files:
system-int-test/src/cmr/system_int_test/utils/bootstrap_util.cljsystem-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj
🔇 Additional comments (5)
system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj (1)
155-216: LGTM - Test logic is comprehensiveThe test correctly validates the PR objective: during reshard, granule creates/updates are persisted to both old and new indexes, the index-set mapping remains unchanged until finalization, and cleanup occurs properly after finalization.
system-int-test/src/cmr/system_int_test/utils/elastic_util.clj (2)
20-28: LGTM - Consider documenting the not-found behavior.The document retrieval path
/{index}/_doc/{id}is correct for ES 7.x+. The function will return the full response body includingfound: falsewhen the document doesn't exist, which callers should handle. This is appropriate for test utilities.
30-37: LGTM!The implementation correctly checks index existence by verifying a 200 status. Using GET is acceptable for integration tests; a HEAD request would be slightly more efficient but is not necessary here.
system-int-test/test/cmr/system_int_test/ingest/misc/deleted_granules_index_test.clj (2)
16-16: LGTM!The new namespace require follows the existing conventions in this file.
24-30: LGTM!The migration to
es-util/doc-present?is clean. The removal of the type-name argument aligns correctly with the ES 7.x deprecation of mapping types, and the argument order matches the new utility signature.
system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj
Outdated
Show resolved
Hide resolved
| (defn doc-present? | ||
| "If doc is present return true, otherwise return false" | ||
| [index-name doc-id elastic-name] | ||
| (let [response (client/get | ||
| (format "%s/%s/_doc/_search?q=_id:%s" (url/elastic-root elastic-name) index-name doc-id) | ||
| {:throw-exceptions false | ||
| :connection-manager (s/conn-mgr)}) | ||
| body (json/decode (:body response) true)] | ||
| (and (= 1 (get-in body [:hits :total :value])) | ||
| (= doc-id (get-in body [:hits :hits 0 :_id]))))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check for existing Elasticsearch search patterns in the codebase
rg -n --type clojure '_search' -C 2Repository: nasa/Common-Metadata-Repository
Length of output: 7237
🌐 Web query:
Elasticsearch 7.x search API endpoint path format
💡 Result:
Elasticsearch 7.x Search API endpoint paths:
- GET or POST /_search — cluster-wide search. (elastic.co) [1]
- GET or POST /{index}/_search — search a specific index ( {index} may be one name, a comma-separated list, wildcard, or alias ). (elastic.co) [2]
Citations:
- 1: https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search?utm_source=openai
- 2: https://www.elastic.co/guide/en/elasticsearch/reference/8.18/search-search.html?utm_source=openai
Fix Elasticsearch search API path and add URL encoding for doc-id.
Two issues:
-
The path
/{index}/_doc/_searchis incorrect for Elasticsearch 7.x. The search endpoint is/{index}/_searchwithout the_docsegment. The_doctype was deprecated and is not part of the search API path. -
The
doc-idshould be URL-encoded in the query string to handle special characters safely. UseURLEncoder.encode()or a URI-building utility rather than direct string interpolation into the query parameter.
🤖 Prompt for AI Agents
In @system-int-test/src/cmr/system_int_test/utils/elastic_util.clj around lines
9 - 18, In doc-present? fix the Elasticsearch path and encode the doc id: change
the client/get call’s path to use the search endpoint without the deprecated
_doc segment (i.e., use /{index}/_search) and URL-encode the doc-id before
interpolating into the q parameter (e.g., use java.net.URLEncoder.encode(doc-id,
"UTF-8") or a URI builder) so special characters are safely escaped; update the
client/get invocation in the doc-present? function to use the corrected path and
encoded id while preserving the existing connection-manager and error handling.
| (deftest reshard-index-with-granule-updates-test | ||
| ;; When a granule is upserted during the reshard process, it should save to both indexes and the index set | ||
| ;; should not update the index mapping until the reshard is finalized | ||
| (testing "Creating and updating granule during reshard saves to both old and target indexes" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing s/only-with-real-database wrapper
This test interacts directly with Elasticsearch via es-util/get-doc, es-util/index-exists?, and index/get-index-set-by-id, but unlike the other tests in this file (reshard-index-error-test at line 21 and reshard-index-success-test at line 98), it's missing the s/only-with-real-database wrapper.
Suggested fix
(deftest reshard-index-with-granule-updates-test
+ (s/only-with-real-database
;; When a granule is upserted during the reshard process, it should save to both indexes and the index set
;; should not update the index mapping until the reshard is finalized
(testing "Creating and updating granule during reshard saves to both old and target indexes"
(let [gran-elastic-name "gran-elastic"And add the closing parenthesis at the end of the test (after line 216):
(is (not (es-util/index-exists? small-collections-index-name gran-elastic-name))))))
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| (deftest reshard-index-with-granule-updates-test | |
| ;; When a granule is upserted during the reshard process, it should save to both indexes and the index set | |
| ;; should not update the index mapping until the reshard is finalized | |
| (testing "Creating and updating granule during reshard saves to both old and target indexes" | |
| (deftest reshard-index-with-granule-updates-test | |
| (s/only-with-real-database | |
| ;; When a granule is upserted during the reshard process, it should save to both indexes and the index set | |
| ;; should not update the index mapping until the reshard is finalized | |
| (testing "Creating and updating granule during reshard saves to both old and target indexes" | |
| (let [gran-elastic-name "gran-elastic" | |
| ;; ... rest of test content from lines 155-215 remains unchanged ... | |
| ] | |
| ;; ... test assertions ... | |
| (is (not (es-util/index-exists? small-collections-index-name gran-elastic-name))))))) | |
| ) |
🤖 Prompt for AI Agents
In @system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj
around lines 151 - 154, Wrap the reshard-index-with-granule-updates-test with
s/only-with-real-database like the other tests (reshard-index-error-test and
reshard-index-success-test) because it calls es-util/get-doc,
es-util/index-exists?, and index/get-index-set-by-id; add the opening
s/only-with-real-database wrapper around the test body and ensure you add the
missing closing parenthesis at the end of the test (after the current end of the
test) so the form is properly balanced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj`:
- Around line 540-557: The wait-for-reshard-complete function lacks a docstring
and does not validate the HTTP response status from get-reshard-status before
reading :reshard-status, causing retries to mask server errors; add a docstring
to wait-for-reshard-complete, check that the response status (e.g., res's
:status or :http-status) equals 200 before inspecting :reshard-status, and if it
is non-200 log an error with the response details and throw (or return)
immediately instead of continuing retries; keep existing retry behavior only for
valid 200 responses and preserve the max-attempts/sleep-ms parameters.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
system-int-test/src/cmr/system_int_test/utils/bootstrap_util.cljsystem-int-test/src/cmr/system_int_test/utils/url_helper.clj
🚧 Files skipped from review as they are similar to previous changes (1)
- system-int-test/src/cmr/system_int_test/utils/url_helper.clj
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T16:26:11.404Z
Learnt from: indiejames
Repo: nasa/Common-Metadata-Repository PR: 2328
File: indexer-app/src/cmr/indexer/services/index_set_service.clj:337-349
Timestamp: 2025-10-20T16:26:11.404Z
Learning: In indexer-app/src/cmr/indexer/services/index_set_service.clj, the remove-resharding-index function intentionally returns an empty set when resharding-indexes is nil, rather than throwing an error. This differs from remove-rebalancing-collection and reflects a design preference to avoid nil returns.
Applied to files:
system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj
🔇 Additional comments (1)
system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj (1)
8-8: LGTM!The import is correctly updated to only include
info, which is used by both new wait functions.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| (defn wait-for-reshard-complete | ||
| [coll-index-name elastic-name {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}] | ||
| (loop [attempt 1] | ||
| (let [res (get-reshard-status coll-index-name {:elastic-name elastic-name}) | ||
| status (get-in res [:reshard-status])] | ||
| (cond | ||
| (= status "COMPLETE") | ||
| (info "Success: Resharding is COMPLETE.") | ||
|
|
||
| (>= attempt max-attempts) | ||
| (throw (Exception. (str "Timeout: Resharding failed to complete after " attempt " attempts."))) | ||
|
|
||
| :else | ||
| (do | ||
| (info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..." | ||
| attempt max-attempts status sleep-ms)) | ||
| (Thread/sleep sleep-ms) | ||
| (recur (inc attempt))))))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add HTTP status validation and docstring.
The function doesn't validate that the HTTP response status is 200 before checking :reshard-status. If the endpoint returns an error (e.g., 500), the loop will retry until timeout, masking the actual issue. Also, a docstring would align with other functions in this file.
Suggested fix
+(defn wait-for-reshard-complete
+ "Polls reshard status until COMPLETE or timeout. Throws on timeout."
+ [coll-index-name elastic-name {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}]
+ (loop [attempt 1]
+ (let [res (get-reshard-status coll-index-name {:elastic-name elastic-name})
+ http-status (:status res)
+ status (get res :reshard-status)]
+ (cond
+ (not= 200 http-status)
+ (throw (Exception. (str "Reshard status check failed with HTTP status: " http-status)))
+
+ (= status "COMPLETE")
+ (info "Success: Resharding is COMPLETE.")
+
+ (>= attempt max-attempts)
+ (throw (Exception. (str "Timeout: Resharding failed to complete after " attempt " attempts.")))
+
+ :else
+ (do
+ (info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..."
+ attempt max-attempts status sleep-ms))
+ (Thread/sleep sleep-ms)
+ (recur (inc attempt)))))))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| (defn wait-for-reshard-complete | |
| [coll-index-name elastic-name {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}] | |
| (loop [attempt 1] | |
| (let [res (get-reshard-status coll-index-name {:elastic-name elastic-name}) | |
| status (get-in res [:reshard-status])] | |
| (cond | |
| (= status "COMPLETE") | |
| (info "Success: Resharding is COMPLETE.") | |
| (>= attempt max-attempts) | |
| (throw (Exception. (str "Timeout: Resharding failed to complete after " attempt " attempts."))) | |
| :else | |
| (do | |
| (info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..." | |
| attempt max-attempts status sleep-ms)) | |
| (Thread/sleep sleep-ms) | |
| (recur (inc attempt))))))) | |
| (defn wait-for-reshard-complete | |
| "Polls reshard status until COMPLETE or timeout. Throws on timeout." | |
| [coll-index-name elastic-name {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}] | |
| (loop [attempt 1] | |
| (let [res (get-reshard-status coll-index-name {:elastic-name elastic-name}) | |
| http-status (:status res) | |
| status (get res :reshard-status)] | |
| (cond | |
| (not= 200 http-status) | |
| (throw (Exception. (str "Reshard status check failed with HTTP status: " http-status))) | |
| (= status "COMPLETE") | |
| (info "Success: Resharding is COMPLETE.") | |
| (>= attempt max-attempts) | |
| (throw (Exception. (str "Timeout: Resharding failed to complete after " attempt " attempts."))) | |
| :else | |
| (do | |
| (info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..." | |
| attempt max-attempts status sleep-ms)) | |
| (Thread/sleep sleep-ms) | |
| (recur (inc attempt))))))) |
🤖 Prompt for AI Agents
In `@system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj` around
lines 540 - 557, The wait-for-reshard-complete function lacks a docstring and
does not validate the HTTP response status from get-reshard-status before
reading :reshard-status, causing retries to mask server errors; add a docstring
to wait-for-reshard-complete, check that the response status (e.g., res's
:status or :http-status) equals 200 before inspecting :reshard-status, and if it
is non-200 log an error with the response details and throw (or return)
immediately instead of continuing retries; keep existing retry behavior only for
valid 200 responses and preserve the max-attempts/sleep-ms parameters.
| (defn wait-for-rebalance-to-complete | ||
| [collection {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}] | ||
| (loop [attempt 1] | ||
| (let [res (get-rebalance-status (:concept-id collection)) | ||
| status (get-in res [:rebalancing-status])] | ||
| (cond | ||
| (= status "COMPLETE") | ||
| (info "Success: Rebalance is COMPLETE.") | ||
|
|
||
| (>= attempt max-attempts) | ||
| (throw (Exception. (str "Timeout: Rebalance failed to complete after " attempt " attempts."))) | ||
|
|
||
| :else | ||
| (do | ||
| (info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..." attempt max-attempts status sleep-ms)) | ||
| (Thread/sleep sleep-ms) | ||
| (recur (inc attempt))))))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add HTTP status validation, docstring, and trailing newline.
Same concerns as the previous function: no HTTP status validation before checking :rebalancing-status, and missing docstring. Additionally, the file is missing a trailing newline.
Suggested fix
+(defn wait-for-rebalance-to-complete
+ "Polls rebalance status until COMPLETE or timeout. Throws on timeout."
+ [collection {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}]
+ (loop [attempt 1]
+ (let [res (get-rebalance-status (:concept-id collection))
+ http-status (:status res)
+ status (get res :rebalancing-status)]
+ (cond
+ (not= 200 http-status)
+ (throw (Exception. (str "Rebalance status check failed with HTTP status: " http-status)))
+
+ (= status "COMPLETE")
+ (info "Success: Rebalance is COMPLETE.")
+
+ (>= attempt max-attempts)
+ (throw (Exception. (str "Timeout: Rebalance failed to complete after " attempt " attempts.")))
+
+ :else
+ (do
+ (info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..." attempt max-attempts status sleep-ms))
+ (Thread/sleep sleep-ms)
+ (recur (inc attempt)))))))
+Note: The trailing + line represents the missing newline at end of file.
Overview
What is the objective?
During a reshard, if a user were to create/update a granule concept, it was only saved to the new index which is not expected behavior because it could cause issues if the reshard failed and we have to roll back. Instead, the granule concept should be saving to both indexes and the index-set should remain mapped to the old index until the reshard is finalized.
What are the changes?
Update index update to keep index set pointed to the old index which, in turn, update both the old and new indexes with the granule changes in elasticsearch
Update sys int tests to capture errors related to index set in the future and make them more encompassing
What areas of the application does this impact?
List impacted applications
Required Checklist
Additional Checklist
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.