Skip to content

Conversation

@jmaeng72
Copy link
Contributor

@jmaeng72 jmaeng72 commented Jan 12, 2026

Overview

What is the objective?

During a reshard, if a user were to create/update a granule concept, it was only saved to the new index which is not expected behavior because it could cause issues if the reshard failed and we have to roll back. Instead, the granule concept should be saving to both indexes and the index-set should remain mapped to the old index until the reshard is finalized.

What are the changes?

Update index update to keep index set pointed to the old index which, in turn, update both the old and new indexes with the granule changes in elasticsearch

Update sys int tests to capture errors related to index set in the future and make them more encompassing

What areas of the application does this impact?

List impacted applications

  • Ingest

Required Checklist

  • New and existing unit and int tests pass locally and remotely
  • clj-kondo has been run locally and all errors in changed files are corrected
  • I have commented my code, particularly in hard-to-understand areas
  • I have made changes to the documentation (if necessary)
  • My changes generate no new warnings

Additional Checklist

  • I have removed unnecessary/dead code and imports in files I have changed
  • I have cleaned up integration tests by doing one or more of the following:
    • migrated any are2 tests to are3 in files I have changed
    • de-duped, consolidated, removed dead int tests
    • transformed applicable int tests into unit tests
    • reduced number of system state resets by updating fixtures. Ex) (use-fixtures :each (ingest/reset-fixture {})) to be :once instead of :each

Summary by CodeRabbit

  • New Features

    • Resharding: final state now records reshard targets and index/alias creation is included in rollback handling.
    • New test utilities to wait for reshard/rebalance completion and to query test Elasticsearch instances.
  • Tests

    • Expanded integration tests for granule upserts during resharding and index-set create/update flows, including empty-input scenarios.
    • Added richer index-set test fixtures and new validation tests.

✏️ Tip: You can customize this high-level summary in your review settings.

@jmaeng72 jmaeng72 self-assigned this Jan 12, 2026
@coderabbitai
Copy link

coderabbitai bot commented Jan 12, 2026

📝 Walkthrough

Walkthrough

Adds index-set test fixtures and CRUD tests, updates index-set pruning and reshard-finalization to merge/generated concept mappings and record reshard targets, and adds system test utilities and elastic helpers plus new/updated resharding system tests.

Changes

Cohort / File(s) Summary
Test Data & Fixtures
indexer-app/int-test/cmr/indexer/test/utility.clj
Added expected-orig-index-set, expected-sample-index-set-after-update, expected-empty-index-set; expanded and reorganized index-set fixtures, renamed several indexes, added new indexes/mappings and comments.
CRUD Test Cases
indexer-app/int-test/cmr/indexer/test/valid_data_crud_tests.clj
Added tests update-index-sets-with-empty-inputs and create-index-set-with-empty-input to validate create/update flows with empty/minimal inputs.
Index Set Core Logic
indexer-app/src/cmr/indexer/data/index_set.clj
get-granule-index-names-for-collection now includes the reshard target index when a single candidate index is actively resharding.
Index Set Service Logic
indexer-app/src/cmr/indexer/services/index_set_service.clj
prune-index-set now generates per-index canonical concept mappings and merges generated granule concepts with requested ones (with logging); finalize-index-resharding records canonical->reshard-target mapping in index-set concepts; create-index-set log mentions aliases.
Bootstrap Polling Utilities
system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj
Added wait-for-reshard-complete and wait-for-rebalance-to-complete helpers that poll status endpoints, log progress, and throw on timeout.
Elasticsearch Query Utilities
system-int-test/src/cmr/system_int_test/utils/elastic_util.clj
New namespace with doc-present?, get-doc, and index-exists? for non-throwing ES queries and JSON decoding.
Index & URL Utilities
system-int-test/src/cmr/system_int_test/utils/index_util.clj, system-int-test/src/cmr/system_int_test/utils/url_helper.clj
Removed doc-present? from index_util; added get-index-set-by-id HTTP helper and indexer-get-index-sets-by-id-url; adjusted a URL to use ingest-port.
Resharding Integration Tests
system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj
Added reshard-index-with-granule-updates-test exercising upserts during reshard; note: identical test block appears duplicated.
Deleted Granules Tests
system-int-test/test/cmr/system_int_test/ingest/misc/deleted_granules_index_test.clj
Switched to elastic-util/doc-present? (signature: index-alias, concept-id, elastic-name) replacing previous index-based doc check.

Sequence Diagram(s)

sequenceDiagram
  participant TestRunner as Test Runner
  participant IndexerSvc as Indexer Service
  participant IndexSetSvc as Index-Set Service
  participant ES as Elasticsearch

  TestRunner->>IndexerSvc: Start reshard for Index A (create target B)
  IndexerSvc->>ES: Create index B and update aliases (pointing to both A and B)
  TestRunner->>IndexerSvc: Upsert granule G during reshard
  IndexerSvc->>ES: Index granule G into A and B (while resharding)
  IndexerSvc->>IndexSetSvc: Request reshard finalize
  IndexSetSvc->>ES: Verify alias/index states and reindexing
  IndexSetSvc->>IndexSetSvc: Merge generated/requested concept mappings (finalized-concepts)
  IndexSetSvc->>IndexerSvc: Update index-set concepts mapping canonical->reshard-target
  IndexSetSvc->>TestRunner: Return finalized index-set
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

hacktoberfest-accepted

Suggested reviewers

  • jceaser
  • daniel-zamora
  • eereiter
  • indiejames

Poem

🐇
I hop through mappings, tidy each row,
I watch reshard rivers calmly flow.
Tests blink green as indexes align,
I nibble a carrot and hum a small sign.
Hooray — new concepts fall into line!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main objective: fixing reshard logic to ensure granules save to both old and new indexes during reshard, which is the primary focus of the changeset.
Description check ✅ Passed The PR description addresses the Overview section with clear objectives and changes, though some additional checklist items are not completed.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f97cc4d and e8b3075.

📒 Files selected for processing (1)
  • indexer-app/src/cmr/indexer/data/index_set.clj
🚧 Files skipped from review as they are similar to previous changes (1)
  • indexer-app/src/cmr/indexer/data/index_set.clj

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @system-int-test/src/cmr/system_int_test/utils/elastic_util.clj:
- Around line 20-28: The get-doc function's docstring is placed after the
argument vector, so move the string to immediately follow the function name:
place the docstring between get-doc and [index-name doc-id elastic-name], keep
the existing implementation (client/get call, json/decode, and let body)
unchanged; verify the docstring text "Get a specific doc (granule) from a
specified elastic cluster" is used in the new position so (doc get-doc) and
other tooling pick it up.

In @system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj:
- Around line 176-184: The second check is querying the same index; change the
second es-util/get-doc call to use the resharded index name instead of
small-collections-index-name (e.g., use the resharded index variable if one
exists, or construct it as (str small-collections-index-name "_4_shards")), so
that gran-doc-in-new-index reads from the new/resharded index (keep
gran-elastic-name and (:concept-id gran1) the same) and then assert the
revision-id as before.
- Around line 191-199: The update verification erroneously queries the same
index twice: both es-util/get-doc calls use small-collections-index-name; change
the second call that builds gran-doc-in-new-index to query the reindexed/target
index variable (the new index name used elsewhere in the test) instead of
small-collections-index-name so gran-doc-in-orig-index and gran-doc-in-new-index
actually read from the old and new indexes respectively (update the
es-util/get-doc invocation that creates gran-doc-in-new-index to use the test's
new-index variable).
- Around line 151-154: The test reshard-index-with-granule-updates-test is
missing the s/only-with-real-database wrapper and may run without a real
DB/Elasticsearch; locate the deftest named
reshard-index-with-granule-updates-test and wrap its test body (the testing
"Creating and updating granule during reshard saves to both old and target
indexes" form and its contents) with (s/only-with-real-database ...) just like
reshard-index-error-test and reshard-index-success-test so the test only runs
when a real database is available.
🧹 Nitpick comments (6)
system-int-test/src/cmr/system_int_test/utils/url_helper.clj (1)

663-666: Docstring should be placed before the parameter list.

In Clojure, the docstring conventionally comes before the argument vector for it to be properly associated with the function metadata.

♻️ Suggested fix
 (defn indexer-get-index-sets-by-id-url
+  "Gets the index set by id from indexer"
   [id]
-  "Gets the index set by id from indexer"
   (format "http://localhost:%s/index-sets/%d" (transmit-config/indexer-port) id))
system-int-test/src/cmr/system_int_test/utils/index_util.clj (1)

192-214: Docstring placement and minor request configuration issues.

  1. The docstring should be placed before the parameter list for proper Clojure metadata association.
  2. The content-type header is redundant for GET requests since no body is being sent.
♻️ Suggested fix
 (defn get-index-set-by-id
+  "Gets index set by id in clojure map form.
+  Example of returned map:
+  {:index-set {
+    :granule {
+      :indexes [...]
+    }
+    :concepts {
+      :granule {
+        :small-collections '1_small_collections'
+        :C1234-PROV1 '1_c1234_prov1'
+      }
+    }
+  }"
   [id]
-  "Gets index set by id in clojure map form.
-  Example of returned map:
-  {:index-set {
-    :granule {
-      :indexes [...]
-    }
-    :concepts {
-      :granule {
-        :small-collections '1_small_collections'
-        :C1234-PROV1 '1_c1234_prov1'
-      }
-    }
-  }"
   (let [resp (client/get (url/indexer-get-index-sets-by-id-url id)
                          {:query-params {:format "json"}
-                          :headers {transmit-config/token-header (transmit-config/echo-system-token)
-                                    "content-type" "application/json"}
+                          :headers {transmit-config/token-header (transmit-config/echo-system-token)}
                           :connection-manager (s/conn-mgr)
                           :throw-exceptions false})
         index-sets (json/parse-string (:body resp) true)]
     index-sets))
system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj (2)

8-8: Unused import: error is imported but not used.

Only info is used in the new functions. Remove error from the referent list.

♻️ Suggested fix
-   [cmr.common.log :as log :refer [error info]]
+   [cmr.common.log :as log :refer [info]]

540-557: Consider failing fast on "FAILED" status.

The function will continue polling until timeout even if the reshard status becomes "FAILED". Consider adding an early exit condition for failure states to provide faster feedback in tests.

♻️ Suggested improvement
 (defn wait-for-reshard-complete
   [coll-index-name elastic-name {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}]
   (loop [attempt 1]
     (let [res (get-reshard-status coll-index-name {:elastic-name elastic-name})
           status (get-in res [:reshard-status])]
       (cond
         (= status "COMPLETE")
         (info "Success: Resharding is COMPLETE.")

+        (= status "FAILED")
+        (throw (Exception. (str "Resharding failed with status: FAILED")))
+
         (>= attempt max-attempts)
         (throw (Exception. (str "Timeout: Resharding failed to complete after " attempt " attempts.")))

         :else
         (do
           (info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..."
                            attempt max-attempts status sleep-ms))
           (Thread/sleep sleep-ms)
           (recur (inc attempt)))))))
system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj (1)

4-4: Unused import: clojure.string is only used in commented-out code.

The clojure.string import is no longer needed since the test that used it is now commented out.

Proposed fix
 (:require
-  [clojure.string :as str]
   [clojure.test :refer [deftest is testing use-fixtures]]
indexer-app/src/cmr/indexer/services/index_set_service.clj (1)

133-134: Consider simplifying the condition.

The double negation with not can be replaced with more idiomatic Clojure.

Proposed simplification
-        finalized-concepts (if (and (not (= requested-updated-granule-concepts generated-updated-granule-concepts))
-                                    (not (nil? generated-updated-granule-concepts)))
+        finalized-concepts (if (and (not= requested-updated-granule-concepts generated-updated-granule-concepts)
+                                    (some? generated-updated-granule-concepts))
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23f93e6 and 4992879.

📒 Files selected for processing (10)
  • indexer-app/int-test/cmr/indexer/test/utility.clj
  • indexer-app/int-test/cmr/indexer/test/valid_data_crud_tests.clj
  • indexer-app/src/cmr/indexer/data/index_set.clj
  • indexer-app/src/cmr/indexer/services/index_set_service.clj
  • system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj
  • system-int-test/src/cmr/system_int_test/utils/elastic_util.clj
  • system-int-test/src/cmr/system_int_test/utils/index_util.clj
  • system-int-test/src/cmr/system_int_test/utils/url_helper.clj
  • system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj
  • system-int-test/test/cmr/system_int_test/ingest/misc/deleted_granules_index_test.clj
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: indiejames
Repo: nasa/Common-Metadata-Repository PR: 2328
File: indexer-app/src/cmr/indexer/services/index_set_service.clj:337-349
Timestamp: 2025-10-20T16:26:11.404Z
Learning: In indexer-app/src/cmr/indexer/services/index_set_service.clj, the remove-resharding-index function intentionally returns an empty set when resharding-indexes is nil, rather than throwing an error. This differs from remove-rebalancing-collection and reflects a design preference to avoid nil returns.
📚 Learning: 2025-10-20T16:26:11.404Z
Learnt from: indiejames
Repo: nasa/Common-Metadata-Repository PR: 2328
File: indexer-app/src/cmr/indexer/services/index_set_service.clj:337-349
Timestamp: 2025-10-20T16:26:11.404Z
Learning: In indexer-app/src/cmr/indexer/services/index_set_service.clj, the remove-resharding-index function intentionally returns an empty set when resharding-indexes is nil, rather than throwing an error. This differs from remove-rebalancing-collection and reflects a design preference to avoid nil returns.

Applied to files:

  • system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj
  • indexer-app/int-test/cmr/indexer/test/utility.clj
  • indexer-app/src/cmr/indexer/data/index_set.clj
  • indexer-app/src/cmr/indexer/services/index_set_service.clj
  • indexer-app/int-test/cmr/indexer/test/valid_data_crud_tests.clj
  • system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj
  • system-int-test/src/cmr/system_int_test/utils/index_util.clj
🔇 Additional comments (13)
indexer-app/src/cmr/indexer/data/index_set.clj (1)

1179-1186: LGTM! Core reshard dual-write logic looks correct.

This change correctly implements the PR objective: when a collection has a single granule index that is being resharded, both the original and target indexes are returned. This ensures granule updates are written to both indexes during the reshard process, enabling safe rollback if the reshard fails.

The conditional structure is clear:

  • Single index → check for resharding → if resharding, include target
  • Multiple indexes (rebalancing) → return as-is (already handled by rebalancing logic)
system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj (1)

559-575: LGTM with same consideration for FAILED status.

The implementation is consistent with wait-for-reshard-complete. The same suggestion about failing fast on "FAILED" status applies here.

indexer-app/int-test/cmr/indexer/test/valid_data_crud_tests.clj (2)

113-133: LGTM! Comprehensive test coverage for index-set update operations.

The updated test structure properly validates:

  1. Original index-set creation and state verification
  2. Index-set update via PUT and state verification
  3. PUT-based creation (upsert semantics)

Good use of expected value fixtures (util/expected-orig-index-set, util/expected-sample-index-set-after-update) for assertions.


134-150: Good edge case coverage for empty/minimal inputs.

These tests verify that both POST and PUT operations handle minimal index-set definitions gracefully, defaulting to a known empty state. This is important for defensive programming and ensures the system behaves predictably with incomplete inputs.

indexer-app/int-test/cmr/indexer/test/utility.clj (3)

90-141: LGTM - Well-structured test fixture for original index set.

The expected-orig-index-set provides clear documentation of expected state and covers all concept types including generics. The structure matches the index-set format used elsewhere in the codebase.


219-315: LGTM - Comprehensive expected state after update.

The expected-sample-index-set-after-update correctly captures the expected granule concepts including the new C6-PROV3 and C7-PROV4 indexes, and properly shows the updated collection mapping to COLL2-PROV1.


371-399: LGTM - Empty index-set fixture for edge case testing.

Good addition for testing the update-index-sets behavior with minimal/empty input. The :create-reason nil is intentional for this test case.

system-int-test/test/cmr/system_int_test/ingest/misc/deleted_granules_index_test.clj (1)

17-17: LGTM - Properly migrated to new elastic-util helper.

The refactoring correctly uses es-util/doc-present? with the updated signature that removes the type name parameter (appropriate for Elasticsearch 7.x+ where mapping types were deprecated). The argument order (index-alias, concept-id, elastic-name) aligns with the new implementation in elastic_util.clj.

Also applies to: 29-31

system-int-test/src/cmr/system_int_test/utils/elastic_util.clj (2)

9-18: LGTM - Robust document presence check.

The implementation correctly uses the search API with ID query and validates both the hit count and the returned document ID. Using :throw-exceptions false is appropriate for this check operation.


30-37: LGTM - Simple and effective index existence check.

Checking HTTP status 200 is the correct approach for verifying index existence in Elasticsearch.

indexer-app/src/cmr/indexer/services/index_set_service.clj (3)

123-143: LGTM - Core fix for reshard dual-write behavior.

This logic correctly:

  1. Generates the default concept mappings based on index configurations
  2. Compares generated granule concepts with any explicitly requested granule concepts
  3. Merges them when they differ, preserving requested values (which point to the old index during reshard)

This ensures the index-set's concept mapping remains pointed to the old index until finalization, enabling dual-writes to both old and new indexes during reshard.


776-777: LGTM - Critical fix for reshard finalization.

This addition updates the :concepts map to associate the canonical index name with the new resharded target index. This ensures that after finalization, the index-set correctly points to the new index for future writes.


275-275: LGTM - Comment accurately reflects the rollback scope.

The updated comment correctly indicates that rollback now covers alias creation failures as well as index creation.

@codecov-commenter
Copy link

codecov-commenter commented Jan 12, 2026

Codecov Report

❌ Patch coverage is 70.27027% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.02%. Comparing base (1f955f4) to head (e8b3075).

Files with missing lines Patch % Lines
...t/src/cmr/system_int_test/utils/bootstrap_util.clj 25.92% 19 Missing and 1 partial ⚠️
...app/src/cmr/indexer/services/index_set_service.clj 87.50% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2370   +/-   ##
=======================================
  Coverage   58.01%   58.02%           
=======================================
  Files        1063     1064    +1     
  Lines       72765    72827   +62     
  Branches     2113     2112    -1     
=======================================
+ Hits        42215    42256   +41     
- Misses      28580    28601   +21     
  Partials     1970     1970           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@eereiter eereiter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the found issues.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj:
- Line 8: Remove the unused import by deleting `error` from the refer list in
the cmr.common.log import if you don't intend to log errors, or alternatively
invoke `error` inside the timeout branches of the new wait functions (e.g., the
wait-for-* functions added) before throwing an exception so timeouts are logged;
update the import to `[cmr.common.log :as log :refer [info]]` if removing, or
add a call like `(error (str "Timeout: Resharding failed to complete after "
attempt " attempts."))` in the timeout path(s) and keep the refer list as
`[error info]`.

In @system-int-test/src/cmr/system_int_test/utils/elastic_util.clj:
- Around line 9-18: In doc-present? fix the Elasticsearch path and encode the
doc id: change the client/get call’s path to use the search endpoint without the
deprecated _doc segment (i.e., use /{index}/_search) and URL-encode the doc-id
before interpolating into the q parameter (e.g., use
java.net.URLEncoder.encode(doc-id, "UTF-8") or a URI builder) so special
characters are safely escaped; update the client/get invocation in the
doc-present? function to use the corrected path and encoded id while preserving
the existing connection-manager and error handling.

In @system-int-test/src/cmr/system_int_test/utils/url_helper.clj:
- Around line 663-666: The function indexer-get-index-sets-by-id-url uses a %d
formatter which differs from the transmit-lib pattern; update the format string
to use %s and pass (str id) instead of id so it matches
rebalance-collection-url, reshard-index-url, and get-index-set patterns in
transmit-lib; modify the body of indexer-get-index-sets-by-id-url to call
(format "http://localhost:%s/index-sets/%s" (transmit-config/indexer-port) (str
id)).

In @system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj:
- Around line 151-154: Wrap the reshard-index-with-granule-updates-test with
s/only-with-real-database like the other tests (reshard-index-error-test and
reshard-index-success-test) because it calls es-util/get-doc,
es-util/index-exists?, and index/get-index-set-by-id; add the opening
s/only-with-real-database wrapper around the test body and ensure you add the
missing closing parenthesis at the end of the test (after the current end of the
test) so the form is properly balanced.
🧹 Nitpick comments (4)
system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj (2)

540-557: Consider returning the final status for test assertions.

The implementation correctly polls for reshard completion. However, returning nil on success means callers cannot assert on the final status. Consider returning the result map on success.

♻️ Optional: Return status on success
       (cond
         (= status "COMPLETE")
-        (info "Success: Resharding is COMPLETE.")
+        (do
+          (info "Success: Resharding is COMPLETE.")
+          res)
 
         (>= attempt max-attempts)

559-575: LGTM with optional DRY suggestion.

The implementation correctly polls for rebalance completion. Both wait functions share nearly identical polling logic and could optionally be refactored to a common helper to reduce duplication.

♻️ Optional: Extract common polling logic
(defn- poll-until-complete
  "Generic polling helper that waits for a status function to return COMPLETE."
  [status-fn status-key operation-name {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}]
  (loop [attempt 1]
    (let [res (status-fn)
          status (get res status-key)]
      (cond
        (= status "COMPLETE")
        (do (info (str "Success: " operation-name " is COMPLETE.")) res)

        (>= attempt max-attempts)
        (throw (Exception. (str "Timeout: " operation-name " failed to complete after " attempt " attempts.")))

        :else
        (do
          (info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..." attempt max-attempts status sleep-ms))
          (Thread/sleep sleep-ms)
          (recur (inc attempt)))))))

(defn wait-for-reshard-complete
  [coll-index-name elastic-name opts]
  (poll-until-complete 
    #(get-reshard-status coll-index-name {:elastic-name elastic-name})
    :reshard-status "Resharding" opts))

(defn wait-for-rebalance-to-complete
  [collection opts]
  (poll-until-complete 
    #(get-rebalance-status (:concept-id collection))
    :rebalancing-status "Rebalance" opts))
system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj (2)

4-4: Unused import: clojure.string

This import is only referenced in commented-out code (line 236). Consider removing it to avoid dead code warnings.

Suggested fix
 (:require
-   [clojure.string :as string]
    [clojure.test :refer [deftest is testing use-fixtures]]

218-223: Consider tracking disabled test differently

The explanation for why this test is commented out is helpful, but ~95 lines of commented code adds maintenance burden. Consider either:

  1. Converting to a :pending or :skip test annotation if your test framework supports it
  2. Removing the commented code and tracking it in a GitHub issue with the explanation
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4992879 and 1fc3ebc.

📒 Files selected for processing (5)
  • system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj
  • system-int-test/src/cmr/system_int_test/utils/elastic_util.clj
  • system-int-test/src/cmr/system_int_test/utils/url_helper.clj
  • system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj
  • system-int-test/test/cmr/system_int_test/ingest/misc/deleted_granules_index_test.clj
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T16:26:11.404Z
Learnt from: indiejames
Repo: nasa/Common-Metadata-Repository PR: 2328
File: indexer-app/src/cmr/indexer/services/index_set_service.clj:337-349
Timestamp: 2025-10-20T16:26:11.404Z
Learning: In indexer-app/src/cmr/indexer/services/index_set_service.clj, the remove-resharding-index function intentionally returns an empty set when resharding-indexes is nil, rather than throwing an error. This differs from remove-rebalancing-collection and reflects a design preference to avoid nil returns.

Applied to files:

  • system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj
  • system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj
🔇 Additional comments (5)
system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj (1)

155-216: LGTM - Test logic is comprehensive

The test correctly validates the PR objective: during reshard, granule creates/updates are persisted to both old and new indexes, the index-set mapping remains unchanged until finalization, and cleanup occurs properly after finalization.

system-int-test/src/cmr/system_int_test/utils/elastic_util.clj (2)

20-28: LGTM - Consider documenting the not-found behavior.

The document retrieval path /{index}/_doc/{id} is correct for ES 7.x+. The function will return the full response body including found: false when the document doesn't exist, which callers should handle. This is appropriate for test utilities.


30-37: LGTM!

The implementation correctly checks index existence by verifying a 200 status. Using GET is acceptable for integration tests; a HEAD request would be slightly more efficient but is not necessary here.

system-int-test/test/cmr/system_int_test/ingest/misc/deleted_granules_index_test.clj (2)

16-16: LGTM!

The new namespace require follows the existing conventions in this file.


24-30: LGTM!

The migration to es-util/doc-present? is clean. The removal of the type-name argument aligns correctly with the ES 7.x deprecation of mapping types, and the argument order matches the new utility signature.

Comment on lines +9 to +18
(defn doc-present?
"If doc is present return true, otherwise return false"
[index-name doc-id elastic-name]
(let [response (client/get
(format "%s/%s/_doc/_search?q=_id:%s" (url/elastic-root elastic-name) index-name doc-id)
{:throw-exceptions false
:connection-manager (s/conn-mgr)})
body (json/decode (:body response) true)]
(and (= 1 (get-in body [:hits :total :value]))
(= doc-id (get-in body [:hits :hits 0 :_id])))))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for existing Elasticsearch search patterns in the codebase
rg -n --type clojure '_search' -C 2

Repository: nasa/Common-Metadata-Repository

Length of output: 7237


🌐 Web query:

Elasticsearch 7.x search API endpoint path format

💡 Result:

Elasticsearch 7.x Search API endpoint paths:

  • GET or POST /_search — cluster-wide search. (elastic.co) [1]
  • GET or POST /{index}/_search — search a specific index ( {index} may be one name, a comma-separated list, wildcard, or alias ). (elastic.co) [2]

Citations:


Fix Elasticsearch search API path and add URL encoding for doc-id.

Two issues:

  1. The path /{index}/_doc/_search is incorrect for Elasticsearch 7.x. The search endpoint is /{index}/_search without the _doc segment. The _doc type was deprecated and is not part of the search API path.

  2. The doc-id should be URL-encoded in the query string to handle special characters safely. Use URLEncoder.encode() or a URI-building utility rather than direct string interpolation into the query parameter.

🤖 Prompt for AI Agents
In @system-int-test/src/cmr/system_int_test/utils/elastic_util.clj around lines
9 - 18, In doc-present? fix the Elasticsearch path and encode the doc id: change
the client/get call’s path to use the search endpoint without the deprecated
_doc segment (i.e., use /{index}/_search) and URL-encode the doc-id before
interpolating into the q parameter (e.g., use java.net.URLEncoder.encode(doc-id,
"UTF-8") or a URI builder) so special characters are safely escaped; update the
client/get invocation in the doc-present? function to use the corrected path and
encoded id while preserving the existing connection-manager and error handling.

Comment on lines +151 to +154
(deftest reshard-index-with-granule-updates-test
;; When a granule is upserted during the reshard process, it should save to both indexes and the index set
;; should not update the index mapping until the reshard is finalized
(testing "Creating and updating granule during reshard saves to both old and target indexes"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Missing s/only-with-real-database wrapper

This test interacts directly with Elasticsearch via es-util/get-doc, es-util/index-exists?, and index/get-index-set-by-id, but unlike the other tests in this file (reshard-index-error-test at line 21 and reshard-index-success-test at line 98), it's missing the s/only-with-real-database wrapper.

Suggested fix
 (deftest reshard-index-with-granule-updates-test
+  (s/only-with-real-database
   ;; When a granule is upserted during the reshard process, it should save to both indexes and the index set
   ;; should not update the index mapping until the reshard is finalized
   (testing "Creating and updating granule during reshard saves to both old and target indexes"
     (let [gran-elastic-name "gran-elastic"

And add the closing parenthesis at the end of the test (after line 216):

       (is (not (es-util/index-exists? small-collections-index-name gran-elastic-name))))))
+  )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
(deftest reshard-index-with-granule-updates-test
;; When a granule is upserted during the reshard process, it should save to both indexes and the index set
;; should not update the index mapping until the reshard is finalized
(testing "Creating and updating granule during reshard saves to both old and target indexes"
(deftest reshard-index-with-granule-updates-test
(s/only-with-real-database
;; When a granule is upserted during the reshard process, it should save to both indexes and the index set
;; should not update the index mapping until the reshard is finalized
(testing "Creating and updating granule during reshard saves to both old and target indexes"
(let [gran-elastic-name "gran-elastic"
;; ... rest of test content from lines 155-215 remains unchanged ...
]
;; ... test assertions ...
(is (not (es-util/index-exists? small-collections-index-name gran-elastic-name)))))))
)
🤖 Prompt for AI Agents
In @system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj
around lines 151 - 154, Wrap the reshard-index-with-granule-updates-test with
s/only-with-real-database like the other tests (reshard-index-error-test and
reshard-index-success-test) because it calls es-util/get-doc,
es-util/index-exists?, and index/get-index-set-by-id; add the opening
s/only-with-real-database wrapper around the test body and ensure you add the
missing closing parenthesis at the end of the test (after the current end of the
test) so the form is properly balanced.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj`:
- Around line 540-557: The wait-for-reshard-complete function lacks a docstring
and does not validate the HTTP response status from get-reshard-status before
reading :reshard-status, causing retries to mask server errors; add a docstring
to wait-for-reshard-complete, check that the response status (e.g., res's
:status or :http-status) equals 200 before inspecting :reshard-status, and if it
is non-200 log an error with the response details and throw (or return)
immediately instead of continuing retries; keep existing retry behavior only for
valid 200 responses and preserve the max-attempts/sleep-ms parameters.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1fc3ebc and f97cc4d.

📒 Files selected for processing (2)
  • system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj
  • system-int-test/src/cmr/system_int_test/utils/url_helper.clj
🚧 Files skipped from review as they are similar to previous changes (1)
  • system-int-test/src/cmr/system_int_test/utils/url_helper.clj
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T16:26:11.404Z
Learnt from: indiejames
Repo: nasa/Common-Metadata-Repository PR: 2328
File: indexer-app/src/cmr/indexer/services/index_set_service.clj:337-349
Timestamp: 2025-10-20T16:26:11.404Z
Learning: In indexer-app/src/cmr/indexer/services/index_set_service.clj, the remove-resharding-index function intentionally returns an empty set when resharding-indexes is nil, rather than throwing an error. This differs from remove-rebalancing-collection and reflects a design preference to avoid nil returns.

Applied to files:

  • system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj
🔇 Additional comments (1)
system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj (1)

8-8: LGTM!

The import is correctly updated to only include info, which is used by both new wait functions.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +540 to +557
(defn wait-for-reshard-complete
[coll-index-name elastic-name {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}]
(loop [attempt 1]
(let [res (get-reshard-status coll-index-name {:elastic-name elastic-name})
status (get-in res [:reshard-status])]
(cond
(= status "COMPLETE")
(info "Success: Resharding is COMPLETE.")

(>= attempt max-attempts)
(throw (Exception. (str "Timeout: Resharding failed to complete after " attempt " attempts.")))

:else
(do
(info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..."
attempt max-attempts status sleep-ms))
(Thread/sleep sleep-ms)
(recur (inc attempt)))))))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add HTTP status validation and docstring.

The function doesn't validate that the HTTP response status is 200 before checking :reshard-status. If the endpoint returns an error (e.g., 500), the loop will retry until timeout, masking the actual issue. Also, a docstring would align with other functions in this file.

Suggested fix
+(defn wait-for-reshard-complete
+  "Polls reshard status until COMPLETE or timeout. Throws on timeout."
+  [coll-index-name elastic-name {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}]
+  (loop [attempt 1]
+    (let [res (get-reshard-status coll-index-name {:elastic-name elastic-name})
+          http-status (:status res)
+          status (get res :reshard-status)]
+      (cond
+        (not= 200 http-status)
+        (throw (Exception. (str "Reshard status check failed with HTTP status: " http-status)))
+
+        (= status "COMPLETE")
+        (info "Success: Resharding is COMPLETE.")
+
+        (>= attempt max-attempts)
+        (throw (Exception. (str "Timeout: Resharding failed to complete after " attempt " attempts.")))
+
+        :else
+        (do
+          (info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..."
+                           attempt max-attempts status sleep-ms))
+          (Thread/sleep sleep-ms)
+          (recur (inc attempt)))))))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
(defn wait-for-reshard-complete
[coll-index-name elastic-name {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}]
(loop [attempt 1]
(let [res (get-reshard-status coll-index-name {:elastic-name elastic-name})
status (get-in res [:reshard-status])]
(cond
(= status "COMPLETE")
(info "Success: Resharding is COMPLETE.")
(>= attempt max-attempts)
(throw (Exception. (str "Timeout: Resharding failed to complete after " attempt " attempts.")))
:else
(do
(info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..."
attempt max-attempts status sleep-ms))
(Thread/sleep sleep-ms)
(recur (inc attempt)))))))
(defn wait-for-reshard-complete
"Polls reshard status until COMPLETE or timeout. Throws on timeout."
[coll-index-name elastic-name {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}]
(loop [attempt 1]
(let [res (get-reshard-status coll-index-name {:elastic-name elastic-name})
http-status (:status res)
status (get res :reshard-status)]
(cond
(not= 200 http-status)
(throw (Exception. (str "Reshard status check failed with HTTP status: " http-status)))
(= status "COMPLETE")
(info "Success: Resharding is COMPLETE.")
(>= attempt max-attempts)
(throw (Exception. (str "Timeout: Resharding failed to complete after " attempt " attempts.")))
:else
(do
(info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..."
attempt max-attempts status sleep-ms))
(Thread/sleep sleep-ms)
(recur (inc attempt)))))))
🤖 Prompt for AI Agents
In `@system-int-test/src/cmr/system_int_test/utils/bootstrap_util.clj` around
lines 540 - 557, The wait-for-reshard-complete function lacks a docstring and
does not validate the HTTP response status from get-reshard-status before
reading :reshard-status, causing retries to mask server errors; add a docstring
to wait-for-reshard-complete, check that the response status (e.g., res's
:status or :http-status) equals 200 before inspecting :reshard-status, and if it
is non-200 log an error with the response details and throw (or return)
immediately instead of continuing retries; keep existing retry behavior only for
valid 200 responses and preserve the max-attempts/sleep-ms parameters.

Comment on lines +559 to +575
(defn wait-for-rebalance-to-complete
[collection {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}]
(loop [attempt 1]
(let [res (get-rebalance-status (:concept-id collection))
status (get-in res [:rebalancing-status])]
(cond
(= status "COMPLETE")
(info "Success: Rebalance is COMPLETE.")

(>= attempt max-attempts)
(throw (Exception. (str "Timeout: Rebalance failed to complete after " attempt " attempts.")))

:else
(do
(info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..." attempt max-attempts status sleep-ms))
(Thread/sleep sleep-ms)
(recur (inc attempt)))))))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add HTTP status validation, docstring, and trailing newline.

Same concerns as the previous function: no HTTP status validation before checking :rebalancing-status, and missing docstring. Additionally, the file is missing a trailing newline.

Suggested fix
+(defn wait-for-rebalance-to-complete
+  "Polls rebalance status until COMPLETE or timeout. Throws on timeout."
+  [collection {:keys [max-attempts sleep-ms] :or {max-attempts 50 sleep-ms 1000}}]
+  (loop [attempt 1]
+    (let [res (get-rebalance-status (:concept-id collection))
+          http-status (:status res)
+          status (get res :rebalancing-status)]
+      (cond
+        (not= 200 http-status)
+        (throw (Exception. (str "Rebalance status check failed with HTTP status: " http-status)))
+
+        (= status "COMPLETE")
+        (info "Success: Rebalance is COMPLETE.")
+
+        (>= attempt max-attempts)
+        (throw (Exception. (str "Timeout: Rebalance failed to complete after " attempt " attempts.")))
+
+        :else
+        (do
+          (info (format "[Attempt %d/%d] Status: %s. Retrying in %dms..." attempt max-attempts status sleep-ms))
+          (Thread/sleep sleep-ms)
+          (recur (inc attempt)))))))
+

Note: The trailing + line represents the missing newline at end of file.

@jmaeng72 jmaeng72 merged commit 57b33e0 into master Jan 14, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants