-
Notifications
You must be signed in to change notification settings - Fork 809
SOLR-17725: CoreAdmin API to upgrade an index in-place to facilitate Solr upgrade across major versions #3903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
First (half-baked) draft. WIP. |
…ion for custom update.chain
3085bea to
daaaecb
Compare
|
Looking good. Basic (manual) tests with /admin/cores?action=UPGRADECOREINDEX working fine. Will publish response with segment status before and after index upgrade. Also need to add tests. |
… has been renamed. It's being used in this feature..
|
@dsmiley @broustant @janhoy This is ready for review. Would appreciate your inputs. Thanks! |
…keep forbiddenApisTest happy
dsmiley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review.
Thanks for working on this!
@broustant you may find some of this rather familiar from index encryption
solr/core/src/java/org/apache/solr/handler/admin/api/UpgradeCoreIndex.java
Show resolved
Hide resolved
|
|
||
| @Schema(description = "Request ID to track this action which will be processed asynchronously.") | ||
| @JsonProperty | ||
| public String async; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @gerlowskija might review the V2 API spect.
But what I notice here is this async --- I'm suspicious as I figure it'd be handled in a more general way without each command needing to redundantly specify it, which is basically all commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested this with only V1 api so far, but have tried to follow the pattern for V2 as well. There might be gaps which @gerlowskija could point out (?). In case of v1, the async is handled by the CoreAdminHandler.handleRequestBody() itself by adding to CoreAdminAsyncTracker https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/handler/admin/CoreAdminHandler.java#L231
Right now unlike the SolrJerseyResponse, the *RequestBody classes don't inherit from a common parent so each one seems to introduce the "async" by itself. Perhaps a case for refactoring and pushing "async" into the parent class? An action item for another PR?
| coreName, | ||
| requestBody.async, | ||
| "upgrade-index", | ||
| () -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a stylistic point but this lambda is way too big; please use another technique like define a method, and call it with a one-liner lambda.
| () -> { | ||
| try (SolrCore core = coreContainer.getCore(coreName)) { | ||
|
|
||
| log.info("Received UPGRADECOREINDEX request for core: {}", coreName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging at request level seems like a framework infrastructure level concern (probably already logged), not one unique to this particular command
| int numSegmentsEligibleForUpgrade = 0, numSegmentsUpgraded = 0; | ||
| try { | ||
| iwRef = core.getSolrCoreState().getIndexWriter(core); | ||
| if (iwRef != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely this can't be null?!
| if (iwRef != null) { | ||
| IndexWriter iw = iwRef.get(); | ||
| if (iw != null && originalMergePolicy != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again; null checks needless
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iwRef null-check might be needed in case there is an exception getting the IndexWriter in try for whatever reason (eg:AlreadyClosedException). Removed the null check on "iw".
| if (resolvedChain != null) { | ||
| return resolvedChain; | ||
| } else { | ||
| log.error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should throw an error (and not log)
| // Try to find chain configured in /update handler | ||
| String updateChainName = null; | ||
| SolrRequestHandler reqHandler = core.getRequestHandler("/update"); | ||
|
|
||
| NamedList initArgs = ((RequestHandlerBase) reqHandler).getInitArgs(); | ||
|
|
||
| if (initArgs != null) { | ||
| // Check invariants first | ||
| Object invariants = initArgs.get("invariants"); | ||
| if (invariants instanceof NamedList) { | ||
| updateChainName = (String) ((NamedList) invariants).get(UpdateParams.UPDATE_CHAIN); | ||
| } | ||
|
|
||
| // Check defaults if not found in invariants | ||
| if (updateChainName == null) { | ||
| Object defaults = initArgs.get("defaults"); | ||
| if (defaults instanceof NamedList) { | ||
| updateChainName = (String) ((NamedList) defaults).get(UpdateParams.UPDATE_CHAIN); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're trying too hard to find the default chain. If it's actually this hard, then this code should live elsewhere and we just call it. But I'm still suspicious. If it wasn't so late I'd go look.
|
|
||
| private boolean isIndexUpgraded(SolrCore core) throws IOException { | ||
|
|
||
| try (FSDirectory dir = FSDirectory.open(Path.of(core.getIndexDir())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woah woah... lets instead get the Directory from the factory from the core. See SolrIndexSplitter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch!
|
|
||
| private void doCommit(SolrCore core) throws IOException { | ||
| try (LocalSolrQueryRequest req = new LocalSolrQueryRequest(core, new ModifiableSolrParams())) { | ||
| CommitUpdateCommand cmd = new CommitUpdateCommand(req, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a little EOL comment on what "false" means
|
@dsmiley Thanks for an initial review. I am working on addressing the review comments. |
| import org.slf4j.LoggerFactory; | ||
|
|
||
| /** | ||
| * Implements the UPGRADECOREINDEX CoreAdmin action, which upgrades an existing core's index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate your time writing this but a user isn't going to read this, only someone maintaing this code will. ref-guide documentation is for our users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand, and I did debate this. But eventually decided to be more verbose and spell out the high-level algorithm for anyone looking at the code. In my opinion, it might not hurt to keep certain implementation details in their current form in the javadoc(?). Edited to leave out certain details/notes more suitable for ref-guide.
|
|
||
| LeafReader leafReader = FilterLeafReader.unwrap(leafReaderContext.reader()); | ||
| SegmentReader segmentReader = (SegmentReader) leafReader; | ||
| final String segmentName = segmentReader.getSegmentName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is the only reason why you casted leafReader to a segmentReader. And this variable is only used in logging. It'd be a shame to limit the functionality of this tool to require full unwrapping of the reader. Just log the leafReader instead? Or add a utility method used in logging to get a "pretty" name for the leaf reader that attempts to get the name by unwrapping.
| String coreName = core.getName(); | ||
| IndexSchema indexSchema = core.getLatestSchema(); | ||
|
|
||
| LeafReader leafReader = FilterLeafReader.unwrap(leafReaderContext.reader()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets not unwrap; it limited the range of uses of this tool. For example, maybe we want UninvertingReader to synthesize docValues from the index!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now get the leaves from the raw reader (searcher.getRawReader()), so I don't think it would have the ExitableDirectoryReader or UninvertingReader wrappers anymore. Got rid of unwrapping anyway to simplify the code.
| Bits bits = segmentReader.getLiveDocs(); | ||
| SolrInputDocument solrDoc = null; | ||
| UpdateRequestProcessor processor = null; | ||
| LocalSolrQueryRequest solrRequest = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put this into try-with-resources to avoid the explicit close?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored processSegment() to avoid belt-and-suspenders exception handling
| LeafReader leafReader = FilterLeafReader.unwrap(leafReaderContext.reader()); | ||
| SegmentReader segmentReader = (SegmentReader) leafReader; | ||
| final String segmentName = segmentReader.getSegmentName(); | ||
| Bits bits = segmentReader.getLiveDocs(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name of this variable is terrible; it's like calling a String as string. The only name for this is liveDocs
| /* | ||
| * Convert a lucene Document to a SolrInputDocument | ||
| */ | ||
| protected SolrInputDocument toSolrInputDocument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code looks duplicative of similar code in RealtimeGetComponent. Right? Can we dedupulicate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code in RealTImeGetComponent creates SolrInputDocument out of SolrDocument, whereas here I am directly creating SolrInputDocument from Lucene Document.
Looking at this though, I am revisiting if this code will work for nested documents. I may have to make additional modifications to accommodate that. We don't use them, so I have limited knowledge about the internal heuristics of nested docs. Reading up more to decide whether further changes are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's okay to punt on this. Detect that the schema advertises support for nested docs -- and then fail.
Doing this properly requires reading the linear/adjacent series of docs and reconstruct the hierarchy. This way, the URPs "see" what they are supposed to see -- the root of the hierarchy. And Solr will do it's behavior of linearalizing them at the end and adding them in an atomic fashion, which is critical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I have added the change to bail out if nested docs are detected. I went for detecting actual presence of nested docs instead of schema.isUsableForChildDocs() since sometimes people derive a schema out of examples and it may contain _root_ field even though they don't have any nested docs really.
Also added a test for the same.
| } | ||
|
|
||
| @AfterClass | ||
| public static void afterClass() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solr's test infrastructure resets/clears all system properties at the end of each test. So you can remove this and stop bothering saving the directory. @epugh is at present removing all trace of such from tests you might have gotten inspiration from
| import org.junit.BeforeClass; | ||
| import org.junit.Test; | ||
|
|
||
| public class UpgradeCoreIndexActionTest extends SolrTestCaseJ4 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please consider using EmbeddedSolrServerTestRule with SolrTestCase base class (not STCJ4) instead? I'm slowly trying to move our tests to be SolrClient based and away from STCJ4. Your test hear seems ripe for it.
|
|
||
| final Set<String> segmentsBeforeUpgrade = listSegmentNames(core); | ||
|
|
||
| CoreAdminHandler admin = new CoreAdminHandler(h.getCoreContainer()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is invasive... like why can't you send a normal looking request into Solr in a SolrJ/SolrClient way without needing to directly manipulate Solr server classes like this. This gets to my suggestion at the class declaration about using a SolrClient instead -- encourages client side approach. Only reach into Solr internals when actually needed to accomplish a goal.
| } | ||
|
|
||
| private Set<String> listSegmentNames(SolrCore core) { | ||
| RefCounted<org.apache.solr.search.SolrIndexSearcher> searcherRef = core.getSearcher(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you seen core.withSearcher? More elegant.
…Throw error when updateChain not found
… avoid belt-and-suspenders exception handling
|
Thanks for the thoughtful review comments. Addressed many of them. Still working on the rest. Shall update soon. |
…(); cleanup test to not require FSDirectory anymore; make javadoc more concise; remove extra block in parenthesis in performUpgrade()
dsmiley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your improvements
| RefCounted<SolrIndexSearcher> searcherRef = core.getSearcher(); | ||
| try { | ||
| List<LeafReaderContext> leafContexts = searcherRef.get().getRawReader().leaves(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could imagine it's very useful to benefit from UninvertingReader but you chose getRawReader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a hint from your earlier comment :)
think the code here could learn some things from long lived code in SolrIndexSplitter. It uses searcher.getRawReader().leaves() to bypass ExitableDirectoryReader and uninverting.
But I see your point. UninvertingReader would give the ability to even retain indexed-only fields (stored=false, docValues=false). I like it! Might add that support later. Leaving the door open for now by switching to searcher.getIndexReader() instead. thanks!
| private UpgradeCoreIndexResponse performUpgrade( | ||
| String coreName, UpgradeCoreIndexRequestBody requestBody, UpgradeCoreIndexResponse response) { | ||
|
|
||
| try (SolrCore core = coreContainer.getCore(coreName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could drop yet another level of indentation by extracting another method
|
|
||
| docFetcher.decorateDocValueFields( | ||
| solrDoc, leafReaderContext.docBase + luceneDocId, nonStoredDVFields, dvICache); | ||
| solrDoc.removeField("_version_"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder: suspicious removal; at least needs a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll get back on this. I roughly remember running into some error in some scenario with _version_ intact. Can't remember if it was with specific field attributes. Tests pass fine with _version_ intact. Will try to recall again in the morning and address this either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this. I conflated this with another scenario where I was forwarding a doc from one core to another one with the same id. And that throws an error if it contains the _version_ field since the doc doesn't exist in the other core ("version conflict for id_blah expected=1854703490865037312 actual=-1").
In this case, since it is on the same core and the document already exists (obviously), it should be retained in order to prevent possible race condition during reindexing in case there is a concurrent update that comes in. silly silly silly oversight!
Thanks for flagging this and for the reminder!
…o benefit from UninvertibleReader wrapper
…adeCoreIndex.performUpgrade() further to reduce indentation levels
|
@dsmiley The review comments have been addressed. Can you please take another look? Thanks. |
https://issues.apache.org/jira/browse/SOLR-17725
Description
Provide a /admin/cores api (action=UPGRADECOREINDEX) to upgrade older lucene segments to latest version (by targeted reindexing). Calling this REST endpoint on an index created in version X-1 (assuming you are on Solr version X) would reindex documents in any segments of the older version (X-1) so that they are re-written in version X. This would help prepare the index for when Solr is upgraded to X+1 without having to recreate the index from source (as is the requirement today).
Solution
This API helps to upgrade an index where all fields are either stored or docValues enabled or copy field target.
Here's the algorithm:
Set the merge policy as LatestVersionMergePolicy (which is a filter merge policy that delegates to whichever policy is actual defined in solrconfig or the TieredMergePolicy by default). This policy filters out any segments with minVersion=X-1 and only allows the latest version (X) segments to participate in merges. This ensures that the older version segments don't merge and "pollute" the index over the course of the reindexing.
Iterate over the segments and for any segment which is a pure X-1 version segment (result of flush; minVersion=X-1, version=X-1) or a result of merge between version X-1 and X segments (minVersion=X-1, version=X), reindex all live documents in the segment.
commit and validate that all segments in the index have been converted to version X.
Reset the merge policy back to the original merge policy.
This works because after the change in apache/lucene#14607 (and the backport to 10x) opening an index no longer cares about which version the index was originally created in. Lucene just needs all of the segments to be of version X or later in order to be able to open the index in version X+1 (even though the index could have been originally created in version X-1)
[AI Coding Tools Disclosure: Only use of AI coding tools was in the tests (UpgradeCoreIndexActionTest.java). Claude Code and OpenAI Codex were used to generate, review, iterate and fix the tests mentioned in the "Tests" section. The same were also manually reviewed thereafter for correctness. ]
Tests
seg1 - pure 9x
seg2 - pure 10x
seg3 - minVersion 9x, version 10x (result of merge between 9x and 10x segements)
seg1 and seg3 get reindexed into new 10x segment, and get deleted. seg2 remains untouched (as expected).
DocValues-only fields are preserved during reindexing (this is tested specifically since we are reading docs from segments and decorating the SolrInputDocument with docValues thereafter)
older segments get deleted by the end of the process
async mode works fine. Tested by calling /admin/cores?action=UPGRADECOREINDEX&async=request_id. And later calling action=REQUESTSTATUS&requestid=request_id
For an index where all segments belong to the latest major version, calling the UPGRADECOREINDEX API is a no-op, and returns "upgradeStatus":"NO_UPGRADE_NEEDED" in the response.
Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.