Problem Statement
When importing content without the identifier column, the key fields are no longer used to tie content together when a multilingual file is imported to add language versions. For example:
Create new English content using a CSV import
Use a second CSV to generate the French and Spanish versions
Result: French and Spanish versions create new contentlets despite matching key fields with existing content.
Steps to Reproduce
languageVersion.mp4
Acceptance Criteria
Ensure that Key Fields are used correctly when the identifier column is missing while importing a multilingual file.
Some Claude findings.
Let me look at this with fresh eyes and trace the exact code path for your CSV scenario, specifically what happens when isMultilingual = true comes in from the languageCode/countryCode columns.
Found something critical — isMultilingual is never set in the job processor's ImmutableImportFileParams builder. Let me find the default value.
Found the root cause. isMultilingual defaults to false in ImportFileParams and the job processor never sets it. Let me also check what searchIndex does with the Lucene query to confirm the _dotraw case issue.
Now I have the complete picture. There are actually two separate bugs working together.
Bug 1: isMultilingual is never set in the job processor
[ImportContentletsProcessor.java:427-444](https://claude.ai/epitaxy/local_449e95b7-7a3b-47a2-9611-b21e1b8a7280) builds the params but never calls .isMultilingual(true), so it always defaults to false ([ImportFileParams.java:62](https://claude.ai/epitaxy/local_449e95b7-7a3b-47a2-9611-b21e1b8a7280)):
final ImmutableImportFileParams importFileParams = ImmutableImportFileParams.builder()
        // ...
        .languageCodeHeaderColumn(headerInfo.languageCodeColumn) // correctly set to 0
        .countryCodeHeaderColumn(headerInfo.countryCodeColumn)   // correctly set to 1
        // .isMultilingual(true) ← NEVER CALLED
        .build();
With isMultilingual = false, languageCode/countryCode are not treated as special headers. They show as invalid headers but the import continues. The language still resolves correctly per row via processLanguage because languageCodeHeaderColumn IS set. So this bug is bad but not the killer blow on its own.
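The default-value behavior described above can be illustrated with a minimal sketch. `ImportParamsSketch` is a hypothetical stand-in, not the generated `ImmutableImportFileParams` class; it models only the two fields relevant here and assumes the flag defaults to `false` as the findings state.

```java
// Hypothetical stand-in for the params builder; NOT the real dotCMS class.
final class ImportParamsSketch {
    final int languageCodeHeaderColumn;
    final boolean multilingual;

    private ImportParamsSketch(Builder b) {
        this.languageCodeHeaderColumn = b.languageCodeHeaderColumn;
        this.multilingual = b.multilingual;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private int languageCodeHeaderColumn = -1;
        // Assumed default, mirroring what the findings report for ImportFileParams.
        private boolean multilingual = false;

        Builder languageCodeHeaderColumn(int column) {
            this.languageCodeHeaderColumn = column;
            return this;
        }

        Builder isMultilingual(boolean value) {
            this.multilingual = value;
            return this;
        }

        ImportParamsSketch build() {
            return new ImportParamsSketch(this);
        }
    }

    public static void main(String[] args) {
        // The language column is set, but isMultilingual(...) is never called.
        ImportParamsSketch params = ImportParamsSketch.builder()
                .languageCodeHeaderColumn(0)
                .build();
        System.out.println(params.multilingual); // prints "false"
    }
}
```

The point of the sketch is only that a builder silently keeps its default when a setter is skipped, so the language column index can be correct while the multilingual flag is still off.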
Bug 2: _dotraw query uses original case, ES stores lowercase — guaranteed miss
In [ESMappingAPIImpl.java:459-460](https://claude.ai/epitaxy/local_449e95b7-7a3b-47a2-9611-b21e1b8a7280), every string value is lowercased before indexing:
if (lowerCaseValue instanceof String) {
    lowerCaseValue = ((String) lowerCaseValue).toLowerCase(); // "Testing2" → "testing2"
}
mapLowered.put(lowerCaseKey + DOTRAW, lowerCaseValue); // stored as "testing2"
But [ImportUtil.java:2987](https://claude.ai/epitaxy/local_449e95b7-7a3b-47a2-9611-b21e1b8a7280) queries with the raw CSV value:
.append(escapeLuceneSpecialCharacter(processedValue)) // "Testing2" — original case
Since _dotraw is a keyword field in ES, matches are case-sensitive. Testing2 ≠ testing2 — the search always returns empty for any mixed-case value. This is why no key-field match ever succeeds: not for Title, not for describer, not for orderPosition. Any value with an uppercase letter is a guaranteed miss.
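To make the mismatch concrete, here is a small simulation. It assumes an exact-match keyword field can be modeled as a plain `HashMap` lookup; `DotrawCaseSketch` and its methods are hypothetical names and do not touch Elasticsearch or dotCMS code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of an exact-match (keyword-style) field: stored value -> identifier.
final class DotrawCaseSketch {
    private final Map<String, String> keywordIndex = new HashMap<>();

    // Mirrors the described indexing step: string values are lowercased before storage.
    // Locale.ROOT is used here for determinism; the quoted code calls toLowerCase().
    void index(String value, String identifier) {
        keywordIndex.put(value.toLowerCase(Locale.ROOT), identifier);
    }

    // Exact, case-sensitive lookup; returns null when nothing matches.
    String exactMatch(String queryValue) {
        return keywordIndex.get(queryValue);
    }

    public static void main(String[] args) {
        DotrawCaseSketch sketch = new DotrawCaseSketch();
        sketch.index("Testing2", "abc-123");
        System.out.println(sketch.exactMatch("Testing2")); // prints "null": raw CSV case misses
        System.out.println(sketch.exactMatch("testing2")); // prints "abc-123"
    }
}
```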
The combined effect
With both bugs present: isMultilingual stays false, the language-constrained initial search returns empty (correct — the contentlet is in English, not French), the fallback runs but the _dotraw search also returns empty (due to case mismatch), so contentlets is always empty, and a brand new contentlet is always created. No identifier = always new. Your read of the situation is correct.
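The lookup sequence described in this paragraph can be sketched end to end. This is a simplified model under stated assumptions: one key field, an in-memory list instead of the search index, and a hypothetical `lowercaseQuery` switch that stands in for the proposed fix. None of the names below come from dotCMS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical model of the import lookup: language-scoped search, fallback, then create.
final class ImportFlowSketch {
    // Minimal indexed contentlet; keyValue is stored lowercased, as described for _dotraw.
    static final class Indexed {
        final String identifier;
        final String language;
        final String keyValue;

        Indexed(String identifier, String language, String keyValue) {
            this.identifier = identifier;
            this.language = language;
            this.keyValue = keyValue;
        }
    }

    private final List<Indexed> index = new ArrayList<>();
    private int created = 0;

    void index(String identifier, String language, String keyValue) {
        index.add(new Indexed(identifier, language, keyValue.toLowerCase(Locale.ROOT)));
    }

    // Exact, case-sensitive match on the key value; a null language means "any language".
    private Optional<Indexed> search(String keyValue, String language) {
        return index.stream()
                .filter(c -> c.keyValue.equals(keyValue))
                .filter(c -> language == null || c.language.equals(language))
                .findFirst();
    }

    // Returns the identifier the imported row ends up attached to.
    String importRow(String keyValue, String language, boolean lowercaseQuery) {
        String query = lowercaseQuery ? keyValue.toLowerCase(Locale.ROOT) : keyValue;
        Optional<Indexed> hit = search(query, language); // language-constrained search
        if (!hit.isPresent()) {
            hit = search(query, null);                   // fallback across languages
        }
        String identifier = hit.isPresent() ? hit.get().identifier : "new-" + (++created);
        index(identifier, language, keyValue);
        return identifier;
    }
}
```

With an English contentlet indexed under the key `Testing2`, importing a French row with the raw value yields a fresh identifier, while the same row with a lowercased query reuses the existing one through the fallback search.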
Fixes
Fix 1: [ImportContentletsProcessor.java](https://claude.ai/epitaxy/local_449e95b7-7a3b-47a2-9611-b21e1b8a7280) line ~433, add:
.isMultilingual(language == null)
Fix 2: [ImportUtil.java:2987](https://claude.ai/epitaxy/local_449e95b7-7a3b-47a2-9611-b21e1b8a7280), lowercase the value before appending it to the query.
Fix 2 is the root cause. Fix 1 is also needed for correctness (language headers not being recognized means multilingual validation warnings appear and the header logic is confused), but even with Fix 1 alone, the _dotraw case mismatch would still cause the search to return nothing.
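A minimal sketch of what Fix 2 could look like, assuming the key-field clause has the shape `+<contentType>.<field>_dotraw:<value>`. The clause shape, the class name, and the `escapeLucene` helper are all hypothetical illustrations; the real code uses `escapeLuceneSpecialCharacter`, whose exact behavior is not shown in this issue.

```java
import java.util.Locale;

// Hypothetical illustration of lowercasing a CSV value before it reaches the query.
final class KeyFieldQuerySketch {
    // Single characters that Lucene query syntax treats as special.
    private static final String LUCENE_SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    // Simplified stand-in for the real escaping helper: backslash-escapes special characters.
    // Whitespace handling and quoting are deliberately left out of this sketch.
    static String escapeLucene(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (LUCENE_SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    // Builds one key-field clause, normalizing case so it matches the lowercased stored value.
    static String keyFieldClause(String contentType, String field, String csvValue) {
        String normalized = csvValue.toLowerCase(Locale.ROOT);
        return new StringBuilder()
                .append('+').append(contentType).append('.').append(field).append("_dotraw:")
                .append(escapeLucene(normalized))
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(keyFieldClause("Blog", "title", "Testing2"));
        // prints "+Blog.title_dotraw:testing2"
    }
}
```

Lowercasing before escaping keeps the normalization in one place; because the escaped characters are all non-letters, the order of the two steps does not change the result in this sketch.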
dotCMS Version
26.05.11-01_dd6280b
Severity
Medium - Some functionality impacted
Links
https://dotcms.freshdesk.com/a/tickets/37006