[DEFECT] Key fields aren't being used on content Import #35790

@nathan-hildebrandt

Description

Problem Statement

When a content import is run without the identifier column, the key fields are no longer used to match existing content when a multilingual file is imported to add language versions. For example:

  1. Create new English content using a CSV import
  2. Use a second CSV to generate the French and Spanish versions
  3. Result: French and Spanish versions create new contentlets despite matching key fields with existing content.

Steps to Reproduce

languageVersion.mp4

Acceptance Criteria

Ensure that key fields are used correctly to match existing content when the identifier column is missing from a multilingual import file.

dotCMS Version

26.05.11-01_dd6280b

Severity

Medium - Some functionality impacted

Links

https://dotcms.freshdesk.com/a/tickets/37006

Some Claude findings.

Let me look at this with fresh eyes and trace the exact code path for your CSV scenario — specifically what happens when isMultilingual = true comes in from the languageCode/countryCode columns.

Found something critical — isMultilingual is never set in the job processor's ImmutableImportFileParams builder. Let me find the default value.

Found the root cause. isMultilingual defaults to false in ImportFileParams and the job processor never sets it. Let me also check what searchIndex does with the Lucene query to confirm the _dotraw case issue.

Now I have the complete picture. There are actually two separate bugs working together.


Bug 1: isMultilingual is never set in the job processor

[ImportContentletsProcessor.java:427-444](https://claude.ai/epitaxy/local_449e95b7-7a3b-47a2-9611-b21e1b8a7280) builds the params but never calls .isMultilingual(true), so it always defaults to false ([ImportFileParams.java:62](https://claude.ai/epitaxy/local_449e95b7-7a3b-47a2-9611-b21e1b8a7280)):

final ImmutableImportFileParams importFileParams = ImmutableImportFileParams.builder()
        // ...
        .languageCodeHeaderColumn(headerInfo.languageCodeColumn)  // correctly set to 0
        .countryCodeHeaderColumn(headerInfo.countryCodeColumn)    // correctly set to 1
        // .isMultilingual(true)  ← NEVER CALLED
        .build();

With isMultilingual = false, languageCode/countryCode are not treated as special headers. They show as invalid headers but the import continues. The language still resolves correctly per row via processLanguage because languageCodeHeaderColumn IS set. So this bug is bad but not the killer blow on its own.


Bug 2: _dotraw query uses original case, ES stores lowercase — guaranteed miss

In [ESMappingAPIImpl.java:459-460](https://claude.ai/epitaxy/local_449e95b7-7a3b-47a2-9611-b21e1b8a7280), every string value is lowercased before indexing:

if (lowerCaseValue instanceof String) {
    lowerCaseValue = ((String) lowerCaseValue).toLowerCase();  // "Testing2" → "testing2"
}
mapLowered.put(lowerCaseKey + DOTRAW, lowerCaseValue);  // stored as "testing2"

But [ImportUtil.java:2987](https://claude.ai/epitaxy/local_449e95b7-7a3b-47a2-9611-b21e1b8a7280) queries with the raw CSV value:

.append(escapeLuceneSpecialCharacter(processedValue))  // "Testing2" — original case

Since _dotraw is a keyword field in ES, matches are case-sensitive. Testing2 ≠ testing2 — the search always returns empty for any mixed-case value. This is why no key-field match ever succeeds: not for Title, not for describer, not for orderPosition. Any value with an uppercase letter is a guaranteed miss.
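The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not dotCMS code: `DotRawCaseDemo` is a hypothetical stand-in that uses a `HashMap` to imitate a case-sensitive keyword field, lowercases values at index time as the ESMappingAPIImpl snippet does, and then looks them up with the raw CSV value.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for an index with a case-sensitive keyword field.
final class DotRawCaseDemo {

    // Maps the stored keyword value to a contentlet identifier.
    private final Map<String, String> rawIndex = new HashMap<>();

    // Mirrors the indexing behaviour described above: lowercase before storing.
    void index(final String value, final String identifier) {
        rawIndex.put(value.toLowerCase(Locale.ROOT), identifier);
    }

    // Exact, case-sensitive lookup, which is how a keyword field behaves.
    String findExact(final String queryValue) {
        return rawIndex.get(queryValue);
    }

    public static void main(final String[] args) {
        final DotRawCaseDemo demo = new DotRawCaseDemo();
        demo.index("Testing2", "id-1");
        System.out.println(demo.findExact("Testing2")); // original case: no match
        System.out.println(demo.findExact("testing2")); // lowercased: match
    }
}
```

Under these assumptions, the only query that finds the stored entry is the one that has been lowercased the same way the indexer lowercased the value.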


The combined effect

With both bugs present, isMultilingual stays false and the language-constrained initial search returns empty (correct, since the contentlet exists in English, not French). The fallback then runs, but the _dotraw search also returns empty because of the case mismatch. As a result, contentlets is always empty and a brand new contentlet is always created. No identifier = always new.
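The lookup order described above can be sketched roughly as follows. This is an illustration of the control flow only, under the assumption that the import first searches in the row's language and then falls back to a key-field search across languages; `KeyFieldMatchFlowSketch`, `resolveIdentifier`, and both search functions are hypothetical names, not the ImportUtil API.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the two-step lookup; not the actual ImportUtil logic.
final class KeyFieldMatchFlowSketch {

    // Returns the identifier to reuse, or null when a new contentlet would be created.
    static String resolveIdentifier(
            final String keyValue,
            final Function<String, List<String>> searchInRowLanguage,
            final Function<String, List<String>> searchAnyLanguage) {
        // Step 1: search constrained to the language of the CSV row.
        List<String> matches = searchInRowLanguage.apply(keyValue);
        if (matches.isEmpty()) {
            // Step 2: fall back to a key-field search across languages.
            matches = searchAnyLanguage.apply(keyValue);
        }
        // No match in either step means the row becomes a brand new contentlet.
        return matches.isEmpty() ? null : matches.get(0);
    }
}
```

If the fallback search is case-sensitive and receives "Testing2" while the index holds "testing2", both steps return an empty list and the method yields null, which corresponds to the "always new" outcome reported in this issue.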


Fixes

Fix 1: [ImportContentletsProcessor.java](https://claude.ai/epitaxy/local_449e95b7-7a3b-47a2-9611-b21e1b8a7280) line ~433, add:

.isMultilingual(language == null)

Fix 2: [ImportUtil.java:2987](https://claude.ai/epitaxy/local_449e95b7-7a3b-47a2-9611-b21e1b8a7280), lowercase the value before appending to the query:

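// Lowercase before escaping so the query matches the lowercased _dotraw value
String escapedValue = escapeLuceneSpecialCharacter(processedValue.toLowerCase());
// ... then, in the existing query-builder chain:
.append(escapedValue.contains(" ") ? "\"" + escapedValue + "\"" : escapedValue)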

Fix 2 addresses the root cause. Fix 1 is also needed for correctness (without it the language headers are not recognized, so multilingual validation warnings appear), but with Fix 1 alone the _dotraw case mismatch would still cause the search to return nothing.
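To make Fix 2 concrete, here is a self-contained sketch of building one key-field clause with the value lowercased before escaping. Everything in it is an assumption for illustration: `KeyFieldQuerySketch`, `clause`, the simplified `escape` helper (the real `escapeLuceneSpecialCharacter` may behave differently), and the `+field_dotraw:value` clause shape.

```java
import java.util.Locale;

// Hypothetical sketch of Fix 2; names and clause format are assumptions.
final class KeyFieldQuerySketch {

    // Simplified escaper for Lucene query syntax characters (illustrative only).
    static String escape(final String value) {
        final StringBuilder out = new StringBuilder();
        for (final char c : value.toCharArray()) {
            if ("+-&|!(){}[]^\"~*?:\\/".indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    // Builds a required clause against the lowercased raw field.
    static String clause(final String fieldVar, final String csvValue) {
        // Lowercase first so the term matches what was stored at index time.
        final String escaped = escape(csvValue.toLowerCase(Locale.ROOT));
        // Quote values containing spaces so they are treated as one term.
        final String term = escaped.contains(" ") ? "\"" + escaped + "\"" : escaped;
        return "+" + fieldVar + "_dotraw:" + term;
    }
}
```

For example, `KeyFieldQuerySketch.clause("blog.title", "Testing2")` produces `+blog.title_dotraw:testing2`, where `blog.title` is a made-up field name.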
