Site search: normal HTML pages use CMS Page Title instead of rendered <title>; request backward-compatible option #35153

@syedATdot

Description

Problem Statement

For HTML pages that are not URL-mapped, site search currently overwrites the title parsed from the rendered HTML with page.getTitle() (page properties). Legacy site search (StaticHTMLPageBundler / older ESSiteSearchPublisher) indexed Tika metadata from the rendered file, so the indexed title effectively followed the HTML <title> (as produced by the template, theme, or SEO tooling, e.g. dotSeo). URL-mapped pages follow a different code path in which the metadata title is not replaced in the same way.

Customer impact: Partners and clients built thousands of pages on the assumption that site search titles match the rendered <title>, which often differs from the CMS Title field because it is generated by the theme or SEO tooling. They cannot migrate all page titles, so search results do not show the titles they expect.

Customer request: Raise with product; add a configuration option (default false) that populates the site search title from the parsed <title> / HTML metadata, for backward compatibility.

Steps to Reproduce

  • Create or use an HTML page that is not URL-mapped.
  • Set Page Title (page properties) to a distinct value, e.g. CMS_TITLE_FOR_SEARCH_TEST.
  • Ensure the published page’s HTML has a different <title> in its <head> (e.g. via theme / html_head.vtl / dotSeo / hardcoded in template), for example HTML_TITLE_IN_HEAD_ONLY.
  • Publish the page and confirm in View Source on the live URL that the markup contains <title>HTML_TITLE_IN_HEAD_ONLY</title> while the CMS Title remains CMS_TITLE_FOR_SEARCH_TEST.
  • Run Site Search indexing (full or incremental as appropriate) for that site/index.
  • Search in Site Search for content unique to that page.
  • Actual result: The hit title in site search is CMS_TITLE_FOR_SEARCH_TEST (page properties), not HTML_TITLE_IN_HEAD_ONLY.
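The View Source check in the steps above can also be scripted. The following is a minimal sketch, not dotCMS code: the class name RenderedTitleCheck and the method extractTitle are hypothetical, and the regex is a convenience for a quick check rather than a full HTML parser (legacy site search relied on Tika metadata, which this does not reproduce).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the reproduction steps; not part of dotCMS.
class RenderedTitleCheck {

    // Matches the first <title>...</title> pair, case-insensitively and across line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed text of the first <title> element, or empty if absent or blank. */
    static Optional<String> extractTitle(String html) {
        if (html == null) {
            return Optional.empty();
        }
        Matcher m = TITLE.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        String text = m.group(1).trim();
        return text.isEmpty() ? Optional.empty() : Optional.of(text);
    }

    public static void main(String[] args) {
        // Stand-in for the HTML fetched from the live URL.
        String html = "<html><head><title>HTML_TITLE_IN_HEAD_ONLY</title></head>"
                + "<body>page content</body></html>";
        String cmsTitle = "CMS_TITLE_FOR_SEARCH_TEST";

        String rendered = extractTitle(html).orElse("");
        System.out.println("rendered <title>: " + rendered);
        System.out.println("differs from CMS Title: " + !rendered.equals(cmsTitle));
    }
}
```

If the rendered title differs from the CMS Title and the site search hit still shows the CMS Title, the behaviour described in this issue is reproduced.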

Acceptance Criteria

  • Today (documented as current product behaviour): Site search uses page properties (page.getTitle()) for the indexed title on normal HTML pages, so the steps above currently yield the CMS Title in results.

  • Desired / backward-compatible behaviour (customer & partner ask):

  • Default: Keep current behaviour so existing installs do not change unexpectedly.

  • With opt-in config (false by default): For normal HTML pages, the site search title should match the rendered HTML <title> when present (same source as legacy: metadata/Tika on the bundled HTML file), falling back to page.getTitle() only when that parsed title is missing or empty.
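The requested selection rule could look roughly like the sketch below. It is an illustration under stated assumptions, not the dotCMS implementation: SiteSearchTitleResolver, resolve, and the preferHtmlTitle flag are hypothetical names, and how the flag would be read from configuration is left open because the issue does not name a property.

```java
// Hypothetical sketch of the acceptance criteria; not dotCMS code.
final class SiteSearchTitleResolver {

    private SiteSearchTitleResolver() {
    }

    /**
     * Chooses the title to index for a normal (non-URL-mapped) HTML page.
     *
     * @param preferHtmlTitle hypothetical opt-in flag; false keeps current behaviour
     * @param parsedHtmlTitle title parsed from the rendered HTML; may be null or blank
     * @param pageTitle       value of page.getTitle() (page properties)
     */
    static String resolve(boolean preferHtmlTitle, String parsedHtmlTitle, String pageTitle) {
        boolean hasParsedTitle = parsedHtmlTitle != null && !parsedHtmlTitle.trim().isEmpty();
        if (preferHtmlTitle && hasParsedTitle) {
            return parsedHtmlTitle.trim();
        }
        // Default path, and fallback when the parsed title is missing or empty.
        return pageTitle;
    }

    public static void main(String[] args) {
        String html = "HTML_TITLE_IN_HEAD_ONLY";
        String cms = "CMS_TITLE_FOR_SEARCH_TEST";

        System.out.println(resolve(false, html, cms)); // flag off: page properties title
        System.out.println(resolve(true, html, cms));  // flag on: rendered <title>
        System.out.println(resolve(true, "  ", cms));  // flag on, blank <title>: fallback
    }
}
```

Keeping the flag off by default means existing installs index the same titles as today, and only sites that opt in change on their next reindex.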

dotCMS Version

latest

Severity

High - Major functionality broken

Links

https://helpdesk.dotcms.com/a/tickets/36008

Metadata

Status

Done
