feat(es-compat): support regexp shorthand, expose concatenate fields, map text to keyword in _mapping #6208

Merged
congx4 merged 4 commits into quickwit-oss:main from congx4:cong.xie/es-compat-regexp-shorthand
Mar 19, 2026

congx4 commented Mar 18, 2026

Summary

This PR improves ES compatibility in three areas:

  • Support shorthand regexp query format — Elasticsearch accepts both {"regexp": {"field": "pattern"}} (shorthand) and {"regexp": {"field": {"value": "pattern"}}} (full). Quickwit only supported the full form, causing connectors like the Trino ES connector to fail with a deserialization error when pushing down LIKE predicates (which get translated to regexp queries).

  • Map Text fields to keyword in _mapping response — The Trino ES connector (and potentially other connectors) only pushes down LIKE and filter predicates for keyword-typed fields. By reporting Quickwit text fields as ES keyword in the mapping response, we enable filter pushdown for string fields.

  • Expose Concatenate fields in _mapping endpoint — Concatenate fields (e.g. an all field that combines multiple source fields) were previously hidden from the _mapping response. This change exposes them as keyword type so downstream connectors can discover and query them.

Test plan

  • Unit tests for regexp shorthand and full formats (3 new tests)
  • Unit tests updated for mapping type changes (text→keyword, concatenate→keyword)
  • Verified end-to-end with Trino ES connector: SELECT ... WHERE service LIKE '%logs%' now works
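
For context on the last item, here is a minimal sketch of how a `LIKE` pattern could be turned into a regexp pattern before it is sent as a `regexp` query. It is an illustration only, not the Trino connector's actual code: the function name `like_to_regexp` is hypothetical, and the sketch ignores SQL `ESCAPE` clauses.

```rust
/// Translates a SQL LIKE pattern into a regular expression string.
/// Hypothetical helper for illustration; not taken from Trino or Quickwit.
fn like_to_regexp(like: &str) -> String {
    let mut out = String::new();
    for ch in like.chars() {
        match ch {
            // `%` matches any run of characters, `_` matches exactly one.
            '%' => out.push_str(".*"),
            '_' => out.push('.'),
            // Escape common regex metacharacters so they match literally.
            '.' | '*' | '+' | '?' | '|' | '(' | ')' | '[' | ']' | '{' | '}' | '^' | '$'
            | '\\' => {
                out.push('\\');
                out.push(ch);
            }
            _ => out.push(ch),
        }
    }
    out
}
```

Under these assumptions, `like_to_regexp("%logs%")` yields `.*logs.*`, which a connector could then send in the shorthand form `{"regexp": {"service": ".*logs.*"}}`.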

Made with Cursor

congx4 and others added 2 commits March 18, 2026 11:19
…fields, and map text to keyword in _mapping

Elasticsearch's `regexp` query accepts two formats:
- Shorthand: `{"regexp": {"field": "pattern"}}`
- Full: `{"regexp": {"field": {"value": "pattern", "case_insensitive": true}}}`

Quickwit only supported the full form, causing queries from ES-compatible
connectors (e.g. Trino ES connector) to fail with a deserialization error.
This adds support for the shorthand format via `#[serde(untagged)]` enum
deserialization.
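
As a rough illustration of accepting both shapes, the sketch below normalizes either form into one parameter struct. It deliberately avoids serde so it stays dependency-free: the `Json` enum is a hand-rolled stand-in for a parsed JSON value, and `RegexParams` and `parse_regex_params` are hypothetical names, not Quickwit's types. It mirrors what an untagged enum does (try each shape in turn) rather than reproducing the PR's code.

```rust
// Tiny stand-in for a parsed JSON value (assumption: only the shapes
// needed for this example are modeled).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Bool(bool),
    Object(Vec<(String, Json)>),
}

// Hypothetical normalized form of a regexp clause body.
#[derive(Debug, Clone, PartialEq)]
struct RegexParams {
    value: String,
    case_insensitive: bool,
}

/// Accepts either the shorthand body (`"pattern"`) or the full body
/// (`{"value": "pattern", "case_insensitive": true}`).
fn parse_regex_params(body: &Json) -> Result<RegexParams, String> {
    match body {
        // Shorthand: the field maps straight to the pattern string.
        Json::Str(pattern) => Ok(RegexParams {
            value: pattern.clone(),
            case_insensitive: false,
        }),
        // Full: the field maps to an object that must carry `value`.
        Json::Object(entries) => {
            let mut value = None;
            let mut case_insensitive = false;
            for (key, val) in entries {
                match (key.as_str(), val) {
                    ("value", Json::Str(s)) => value = Some(s.clone()),
                    ("case_insensitive", Json::Bool(b)) => case_insensitive = *b,
                    (other, _) => return Err(format!("unexpected entry `{}`", other)),
                }
            }
            let value = value.ok_or_else(|| "missing `value`".to_string())?;
            Ok(RegexParams { value, case_insensitive })
        }
        // Anything else is rejected, as a deserializer would.
        other => Err(format!("expected string or object, got {:?}", other)),
    }
}
```

Both `Json::Str("lo.*")` and an object with `value` set to `"lo.*"` end up as the same `RegexParams`, so the rest of the query pipeline only has to handle one representation.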

Additionally, in the `_mapping` endpoint:
- `Text` fields are now reported as `keyword` type. This enables filter
  pushdown (e.g. `LIKE` predicates) from connectors that only push down
  filters for `keyword`-typed fields.
- `Concatenate` fields are now exposed as `keyword` type instead of being
  hidden. This allows connectors to discover and query these fields.
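
The mapping change can be pictured as a small type-translation table. The sketch below is an assumption-laden illustration: `FieldKind` and `es_mapping_type` are hypothetical names, and only the `Text` and `Concatenate` rows reflect what this commit describes.

```rust
// Hypothetical field kinds; the real doc mapper has more variants and
// different names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FieldKind {
    Text,
    Concatenate,
    Bool,
}

/// Returns the Elasticsearch type name reported in the `_mapping` response.
fn es_mapping_type(kind: FieldKind) -> &'static str {
    match kind {
        // Reported as `keyword` so connectors push down LIKE and filters.
        FieldKind::Text => "keyword",
        // Previously hidden from `_mapping`; now exposed as `keyword`.
        FieldKind::Concatenate => "keyword",
        // Illustrative extra row, not taken from the PR.
        FieldKind::Bool => "boolean",
    }
}
```

Keeping the translation in one exhaustive `match` means a newly added field kind fails to compile until someone decides how it should appear in `_mapping`.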

Made-with: Cursor
… lint

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/// - Full: `{"regexp": {"field": {"value": "pattern", "case_insensitive": true}}}`
#[derive(Deserialize, Debug, Eq, PartialEq, Clone)]
#[serde(untagged)]
enum RegexQueryParamsInner {
Member


Can't we implement this directly on RegexQueryParams?

congx4 and others added 2 commits March 19, 2026 13:46
…d keyword comment

- Replace inner enum + serde(from) with a custom Deserialize impl directly
  on RegexQueryParams, as suggested by reviewer
- Add comment explaining why text fields are mapped to keyword in the
  ES-compat _mapping response

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the custom Deserialize visitor with a simple #[serde(untagged)]
enum that handles both shorthand and full regexp query formats directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
congx4 merged commit 06f0ef0 into quickwit-oss:main on Mar 19, 2026
4 checks passed

2 participants