@github-actions
Contributor

Cherry-picked from #59845


### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #59394

Problem Summary:

This PR adds `fields` and `type` parameters to the SEARCH function,
so that a single query string can be matched against multiple fields.
This is similar to Elasticsearch's multi_match query with its
`best_fields` and `cross_fields` types.

#### Multi-Field Search Support

```sql
-- Single term across multiple fields (best_fields mode - default)
SELECT * FROM docs WHERE search('hello', '{"fields":["title","content"]}');
-- Equivalent to: (title:hello) OR (content:hello)

-- Multi-term with AND operator (best_fields mode - default)
SELECT * FROM docs WHERE search('hello world', 
  '{"fields":["title","content"],"default_operator":"and"}');
-- Equivalent to: (title:hello AND title:world) OR (content:hello AND content:world)

-- Multi-term with cross_fields mode
SELECT * FROM docs WHERE search('hello world', 
  '{"fields":["title","content"],"default_operator":"and","type":"cross_fields"}');
-- Equivalent to: (title:hello OR content:hello) AND (title:world OR content:world)

-- Combined with Lucene mode
SELECT * FROM docs WHERE search('machine AND learning', 
  '{"fields":["title","content"],"mode":"lucene","minimum_should_match":0}');
```

#### Type Parameter Options

| Type | Description | Behavior |
|------|-------------|----------|
| `best_fields` (default) | All terms must match within the **SAME** field | `"hello world"` → `(title:hello AND title:world) OR (content:hello AND content:world)` |
| `cross_fields` | Terms can match across **DIFFERENT** fields | `"hello world"` → `(title:hello OR content:hello) AND (title:world OR content:world)` |
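
The two expansions in the table can be sketched in a few lines of Python. This is only an illustration of the rewriting rule described above, not the code in this PR; the function name `expand_multi_field` and its parameters are hypothetical.

```python
def expand_multi_field(terms, fields, operator, search_type="best_fields"):
    """Expand bare terms across fields into a DSL string (illustrative only)."""
    op = f" {operator.upper()} "
    if search_type == "best_fields":
        # One group per field: the terms are combined inside the same field,
        # then the per-field groups are OR-ed together.
        groups = [op.join(f"{field}:{term}" for term in terms) for field in fields]
        return " OR ".join(f"({group})" for group in groups)
    if search_type == "cross_fields":
        # One group per term: each term may match in any of the fields,
        # then the per-term groups are combined with the operator.
        groups = [" OR ".join(f"{field}:{term}" for field in fields) for term in terms]
        return op.join(f"({group})" for group in groups)
    raise ValueError(f"unknown type: {search_type!r}")


fields = ["title", "content"]
print(expand_multi_field(["hello", "world"], fields, "and"))
print(expand_multi_field(["hello", "world"], fields, "and", "cross_fields"))
```

With the AND operator, the two calls print the `best_fields` and `cross_fields` expansions shown in the Behavior column.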

**Key features:**
- `type` parameter controls how terms are matched across fields
- `best_fields` (default): Finds documents where all terms appear in the
same field - ideal for relevance ranking
- `cross_fields`: Treats multiple fields as one big field - ideal for
name searches across first_name/last_name
- Compatible with both standard mode and Lucene boolean mode
- `fields` and `default_field` are mutually exclusive
- Supports functions (EXACT, ANY, ALL) across fields
- Supports wildcard queries across fields
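
The option rules in this list (the default `type`, and `fields` versus `default_field`) could be checked roughly as follows. This is a minimal sketch: the helper `parse_search_options`, the use of `ValueError`, and the error messages are assumptions for illustration, since the description does not say how SEARCH reports invalid options.

```python
import json

VALID_TYPES = {"best_fields", "cross_fields"}


def parse_search_options(options_json):
    """Validate a SEARCH options string (hypothetical helper, not the Doris API)."""
    opts = json.loads(options_json)
    # `fields` and `default_field` are mutually exclusive.
    if "fields" in opts and "default_field" in opts:
        raise ValueError("'fields' and 'default_field' are mutually exclusive")
    # `best_fields` is the default when `type` is omitted.
    search_type = opts.get("type", "best_fields")
    if search_type not in VALID_TYPES:
        raise ValueError(f"unsupported type: {search_type!r}")
    return {
        "fields": opts.get("fields"),
        "default_field": opts.get("default_field"),
        "type": search_type,
    }


print(parse_search_options('{"fields":["title","content"]}'))
```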

**Behavior examples:**

| Query | Fields | Type | Expanded DSL |
|-------|--------|------|--------------|
| `hello` | `["title","content"]` | best_fields | `(title:hello) OR (content:hello)` |
| `hello world` (AND) | `["title","content"]` | best_fields | `(title:hello AND title:world) OR (content:hello AND content:world)` |
| `hello world` (AND) | `["title","content"]` | cross_fields | `(title:hello OR content:hello) AND (title:world OR content:world)` |
| `EXACT(foo bar)` | `["title","content"]` | any | `(title:EXACT(foo bar) OR content:EXACT(foo bar))` |
| `hello AND category:tech` | `["title","content"]` | any | `(title:hello OR content:hello) AND category:tech` |
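
The last two rows suggest that only unqualified clauses are expanded, while a clause that already names a field (such as `category:tech`) is left alone. A rough sketch of that rule, assuming a clause counts as field-qualified when it starts with an identifier followed by `:`; the regular expression and both helper names are hypothetical and much simpler than a real query parser:

```python
import re

# Assumption: "name:" at the start of a clause means it is already field-qualified.
_QUALIFIED = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*:")


def expand_clause(clause, fields):
    """Expand one clause across fields unless it already names a field."""
    if _QUALIFIED.match(clause):
        return clause
    return "(" + " OR ".join(f"{field}:{clause}" for field in fields) + ")"


def expand_and_query(clauses, fields):
    """Join already-split AND clauses after expanding each one."""
    return " AND ".join(expand_clause(clause, fields) for clause in clauses)


fields = ["title", "content"]
print(expand_clause("EXACT(foo bar)", fields))
print(expand_and_query(["hello", "category:tech"], fields))
```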

**Use case examples:**

- **Product search**: Use `best_fields` when searching product name and
description - prefer products where query terms appear together
- **Person name search**: Use `cross_fields` when searching first_name
and last_name - "John Smith" should match documents with
`first_name:John` and `last_name:Smith`
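
To make the difference between the two use cases concrete, here is a small in-memory model of the matching semantics with the AND operator. It assumes plain whitespace tokenization and case-insensitive term equality, which is a simplification and not how Doris analyzes or indexes text; the `matches` function and the sample documents are hypothetical.

```python
def tokens(text):
    """Lowercased whitespace tokens of a field value."""
    return set(text.lower().split())


def matches(doc, terms, fields, search_type):
    """Evaluate an AND query over a dict document (illustrative semantics only)."""
    terms = [term.lower() for term in terms]
    if search_type == "best_fields":
        # Some single field must contain every term.
        return any(all(t in tokens(doc.get(f, "")) for t in terms) for f in fields)
    if search_type == "cross_fields":
        # Every term must appear in at least one of the fields.
        return all(any(t in tokens(doc.get(f, "")) for f in fields) for t in terms)
    raise ValueError(f"unknown type: {search_type!r}")


person = {"first_name": "John", "last_name": "Smith"}
name_fields = ["first_name", "last_name"]
print(matches(person, ["John", "Smith"], name_fields, "cross_fields"))
print(matches(person, ["John", "Smith"], name_fields, "best_fields"))
```

Under these assumptions the person document matches only in `cross_fields` mode, because no single field contains both "John" and "Smith".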

### Release note

- Add multi-field search support for SEARCH function (`fields` parameter)
- Add `type` parameter with `best_fields` (default) and `cross_fields` modes
  - `best_fields`: All terms must match within the same field (default, matches Elasticsearch behavior)
  - `cross_fields`: Terms can match across different fields
- Compatible with Lucene mode for MUST/SHOULD/MUST_NOT semantics
@github-actions github-actions bot requested a review from yiguolei as a code owner January 19, 2026 03:05
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Jan 19, 2026
@hello-stephen
Contributor

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 60.48% (228/377) 🎉
Increment coverage report
Complete coverage report

@yiguolei yiguolei merged commit 3a55f70 into branch-4.0 Jan 19, 2026
25 of 28 checks passed
@github-actions github-actions bot deleted the auto-pick-59845-branch-4.0 branch January 19, 2026 07:52