Skip to content

MB-65018 add custom_filter/custom_score query nodes with context-driven callback hooks#2289

Open
maneuvertomars wants to merge 10 commits intomasterfrom
MB-65018
Open

MB-65018 add custom_filter/custom_score query nodes with context-driven callback hooks#2289
maneuvertomars wants to merge 10 commits intomasterfrom
MB-65018

Conversation

@maneuvertomars
Copy link
Copy Markdown
Member

This PR adds two new query node types in Bleve: custom_filter and custom_score, both wrapping an inner query.
Execution is embedding-driven via request-scoped context hooks (CustomFilterFactory / CustomScoreFactory) so Bleve remains engine-agnostic.
custom_filter applies per-hit keep/drop logic through FilteringSearcher; custom_score applies per-hit score mutation through ScoreMutatingSearcher.

@coveralls
Copy link
Copy Markdown

coveralls commented Feb 22, 2026

Coverage Status

coverage: 52.405% (-0.3%) from 52.691%
when pulling de161f1 on MB-65018
into f7b13b9 on master.

Comment on lines +49 to +62
childSearcher, err := q.QueryVal.Searcher(ctx, i, m, options)
if err != nil {
return nil, err
}

// Resolve the request-scoped callback builder injected by the embedder.
factory, _ := ctx.Value(CustomFilterContextKey).(CustomFilterFactory)
if factory == nil {
return nil, fmt.Errorf("no custom filter factory registered in context")
}

// Build the per-hit filter callback from query-provided source/params/fields.
filterFunc, err := factory(q.Source, q.Params, q.Fields)
if err != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that passing a callback function in the context object is very flaky, and difficult for library users to add directly to their application.

Also if no FilterHook is defined we should not error out i feel and just do unfiltered search.

Why not try this?

func (q *CustomFilterQuery) Searcher(ctx context.Context, i index.IndexReader, m mapping.IndexMapping, options search.SearcherOptions) (search.Searcher, error) {
	// Build the inner searcher first; custom filtering wraps its output.
	childSearcher, err := q.QueryVal.Searcher(ctx, i, m, options)
	if err != nil {
		return nil, err
	}
    // Check if the factor hook is registered
    if FilterHook == nil {
        // No filtering hook registered => no filtering to be done
        // Just return the child searcher itself
        return childSearcher, nil
    }
    filterFunc, err := filterFactory(q.Source, q.Params, q.Fields)
	if err != nil {
		return nil, err
	}
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should fail fast here rather than silently bypass custom_filter/custom_score. If a query explicitly contains a custom node and no hook is registered, returning the child searcher would change query semantics and produce incorrect results. The context-carried hook is intentional so the execution runtime stays request-scoped and embedder-defined, while Bleve remains generic.

Copy link
Copy Markdown
Member Author

@maneuvertomars maneuvertomars Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fair point on standalone Bleve use case. I’ve addressed that by adding helper APIs (WithCustomFilterFactory(...), WithCustomScoreFactory(...), WithCustomFactories(...)) so users don’t have to use raw context keys directly.

ctx := query.WithCustomFilterFactory(context.Background(), filterFactory)
res, err := index.SearchInContext(ctx, req)


type CustomFilterQuery struct {
// QueryVal is the child query whose candidate matches are filtered.
QueryVal Query `json:"query"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query as the param name perhaps?

}
if q.Source == "" {
return fmt.Errorf("custom filter query must have source")
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a nil Fields param value valid?

Copy link
Copy Markdown
Member Author

@maneuvertomars maneuvertomars Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, nil/empty Fields is valid by design. It means no stored fields are loaded into doc.fields, I clarified that in the field comments.

// Params carries caller-provided values passed as the second UDF argument.
Params map[string]interface{} `json:"params,omitempty"`
// Source carries embedding-defined callback source that travels with the query.
Source string `json:"source,omitempty"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since a nil or empty source is invalid we should not have omitempty

}

func (q *CustomFilterQuery) MarshalJSON() ([]byte, error) {
inner := map[string]interface{}{
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to allow for easier extensibility of the struct, we can have a local copy of the struct here right?

type inner struct {
	Query  Query                  `json:"query"`
	Fields []string               `json:"fields,omitempty"`
	Params map[string]interface{} `json:"params,omitempty"`
	Source string                 `json:"source"`
}

return util.MarshalJSON(struct {
	CustomFilter inner `json:"custom_filter"`
}{
	CustomFilter: inner{
		Query:  q.QueryVal,
		Fields: q.Fields,
		Params: q.Params,
		Source: q.Source,
	},
})

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I switched MarshalJSON() to use a typed local wrapper so the JSON shape stays explicit and easier to extend.


// CustomScoreFactory lets the embedding application provide request-scoped
// score callbacks created from query-provided source/params/fields.
type CustomScoreFactory func(source string, params map[string]interface{}, fields []string) (searcher.ScoreFunc, error)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having these types and having library users pass in a function in context, consider this.

var (
  filterFactory CustomFilterFactory
  scoreFactory  CustomScoreFactory
)

func SetFilterFactory(f CustomFilterFactory) {
    filterFactory = f
}

func SetScoreFactory(f CustomScoreFactory) {
    scoreFactory = f
}

// usage from a bleve API POV 

```go
bleve.SetFilterFactory(f)
bleve.SetFilterFactory(f)
// can avoid context overhead entirely

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make the hook registration package-global in Bleve, whereas the current design keeps the execution hook request-scoped and explicit at search time. I’d prefer avoiding Bleve-level global mutable state here; the context handoff is small, aligns with existing search-time context usage, and keeps the embedder control per request.

}

// Resolve the request-scoped callback builder injected by the embedder.
factory, _ := ctx.Value(CustomScoreContextKey).(CustomScoreFactory)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, same as before - can avoid context ops by a simple global var init.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reasoning here. I’d prefer keeping this request-scoped rather than introducing Bleve-level global mutable hooks. The context lookup cost is negligible relative to search/execution, and the current design keeps the embedder control explicit per request.

@abhinavdangeti abhinavdangeti added this to the v2.6.0 milestone Feb 24, 2026
// Params carries caller-provided values passed as the second UDF argument.
Params map[string]interface{} `json:"params,omitempty"`
// Source carries embedding-defined callback source that travels with the query.
Source string `json:"source"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move all this in-line commentary to before declaring the struct (like we do with the rest of the query constructs).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure Abhi, it makes sense.

// Params carries caller-provided values passed as the second UDF argument.
Params map[string]interface{} `json:"params,omitempty"`
// Source carries embedding-defined callback source that travels with the query.
Source string `json:"source"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

@abhinavdangeti
Copy link
Copy Markdown
Member

@maneuvertomars Let's make sure you scrub the proposed code once again for any nil handling, especially around pointer dereferencing.

Copy link
Copy Markdown
Member

@abhinavdangeti abhinavdangeti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maneuvertomars Let's do a quick walk through tomorrow.

search_test.go Outdated
req = NewSearchRequest(q)
req.Explain = true
req.Fields = []string{"title"}
req.Fields = []string{"*"}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why changes to this test case, seems unrelated - please undo.

// Context keys used by CustomFilterQuery/CustomScoreQuery to retrieve
// request-scoped hooks from the embedding application (e.g. cbft).
type customFilterKey string
type customScoreKey string
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need two different typecasts for string. Given these aren't exported and you're not really using them anywere - just make the constants below of type strings.

abhinavdangeti
abhinavdangeti previously approved these changes Mar 27, 2026
Copy link
Copy Markdown
Member

@abhinavdangeti abhinavdangeti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please wait for review from @CascadingRadium as well.

}

// Resolve the request-scoped callback builder injected by the embedder.
factory := customFilterFactoryFromContext(ctx)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly extract the callback from the function, this is unnecessary.

)

const (
CustomFilterContextKey string = "custom_filter"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you can let these be untyped strings.


// CustomFilterFactory lets the embedding application provide request-scoped
// filter callbacks created from query-provided source/params/fields.
type CustomFilterFactory func(source string, params map[string]interface{}, fields []string) (searcher.FilterFunc, error)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need a factory wrapper around your searcher.FilterFunc? For a user using bleve only, why would they need source, params and fields?

Copy link
Copy Markdown
Member Author

@maneuvertomars maneuvertomars Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The factory is useful for embedders(cbft in our case) that want to derive a per-hit callback once from query-provided source/params/fields during Searcher() construction. To keep standalone Bleve usage simple, I added direct helpers as well: WithCustomFilterFunc, WithCustomScoreFunc, and WithCustomFuncs, so a Bleve-only user can register plain per-hit callbacks without needing to care about the builder path.
For example:

ctx := bleveQuery.WithCustomFuncs(context.Background(),
	func(_ *search.SearchContext, d *search.DocumentMatch) bool {
		return d.Score > 0
	},
	func(_ *search.SearchContext, d *search.DocumentMatch) float64 {
		return d.Score + 1
	},
)

res, err := index.SearchInContext(ctx, req)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, CustomFilterFactory and CustomScoreFactory are still cbft specific. You can instead situate these types in cbft only, and call the factory there giving you FilterFunc or ScoreFunc. You can then just pass these down as part of the context to bleve.

Basically right now, you're doing:
query comes in -> get the factory callback, put in the query's context -> forward query to bleve -> bleve executes the factory with the query source, params and fields to get the filter/score func -> use the filter/score func against each hit

Instead, why not do this:
query comes in -> get the factory callback, execute it with the params required (on cbft) -> get the filter/scorer func, put in the query's context -> forward to bleve -> bleve will execute the filter/scorer.

This will make bleve completely oblivious to cbft specific functionality.


// CustomScoreFactory lets the embedding application provide request-scoped
// score callbacks created from query-provided source/params/fields.
type CustomScoreFactory func(source string, params map[string]interface{}, fields []string) (searcher.ScoreFunc, error)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reasoning as above.


// WithCustomFactories returns a context carrying both request-scoped
// custom query factories for search execution.
func WithCustomFactories(ctx context.Context, filterFactory CustomFilterFactory,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Factory methods are not needed IMO, users can just directly register the callbacks in the context.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can register directly in context, but I’m keeping the helpers because they make the supported contract explicit and avoid repeating the context-key plumbing at each call site.
Also, standalone Bleve users now have a direct callback path via WithCustomFilterFunc / WithCustomScoreFunc / WithCustomFuncs, while embedders that need one-time callback construction from source / params / fields can still use the factory-based helpers.

}

// ScoreFunc defines a function which can mutate document scores
type ScoreFunc func(sctx *search.SearchContext, d *search.DocumentMatch) float64
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScoreFunc is a callback that's created outside the index, but SearchContext is an internal variable that has access to the index reader, collector and document match pool, all of which can potentially be misused. I don't think access to this struct should be given.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like FilterFunc which was implemented as part of boolean queries follows this signature as well, but that was used internally only. Since these callbacks are registered outside bleve I still think this might be a bad idea. @CascadingRadium what do you think?

@capemox
Copy link
Copy Markdown
Member

capemox commented Mar 30, 2026

@maneuvertomars please add functions to create new custom_filter/custom_score queries in query.go in bleve's root, and also add end to end tests in search_test.go to validate it works with some non-trivial custom filters and custom scorers.

Comment on lines +35 to +37
Fields []string `json:"fields,omitempty"`
Params map[string]interface{} `json:"params,omitempty"`
Source string `json:"source"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a user using only bleve, these additional query args don't make much sense and are cbft specific

Comment on lines +35 to +37
Fields []string `json:"fields,omitempty"`
Params map[string]interface{} `json:"params,omitempty"`
Source string `json:"source"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

`CustomFilterQuery` contains:

- a child query
- a bound `searcher.FilterFunc`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does bound here and below supposed to indicate?


```go
filterQuery := bleve.NewCustomFilterQuery(childQuery)
scoreQuery := bleve.NewCustomScoreQuery(childQuery)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like here and elsewhere you need to update with how a user is meant to pass in filters and payloads.

}

func (q *CustomFilterQuery) UnmarshalJSON(data []byte) error {
child, payload, err := unmarshalCustomQueryPayload(data, "custom_filter")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is bound to fail if the payload does not comply to map[string]map[string]json.RawMessage - which is going to be pretty bad UX.

}

func (q *CustomScoreQuery) UnmarshalJSON(data []byte) error {
child, payload, err := unmarshalCustomQueryPayload(data, "custom_score")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

return nil, nil, err
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused at what we're doing here - why this query attribute within the payload, if there's one outside already?

}

var CustomFilterQueryParser func([]byte) (Query, error)
var CustomScoreQueryParser func([]byte) (Query, error)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declare each of these alongside their respective query types - and add sufficient commentary around how these can be updated only during init and with examples of how bleve users can use these.


// ScoreMutatingSearcher wraps any other searcher, but mutates the score
// of any Next/Advance call using the supplied ScoreFunc
type ScoreMutatingSearcher struct {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor to CustomScoreSearcher.

// child query and binding the provided per-hit score callback.
func NewCustomScoreQueryWithScorer(child query.Query, score searcher.ScoreFunc) *query.CustomScoreQuery {
return query.NewCustomScoreQueryWithScorer(child, score)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's drop all these convenience redirectors for now.

}
}

func NewCustomFilterQueryWithFilterAndPayload(query Query, filter searcher.FilterFunc, payload map[string]interface{}) *CustomFilterQuery {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop this, and make NewCustomFilterQueryWithFilter handle the payload as well - you'll want to accommodate nil payload situations.

}
}

func NewCustomScoreQueryWithScorerAndPayload(query Query, score searcher.ScoreFunc, payload map[string]interface{}) *CustomScoreQuery {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop this, and make NewCustomScoreQueryWithScorer handle the payload as well - you'll want to accommodate nil payload situations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

5 participants