MB-65018 add custom_filter/custom_score query nodes with context-driven callback hooks#2289
MB-65018 add custom_filter/custom_score query nodes with context-driven callback hooks#2289maneuvertomars wants to merge 10 commits intomasterfrom
Conversation
…en callback hooks
search/query/custom_filter.go
Outdated
| childSearcher, err := q.QueryVal.Searcher(ctx, i, m, options) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| // Resolve the request-scoped callback builder injected by the embedder. | ||
| factory, _ := ctx.Value(CustomFilterContextKey).(CustomFilterFactory) | ||
| if factory == nil { | ||
| return nil, fmt.Errorf("no custom filter factory registered in context") | ||
| } | ||
|
|
||
| // Build the per-hit filter callback from query-provided source/params/fields. | ||
| filterFunc, err := factory(q.Source, q.Params, q.Fields) | ||
| if err != nil { |
There was a problem hiding this comment.
I would argue that passing a callback function in the context object is very flaky, and difficult for library users to add directly to their application.
Also if no FilterHook is defined we should not error out i feel and just do unfiltered search.
Why not try this?
func (q *CustomFilterQuery) Searcher(ctx context.Context, i index.IndexReader, m mapping.IndexMapping, options search.SearcherOptions) (search.Searcher, error) {
// Build the inner searcher first; custom filtering wraps its output.
childSearcher, err := q.QueryVal.Searcher(ctx, i, m, options)
if err != nil {
return nil, err
}
// Check if the factor hook is registered
if FilterHook == nil {
// No filtering hook registered => no filtering to be done
// Just return the child searcher itself
return childSearcher, nil
}
filterFunc, err := filterFactory(q.Source, q.Params, q.Fields)
if err != nil {
return nil, err
}
}There was a problem hiding this comment.
I think we should fail fast here rather than silently bypass custom_filter/custom_score. If a query explicitly contains a custom node and no hook is registered, returning the child searcher would change query semantics and produce incorrect results. The context-carried hook is intentional so the execution runtime stays request-scoped and embedder-defined, while Bleve remains generic.
There was a problem hiding this comment.
Also fair point on standalone Bleve use case. I’ve addressed that by adding helper APIs (WithCustomFilterFactory(...), WithCustomScoreFactory(...), WithCustomFactories(...)) so users don’t have to use raw context keys directly.
ctx := query.WithCustomFilterFactory(context.Background(), filterFactory)
res, err := index.SearchInContext(ctx, req)
search/query/custom_filter.go
Outdated
|
|
||
| type CustomFilterQuery struct { | ||
| // QueryVal is the child query whose candidate matches are filtered. | ||
| QueryVal Query `json:"query"` |
There was a problem hiding this comment.
Query as the param name perhaps?
search/query/custom_filter.go
Outdated
| } | ||
| if q.Source == "" { | ||
| return fmt.Errorf("custom filter query must have source") | ||
| } |
There was a problem hiding this comment.
Is a nil Fields param value valid?
There was a problem hiding this comment.
Yes, nil/empty Fields is valid by design. It means no stored fields are loaded into doc.fields, I clarified that in the field comments.
search/query/custom_filter.go
Outdated
| // Params carries caller-provided values passed as the second UDF argument. | ||
| Params map[string]interface{} `json:"params,omitempty"` | ||
| // Source carries embedding-defined callback source that travels with the query. | ||
| Source string `json:"source,omitempty"` |
There was a problem hiding this comment.
Since a nil or empty source is invalid we should not have omitempty
search/query/custom_filter.go
Outdated
| } | ||
|
|
||
| func (q *CustomFilterQuery) MarshalJSON() ([]byte, error) { | ||
| inner := map[string]interface{}{ |
There was a problem hiding this comment.
to allow for easier extensibility of the struct, we can have a local copy of the struct here right?
type inner struct {
Query Query `json:"query"`
Fields []string `json:"fields,omitempty"`
Params map[string]interface{} `json:"params,omitempty"`
Source string `json:"source"`
}
return util.MarshalJSON(struct {
CustomFilter inner `json:"custom_filter"`
}{
CustomFilter: inner{
Query: q.QueryVal,
Fields: q.Fields,
Params: q.Params,
Source: q.Source,
},
})There was a problem hiding this comment.
Makes sense. I switched MarshalJSON() to use a typed local wrapper so the JSON shape stays explicit and easier to extend.
search/query/custom_hooks.go
Outdated
|
|
||
| // CustomScoreFactory lets the embedding application provide request-scoped | ||
| // score callbacks created from query-provided source/params/fields. | ||
| type CustomScoreFactory func(source string, params map[string]interface{}, fields []string) (searcher.ScoreFunc, error) |
There was a problem hiding this comment.
Instead of having these types and having library users pass in a function in context, consider this.
var (
filterFactory CustomFilterFactory
scoreFactory CustomScoreFactory
)
func SetFilterFactory(f CustomFilterFactory) {
filterFactory = f
}
func SetScoreFactory(f CustomScoreFactory) {
scoreFactory = f
}
// usage from a bleve API POV
```go
bleve.SetFilterFactory(f)
bleve.SetFilterFactory(f)
// can avoid context overhead entirelyThere was a problem hiding this comment.
That would make the hook registration package-global in Bleve, whereas the current design keeps the execution hook request-scoped and explicit at search time. I’d prefer avoiding Bleve-level global mutable state here; the context handoff is small, aligns with existing search-time context usage, and keeps the embedder control per request.
search/query/custom_score.go
Outdated
| } | ||
|
|
||
| // Resolve the request-scoped callback builder injected by the embedder. | ||
| factory, _ := ctx.Value(CustomScoreContextKey).(CustomScoreFactory) |
There was a problem hiding this comment.
ditto, same as before - can avoid context ops by a simple global var init.
There was a problem hiding this comment.
Same reasoning here. I’d prefer keeping this request-scoped rather than introducing Bleve-level global mutable hooks. The context lookup cost is negligible relative to search/execution, and the current design keeps the embedder control explicit per request.
search/query/custom_filter.go
Outdated
| // Params carries caller-provided values passed as the second UDF argument. | ||
| Params map[string]interface{} `json:"params,omitempty"` | ||
| // Source carries embedding-defined callback source that travels with the query. | ||
| Source string `json:"source"` |
There was a problem hiding this comment.
Let's move all this in-line commentary to before declaring the struct (like we do with the rest of the query constructs).
There was a problem hiding this comment.
Sure Abhi, it makes sense.
search/query/custom_score.go
Outdated
| // Params carries caller-provided values passed as the second UDF argument. | ||
| Params map[string]interface{} `json:"params,omitempty"` | ||
| // Source carries embedding-defined callback source that travels with the query. | ||
| Source string `json:"source"` |
|
@maneuvertomars Let's make sure you scrub the proposed code once again for any nil handling, especially around pointer dereferencing. |
abhinavdangeti
left a comment
There was a problem hiding this comment.
@maneuvertomars Let's do a quick walk through tomorrow.
search_test.go
Outdated
| req = NewSearchRequest(q) | ||
| req.Explain = true | ||
| req.Fields = []string{"title"} | ||
| req.Fields = []string{"*"} |
There was a problem hiding this comment.
Why changes to this test case, seems unrelated - please undo.
search/query/custom_hooks.go
Outdated
| // Context keys used by CustomFilterQuery/CustomScoreQuery to retrieve | ||
| // request-scoped hooks from the embedding application (e.g. cbft). | ||
| type customFilterKey string | ||
| type customScoreKey string |
There was a problem hiding this comment.
You don't need two different typecasts for string. Given these aren't exported and you're not really using them anywere - just make the constants below of type strings.
abhinavdangeti
left a comment
There was a problem hiding this comment.
Please wait for review from @CascadingRadium as well.
search/query/custom_filter.go
Outdated
| } | ||
|
|
||
| // Resolve the request-scoped callback builder injected by the embedder. | ||
| factory := customFilterFactoryFromContext(ctx) |
There was a problem hiding this comment.
Explicitly extract the callback from the function, this is unnecessary.
search/query/custom_hooks.go
Outdated
| ) | ||
|
|
||
| const ( | ||
| CustomFilterContextKey string = "custom_filter" |
There was a problem hiding this comment.
Nit: you can let these be untyped strings.
search/query/custom_hooks.go
Outdated
|
|
||
| // CustomFilterFactory lets the embedding application provide request-scoped | ||
| // filter callbacks created from query-provided source/params/fields. | ||
| type CustomFilterFactory func(source string, params map[string]interface{}, fields []string) (searcher.FilterFunc, error) |
There was a problem hiding this comment.
Why do you need a factory wrapper around your searcher.FilterFunc? For a user using bleve only, why would they need source, params and fields?
There was a problem hiding this comment.
The factory is useful for embedders(cbft in our case) that want to derive a per-hit callback once from query-provided source/params/fields during Searcher() construction. To keep standalone Bleve usage simple, I added direct helpers as well: WithCustomFilterFunc, WithCustomScoreFunc, and WithCustomFuncs, so a Bleve-only user can register plain per-hit callbacks without needing to care about the builder path.
For example:
ctx := bleveQuery.WithCustomFuncs(context.Background(),
func(_ *search.SearchContext, d *search.DocumentMatch) bool {
return d.Score > 0
},
func(_ *search.SearchContext, d *search.DocumentMatch) float64 {
return d.Score + 1
},
)
res, err := index.SearchInContext(ctx, req)There was a problem hiding this comment.
Hmm, CustomFilterFactory and CustomScoreFactory are still cbft specific. You can instead situate these types in cbft only, and call the factory there giving you FilterFunc or ScoreFunc. You can then just pass these down as part of the context to bleve.
Basically right now, you're doing:
query comes in -> get the factory callback, put in the query's context -> forward query to bleve -> bleve executes the factory with the query source, params and fields to get the filter/score func -> use the filter/score func against each hit
Instead, why not do this:
query comes in -> get the factory callback, execute it with the params required (on cbft) -> get the filter/scorer func, put in the query's context -> forward to bleve -> bleve will execute the filter/scorer.
This will make bleve completely oblivious to cbft specific functionality.
search/query/custom_hooks.go
Outdated
|
|
||
| // CustomScoreFactory lets the embedding application provide request-scoped | ||
| // score callbacks created from query-provided source/params/fields. | ||
| type CustomScoreFactory func(source string, params map[string]interface{}, fields []string) (searcher.ScoreFunc, error) |
There was a problem hiding this comment.
Same reasoning as above.
search/query/custom_hooks.go
Outdated
|
|
||
| // WithCustomFactories returns a context carrying both request-scoped | ||
| // custom query factories for search execution. | ||
| func WithCustomFactories(ctx context.Context, filterFactory CustomFilterFactory, |
There was a problem hiding this comment.
Factory methods are not needed IMO, users can just directly register the callbacks in the context.
There was a problem hiding this comment.
Users can register directly in context, but I’m keeping the helpers because they make the supported contract explicit and avoid repeating the context-key plumbing at each call site.
Also, standalone Bleve users now have a direct callback path via WithCustomFilterFunc / WithCustomScoreFunc / WithCustomFuncs, while embedders that need one-time callback construction from source / params / fields can still use the factory-based helpers.
| } | ||
|
|
||
| // ScoreFunc defines a function which can mutate document scores | ||
| type ScoreFunc func(sctx *search.SearchContext, d *search.DocumentMatch) float64 |
There was a problem hiding this comment.
ScoreFunc is a callback that's created outside the index, but SearchContext is an internal variable that has access to the index reader, collector and document match pool, all of which can potentially be misused. I don't think access to this struct should be given.
There was a problem hiding this comment.
Looks like FilterFunc which was implemented as part of boolean queries follows this signature as well, but that was used internally only. Since these callbacks are registered outside bleve I still think this might be a bad idea. @CascadingRadium what do you think?
|
@maneuvertomars please add functions to create new custom_filter/custom_score queries in |
search/query/custom_filter.go
Outdated
| Fields []string `json:"fields,omitempty"` | ||
| Params map[string]interface{} `json:"params,omitempty"` | ||
| Source string `json:"source"` |
There was a problem hiding this comment.
For a user using only bleve, these additional query args don't make much sense and are cbft specific
search/query/custom_score.go
Outdated
| Fields []string `json:"fields,omitempty"` | ||
| Params map[string]interface{} `json:"params,omitempty"` | ||
| Source string `json:"source"` |
| `CustomFilterQuery` contains: | ||
|
|
||
| - a child query | ||
| - a bound `searcher.FilterFunc` |
There was a problem hiding this comment.
What does bound here and below supposed to indicate?
|
|
||
| ```go | ||
| filterQuery := bleve.NewCustomFilterQuery(childQuery) | ||
| scoreQuery := bleve.NewCustomScoreQuery(childQuery) |
There was a problem hiding this comment.
Looks like here and elsewhere you need to update with how a user is meant to pass in filters and payloads.
| } | ||
|
|
||
| func (q *CustomFilterQuery) UnmarshalJSON(data []byte) error { | ||
| child, payload, err := unmarshalCustomQueryPayload(data, "custom_filter") |
There was a problem hiding this comment.
This is bound to fail if the payload does not comply to map[string]map[string]json.RawMessage - which is going to be pretty bad UX.
| } | ||
|
|
||
| func (q *CustomScoreQuery) UnmarshalJSON(data []byte) error { | ||
| child, payload, err := unmarshalCustomQueryPayload(data, "custom_score") |
| return nil, nil, err | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
I'm confused at what we're doing here - why this query attribute within the payload, if there's one outside already?
| } | ||
|
|
||
| var CustomFilterQueryParser func([]byte) (Query, error) | ||
| var CustomScoreQueryParser func([]byte) (Query, error) |
There was a problem hiding this comment.
Declare each of these alongside their respective query types - and add sufficient commentary around how these can be updated only during init and with examples of how bleve users can use these.
|
|
||
| // ScoreMutatingSearcher wraps any other searcher, but mutates the score | ||
| // of any Next/Advance call using the supplied ScoreFunc | ||
| type ScoreMutatingSearcher struct { |
There was a problem hiding this comment.
Refactor to CustomScoreSearcher.
| // child query and binding the provided per-hit score callback. | ||
| func NewCustomScoreQueryWithScorer(child query.Query, score searcher.ScoreFunc) *query.CustomScoreQuery { | ||
| return query.NewCustomScoreQueryWithScorer(child, score) | ||
| } |
There was a problem hiding this comment.
Let's drop all these convenience redirectors for now.
| } | ||
| } | ||
|
|
||
| func NewCustomFilterQueryWithFilterAndPayload(query Query, filter searcher.FilterFunc, payload map[string]interface{}) *CustomFilterQuery { |
There was a problem hiding this comment.
Drop this, and make NewCustomFilterQueryWithFilter handle the payload as well - you'll want to accommodate nil payload situations.
| } | ||
| } | ||
|
|
||
| func NewCustomScoreQueryWithScorerAndPayload(query Query, score searcher.ScoreFunc, payload map[string]interface{}) *CustomScoreQuery { |
There was a problem hiding this comment.
Drop this, and make NewCustomScoreQueryWithScorer handle the payload as well - you'll want to accommodate nil payload situations.
This PR adds two new query node types in Bleve: custom_filter and custom_score, both wrapping an inner query.
Execution is embedding-driven via request-scoped context hooks (CustomFilterFactory / CustomScoreFactory) so Bleve remains engine-agnostic.
custom_filter applies per-hit keep/drop logic through FilteringSearcher; custom_score applies per-hit score mutation through ScoreMutatingSearcher.