
Support filtering interface/union types on _typename #1169

Open
marcdaniels-toast wants to merge 1 commit into block:main from marcdaniels-toast:mdaniels/typename-filter


@marcdaniels-toast
Contributor

@marcdaniels-toast marcdaniels-toast commented May 8, 2026

Closes #1024

Finishes the work @myronmarston started in #1027.

Adds a _typename filter field to all interface and union type filter inputs, allowing clients to filter by concrete subtype:

{ distribution_channels(filter: { _typename: { equal_to_any_of: ["OnlineStore"] } }) { ... } }

What this adds on top of #1027:

  • name_in_index: "__typename" on the filter field so FilterArgsTranslator maps _typename to __typename in the datastore query at runtime
  • Regenerated schema artifacts with the name_in_index entry in runtime metadata for each abstract FilterInput type
  • Unit test verifying the _typename to __typename translation in filters_spec.rb
  • Acceptance tests covering _typename filtering on: top-level abstract type queries (distribution_channels, retailers) where multiple concrete types share an index, embedded union/interface fields (inventor, named_inventor), and aggregations
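As a rough illustration of the name_in_index translation described in the list above, here is a minimal Ruby sketch. It is not FilterArgsTranslator's actual code: the `NAME_IN_INDEX` hash and the `translate_filter_keys` method are hypothetical stand-ins for the runtime metadata and the translation step.

```ruby
# Hypothetical stand-in for the runtime metadata that records name_in_index.
# The real metadata structure in the project may look quite different.
NAME_IN_INDEX = {"_typename" => "__typename"}.freeze

# Renames each top-level filter field to its name in the index,
# leaving fields without a name_in_index entry untouched.
def translate_filter_keys(filter, name_in_index = NAME_IN_INDEX)
  filter.to_h do |field, value|
    [name_in_index.fetch(field, field), value]
  end
end

graphql_filter = {"_typename" => {"equal_to_any_of" => ["OnlineStore"]}}
datastore_filter = translate_filter_keys(graphql_filter)
```

In this sketch only the public field name changes; the operator and its values pass through as-is.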

This branch was rebased onto main after #1171 merged, which now allows __typename datastore filters to match single-type indexes (indexes that store only a single concrete type). As a result, the acceptance tests now cover cases such as equal_to_any_of: ["OnlineStore", "PhysicalStore"].

I decided to defer the optimization of using the _typename filter to limit the set of search_index_definitions. It is a pure optimization that can be done separately, and leaving it out keeps this PR free of complexity that isn't strictly required for working _typename filters (the issue described in #1024).

@myronmarston
Collaborator

Known limitation: _typename only works reliably for types stored in a shared index (where the indexer injects __typename). Types with a dedicated index (e.g. PhysicalStore in physical_stores) don't have __typename stored, so equal_to_any_of: ["PhysicalStore"] returns nothing and not: { equal_to_any_of: [..., "PhysicalStore"] } fails to exclude them. This is inherent to how dedicated-index types work, not a bug introduced here. As a workaround, equal_to_any_of: [null] correctly matches dedicated-index documents, mirroring how AbstractTypeFilter handles them internally.

I think this known limitation is a problem. I think we either need to get _typename to work in these spots or find a way to omit _typename from the schema where it's not going to work correctly. Having it available in the GraphQL schema, but not working correctly in some cases, isn't satisfactory, IMO. After all, clients of the GraphQL API have no knowledge of the index structure of the types, so they can't tell when _typename will work and when it won't.

I think there's a solution, though--and it's one that should support more optimal queries to boot: let's leverage the _typename filter to narrow down the set of indices being queried. In a case where a concrete subtype lives in its own index, there's no __typename to filter on but we can update the query.search_index_definitions so that we hit only the index that has that type.
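To make the index-narrowing idea concrete, here is a minimal Ruby sketch. It does not use the project's real `search_index_definitions` objects: the `INDEX_BY_CONCRETE_TYPE` hash and the `indices_for` method are hypothetical, and the type-to-index pairs are taken from the examples in this thread.

```ruby
# Hypothetical lookup from concrete type name to the index that stores it.
INDEX_BY_CONCRETE_TYPE = {
  "OnlineStore" => "distribution_channels",
  "PhysicalStore" => "physical_stores"
}.freeze

# Returns the index names a query needs to hit for the requested concrete types.
# With no _typename filter (nil), every known index is searched.
# Unknown type names contribute no index.
def indices_for(typenames, index_by_type = INDEX_BY_CONCRETE_TYPE)
  return index_by_type.values.uniq if typenames.nil?

  index_by_type.values_at(*typenames).compact.uniq
end
```

Under these assumptions, `indices_for(["PhysicalStore"])` selects only `physical_stores`, so no `__typename` filter is needed for that index.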

That said, there's a situation that's tricky to solve. Imagine this:

# The DistributionChannel hierarchy has two branches:
#   DistributionChannel (index: distribution_channels)
#   ├── Wholesale            (interface, inherits distribution_channels)
#   │   ├── DirectWholesaler (concrete, distribution_channels index)
#   │   └── BrokerWholesaler (concrete, distribution_channels index)
#   └── Retail               (interface, inherits distribution_channels)
#       └── Store            (interface, inherits distribution_channels)
#           ├── OnlineStore  (concrete, distribution_channels index)
#           └── PhysicalStore (concrete, physical_stores index)

And imagine a query comes in like this:

distributionChannels(filter: {
 _typename: {equalToAnyOf: ["PhysicalStore", "BrokerWholesaler"]}
}) {
  # ...
}

This query needs to hit both the physical_stores index and the distribution_channels index, and it needs to filter the distribution_channels index on __typename == "BrokerWholesaler". But there's no way that I know of for a datastore query to have a "conditional" filter that applies to only one of the indexes. I think the solution is to introduce a __typename field on the physical_stores index as a constant_keyword (ES docs, OS docs). You could then filter on _typename IN ("PhysicalStore", "BrokerWholesaler") against both indices and it should work, without requiring per-document storage space in the physical_stores index to store __typename: "PhysicalStore".
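The constant_keyword approach can be sketched with plain Ruby hashes, as below. This is illustrative only and does not reflect how the project generates mappings or queries. The `typename_search` method is hypothetical, and the type names follow the hierarchy above. The mapping fragment uses the `constant_keyword` field type with its `value` parameter, and the search uses a standard `terms` filter against a comma-separated list of indices.

```ruby
require "json"

# Mapping fragment for the single-type index: every document reports the same
# __typename without storing it per document.
physical_stores_mapping = {
  "properties" => {
    "__typename" => {"type" => "constant_keyword", "value" => "PhysicalStore"}
  }
}

# Builds one search across several indices with a single, unconditional
# __typename filter (hypothetical helper, not part of the project).
def typename_search(indices, typenames)
  {
    index: indices.join(","),
    body: {
      "query" => {
        "bool" => {"filter" => [{"terms" => {"__typename" => typenames}}]}
      }
    }
  }
end

search = typename_search(
  %w[distribution_channels physical_stores],
  %w[PhysicalStore BrokerWholesaler]
)
puts JSON.pretty_generate(search)
```

Because both indices expose `__typename`, the same filter applies to both and no per-index condition is required.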

Now that I've thought of using constant_keyword for this case, it makes me wonder if it might make sense to always include a __typename property in every index for every object type, using constant_keyword if it's not an abstract type. We have some spots where we conditionally use __typename when available, and it could simplify to make it always available. We don't want to take up the storage costs of storing __typename per-document when not needed but constant_keyword solves that nicely!

Also, I just realized something while writing that up--a client might try this:

distributionChannels(filter: {
 _typename: {equalToAnyOf: ["Wholesale"]}
}) {
  # ...
}

A client could try that expecting it to return all documents that are subtypes of "Wholesale". But of course it won't work because there are no docs where __typename == "Wholesale". And _typename is meant to mimic the __typename return field clients can request. I see two potential solutions here:

  • Document this--in the generated GraphQL schema, document the semantics of _typename, so clients are aware of how it works.
  • Use an enum for _typename. Instead of it being a string, we could make it an enum, and in the enum the only members would be the concrete types (e.g. DirectWholesaler, BrokerWholesaler, OnlineStore, PhysicalStore, but not DistributionChannel, Retail, Store). Then the schema itself would make it impossible to filter on an abstract supertype.

I want to hear what you favor but to share my two cents: I like the 2nd option a lot for the type safety it provides, but I think I ultimately lean towards the first option, for a few reasons:

  • It's less effort! Just a documentation update.
  • Given that it's an equalToAnyOf filter I think it's natural for clients to expect __typename == value semantics, so this isn't that likely to occur in practice.
  • It mimics the __typename return field which is a String instead of an enum.
  • Having enum values like PhysicalStore violates common GraphQL lint rules that expect enum values like PHYSICAL_STORE, such as in Apollo's linter. We could do the screaming case but then we have to map between them and it feels suboptimal.
  • If we're going to go the enum route we'd want it to be as typesafe as possible and instead of having a single ConcreteTypeName enum type containing all concrete types in the schema, we'd probably want to make it per-abstract type. But that's more to manage and would kinda bloat the schema.
  • There's potential for composition issues when such a type gets included into a supergraph schema.

Thoughts @marcdaniels-toast?

@marcdaniels-toast
Contributor Author

marcdaniels-toast commented May 8, 2026

Thanks for pushing back on the limitation. It did seem a bit awkward at the time but I didn't know we had such a clean solution as constant_keyword. Thanks for pointing it out. I'm on board with using that for the single-type indexes. I'd like to make that change in at least one separate PR before updating this one to keep a PR focused on that change. [Edit: this has been done in #1171]

As for filtering by abstract types, what about query-time expansion? When the filter interpreter sees a _typename filter value that names an abstract type, it could expand it to the set of concrete subtypes. So ["Wholesale"] becomes ["DirectWholesaler", "BrokerWholesaler"] before the filter hits the datastore.
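The query-time expansion proposed here could look roughly like the following Ruby sketch. It is an assumption-laden illustration, not code from this PR: `CONCRETE_SUBTYPES` and `expand_typenames` are hypothetical, and the subtype lists are copied from the hierarchy earlier in the thread.

```ruby
# Hypothetical lookup from abstract type name to its concrete subtypes.
CONCRETE_SUBTYPES = {
  "DistributionChannel" => %w[DirectWholesaler BrokerWholesaler OnlineStore PhysicalStore],
  "Wholesale" => %w[DirectWholesaler BrokerWholesaler],
  "Retail" => %w[OnlineStore PhysicalStore],
  "Store" => %w[OnlineStore PhysicalStore]
}.freeze

# Replaces each abstract type name with its concrete subtypes.
# Names that are not in the lookup are kept unchanged; duplicates are dropped.
def expand_typenames(typenames, subtypes = CONCRETE_SUBTYPES)
  typenames.flat_map { |name| subtypes.fetch(name, [name]) }.uniq
end
```

With this lookup, `expand_typenames(["Wholesale"])` yields the two wholesaler types before the filter reaches the datastore.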

Adds a `_typename` field to all abstract type filter inputs, allowing
callers to filter results by the concrete type of each document. This
supports use cases like filtering a `retailers` query to only return
`OnlineStore` or `PhysicalStore` results.

The filter maps to the `__typename` datastore field via `name_in_index`,
and relies on `__typename` being present in all index mappings — including
single-type indices, which now store it as a `constant_keyword` (added in block#1171).

Generated with Claude Code
@marcdaniels-toast marcdaniels-toast force-pushed the mdaniels/typename-filter branch from f7397e4 to 06f5490 Compare May 13, 2026 02:03
@marcdaniels-toast marcdaniels-toast marked this pull request as ready for review May 13, 2026 02:03
@marcdaniels-toast
Contributor Author

  • Document this--in the generated GraphQL schema, document the semantics of _typename, so clients are aware of how it works.

I did my best to clearly explain this here in supports_filtering_and_aggregation.rb

@marcdaniels-toast
Contributor Author

As for filtering by abstract types, what about query-time expansion? When the filter interpreter sees a _typename filter value that names an abstract type, it could expand it to the set of concrete subtypes. So ["Wholesale"] becomes ["DirectWholesaler", "BrokerWholesaler"] before the filter hits the datastore.

We discussed this live and you convinced me it's likely not what users would expect. An idea for potential later implementation may be a new operator instead of equal_to_any_of that connotes matching subtypes, like is_descendent_of, descends_from or something like that.


Development

Successfully merging this pull request may close these issues.

Support filtering interface/union types on __typename

2 participants