Nested sourced_from support by ellisandrews-toast · Pull Request #1216 · block/elasticgraph

ellisandrews-toast · 2026-05-28T14:31:32Z

TODO

myronmarston

Thanks for putting this spike up. I don't have any major concerns--the architecture and approach are sound. I left some comments throughout.

Looking forward to seeing individual PRs spun off this!

myronmarston · 2026-05-28T20:06:51Z

On past projects, I've found it useful to update the schema artifacts structure (e.g. to add new fields like you are here) as a stand-alone PR before anything uses the new fields. Consider doing that for your PR stack.

myronmarston · 2026-05-28T20:30:57Z

+          @parent_relationship_config = {
+            parent_type_name: parent_type_name,
+            parent_relationship_name: parent_relationship_name
+          }


Let's define a Data class for this config. A hash is quick-and-dirty but is more prone to silently ignoring misspelled keys, etc.--data_hash[:misspelled] returns nil, whereas data_object.misspelled raises a NoMethodError.

The data class could be defined within this relationship class, e.g.:

    class Relationship < DelegateClass(Field)
      # ...
      ParentRef = ::Data.define(:type_ref, :relationship_name)
    end

Also, instead of just storing the parent type name as a string, can we store it as a TypeRef? While we always accept types as strings via the APIs called by EG users, we generally convert them to type refs early on because type refs are more capable than strings :).

Implemented!
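
To illustrate the hash-versus-Data point above, here is a minimal sketch. It assumes Ruby 3.2+ (where `Data.define` is available); the `"Team"` and `"members"` values and the plain-string `type_ref` are placeholders for illustration only, not what the PR actually stores.

```ruby
# Illustrative only: placeholder values, not ElasticGraph's real objects.
ParentRef = Data.define(:type_ref, :relationship_name)

hash_config = {parent_type_name: "Team", parent_relationship_name: "members"}
ref = ParentRef.new(type_ref: "Team", relationship_name: "members")

# A misspelled hash key silently yields nil.
hash_typo_result = hash_config[:parent_type_nmae]

# A misspelled reader on a Data instance raises NoMethodError.
data_typo_error =
  begin
    ref.type_rf
    nil
  rescue NoMethodError => e
    e
  end

# Omitting a member at construction time raises ArgumentError.
missing_member_error =
  begin
    ParentRef.new(type_ref: "Team")
    nil
  rescue ArgumentError => e
    e
  end
```

The Data instance is also immutable, so a reference built once during schema definition cannot be changed later by accident.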

myronmarston · 2026-05-28T20:31:47Z


+        # @return [Hash, nil] configuration for parent relationship in a nested sourced_from chain
+        # @private
+        attr_reader :parent_relationship_config


Can we call this parent_relationship_ref? To me it's less configuration and more just a reference to the parent relationship.

myronmarston · 2026-05-28T20:39:55Z

The Results class is pretty large and I'd like to keep it from getting larger. It'd be nice to put your logic into another class which Results can delegate to. Actually, maybe we should preemptively extract identify_extra_update_targets_by_object_type_name into a separate class?

Maybe we can move that into SchemaUpdateTargetResolver which sits beside https://github.com/ellisandrews-toast/elasticgraph/blob/f5b4e7bce0b55bb86254898d83a336f52be2358d/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/indexing/update_target_resolver.rb ?

Moved this into a class called SourcedUpdateTargetsResolver. That class finds all sourced_from relationships (both top-level and nested) and resolves them into UpdateTarget objects.

myronmarston · 2026-05-28T20:41:06Z

+
+      # Identifies update targets for sourced_from fields on non-indexed embedded types
+      # that use parent_relationship chains.
+      def identify_nested_sourced_update_targets(object_type, extra_update_targets_by_type_name, errors)


Can this logic be moved into https://github.com/ellisandrews-toast/elasticgraph/blob/f5b4e7bce0b55bb86254898d83a336f52be2358d/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/indexing/update_target_resolver.rb? IIRC that's the main spot where the update target logic for a single object type lives...which seems to be what you're working with here.

Moved this into the new SourcedUpdateTargetsResolver that delegates to both UpdateTargetResolver (for top-level) and NestedUpdateTargetResolver (for nested).

myronmarston · 2026-05-28T20:56:13Z

+            end
+
+            # Find the parent type
+            parent_type = @schema_def_state.object_types_by_name[config[:parent_type_name]]


Instead of looking up the type like this, if you use a type ref you can just do type_ref.as_object_type.

Done — storing a TypeRef on ParentRef and using ref.type_ref.as_object_type for lookup.
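
As a rough sketch of why a type ref is handier than a bare name here: the classes below are hypothetical, heavily simplified stand-ins written for this example (ElasticGraph's real TypeRef and schema state are richer and constructed differently). Only the `ref.type_ref.as_object_type` call shape comes from the discussion above. Requires Ruby 3.2+ for `Data.define`.

```ruby
# Hypothetical stand-ins for illustration; not ElasticGraph's actual classes.
ObjectType = Data.define(:name)

class TypeRef
  attr_reader :name

  def initialize(name, object_types_by_name)
    @name = name
    @object_types_by_name = object_types_by_name
  end

  # Resolves at call time, so the ref can be created before the type is registered.
  def as_object_type
    @object_types_by_name[name]
  end
end

ParentRef = Data.define(:type_ref, :relationship_name)

object_types_by_name = {}
ref = ParentRef.new(
  type_ref: TypeRef.new("Team", object_types_by_name),
  relationship_name: "members"
)

# The type is registered after the ref was built; lookup still works.
object_types_by_name["Team"] = ObjectType.new(name: "Team")
parent_type = ref.type_ref.as_object_type
```

Callers no longer need to know where the types are registered; they just ask the ref.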

myronmarston · 2026-05-28T20:58:20Z

+
+// Splits a composite nested element key into a list of parts.
+List splitNestedElementKey(String nestedElementKey) {
+  return Arrays.asList(nestedElementKey.splitOnToken(":"));


In another comment I mentioned the danger of assuming : won't be in any keys. Since we can pass arbitrary JSON in our parameters, can we just pass this as a list?

Or, if you are encoding/decoding a list of strings for storage in a map key can you encode it as a JSON string (being careful to apply some "canonical" formatting)?

Per a previous comment above, I struggled to come up with a good solution here (but I bet we can find one). Implemented a naive length:value encoding for now that technically works.

Tried using Lists as map keys (which works in-memory in Painless), but it breaks when the document is serialized to JSON for storage and reloaded — JSON object keys must be strings.

Can spend more time on this aspect later.
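
For reference, a length-prefixed encoding along the lines described above could look like the sketch below. It is written in Ruby for readability rather than in Painless, and the method names `encode_key_parts` and `decode_key_parts` are made up for this example; the PR's actual script may differ in its details.

```ruby
# Encodes each part as "<length>:<value>" and concatenates the results.
# Because the decoder reads exactly <length> characters, values may contain ":".
def encode_key_parts(parts)
  parts.map { |part| "#{part.length}:#{part}" }.join
end

# Reverses encode_key_parts; raises ArgumentError on malformed input.
def decode_key_parts(encoded)
  parts = []
  index = 0

  while index < encoded.length
    separator = encoded.index(":", index)
    raise ArgumentError, "malformed key: #{encoded.inspect}" if separator.nil?

    length = Integer(encoded[index...separator], 10)
    value_start = separator + 1
    value = encoded[value_start, length]
    if value.nil? || value.length != length
      raise ArgumentError, "malformed key: #{encoded.inspect}"
    end

    parts << value
    index = value_start + length
  end

  parts
end

# A plain ":" join cannot tell these two keys apart; the length prefix can.
naive_collision = ["a:b", "c"].join(":") == ["a", "b:c"].join(":")
encoded_differ = encode_key_parts(["a:b", "c"]) != encode_key_parts(["a", "b:c"])
```

The result is an ordinary string, so it survives being stored as a JSON object key.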

myronmarston · 2026-05-28T21:26:33Z

+      parts.add(segment.get("object"));
+    }
+  }
+  return String.join(":", parts);


In another comment I mentioned the danger of assuming : won't be in any keys. We should figure out a way to encode nested keys that avoids making assumptions about what could be in the key values...

Yes, agreed. Mentioned some thoughts and current workaround in other comments.

myronmarston · 2026-05-28T21:28:03Z

+        "because this element was previously sourced from a different event (" + previousSourceIds + "). " +
+        "Each nested element can only be sourced from one source document."
+      );
+    }


/nit can we unify these two exception messages into one message that makes sense for both cases?

myronmarston · 2026-05-28T21:29:45Z

+}
+
+// Applies nested sourced data from the __nested_sourced_data buffer to matched nested elements.
+// Reads path config from the document itself — no external params needed.


That's interesting--does that mean we are storing some static path config on each document? It seems inefficient to store the same path config on billions of docs rather than pass it in as params...

As discussed on Zoom, was doing this because it was awkward to thread this static config through to the script the "normal" way.

I decided to instead store this on the IndexDefinition (the per-index runtime metadata) and pass it to the Painless script as a param (we are already doing this kind of thing for __counts). I think this makes sense because the path config is static configuration that describes the structure of nested relationships within an index: it's the same for every document in that index, and needs to be known by any update event targeting that index. The script reads it from params.nestedSourcedPaths at execution time. The __nested_sourced_data structure on the document now only stores the actual sourced field values (the data that varies per nested element), not the path navigation config.
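
A minimal sketch of what passing the config as a script param could look like when building an update request body. Only the `nestedSourcedPaths` param name comes from the description above; the `build_update_script` helper, the `data` param, the script id, and the shape of the path config are all assumptions made up for illustration.

```ruby
require "json"

# Hypothetical static, per-index config; the real shape may differ.
NESTED_SOURCED_PATHS = {"players" => ["seasons"]}.freeze

# Builds a scripted-update body that references a stored script by id
# and hands the static config to it through params.
def build_update_script(script_id:, event_data:, nested_sourced_paths:)
  {
    "script" => {
      "id" => script_id,
      "params" => {
        "data" => event_data, # varies per event
        "nestedSourcedPaths" => nested_sourced_paths # same for every doc in the index
      }
    }
  }
end

body = build_update_script(
  script_id: "update_index_data", # hypothetical script id
  event_data: {"name" => "Example"},
  nested_sourced_paths: NESTED_SOURCED_PATHS
)
json_body = JSON.generate(body)
```

The config travels with each request instead of being written into every document, which is the trade-off discussed above.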

ellisandrews-toast added 2 commits May 28, 2026 10:17

Implement nested sourced_from support

8ca4bd1

Store nested sourced path config in document instead of threading thr…

5c503ba

…ough self-update target

myronmarston reviewed May 28, 2026

Address PR feedback: refactor nested sourced_from implementation

df5f1c4

ellisandrews-toast force-pushed the nested-sourced-from branch from 0d96e36 to df5f1c4 Compare May 29, 2026 23:25
