Skip to content

feat(datastore): Decorate and log errors thrown during DataStore entrypoint initialization#27327

Open
markfields wants to merge 7 commits into
microsoft:mainfrom
markfields:entrypoint-load-failures
Open

feat(datastore): Decorate and log errors thrown during DataStore entrypoint initialization#27327
markfields wants to merge 7 commits into
microsoft:mainfrom
markfields:entrypoint-load-failures

Conversation

@markfields
Copy link
Copy Markdown
Member

@markfields markfields commented May 16, 2026

Description

Component load failures are one of the top scenarios our partners need to be able to monitor and diagnose in production. This PR closes a long-standing gap where errors thrown during entryPoint initialization (e.g. for PureDataObject, that includes the initalization hooks) are not logged to FF tables.

While host applications can and do log these errors, there is context that's only readily available within FF that is really important for investigation - All the common ContainerRuntime props (notably clientType - is it the Summarizer?), as well as the DataStore's packagePath (indicates the component type) and id.

This PR logs an error event with all these props, and additionally annotates the error object* with the dataStore package/id info.

* Note: In testing I found that these telemetry props get stripped off on the way up by roundtripping through exceptionToResponse and responseToException. A bit curious to look into that flow and if we could cut out those tranforms.

Copilot AI review requested due to automatic review settings May 16, 2026 21:11
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 16, 2026

Hi! Thank you for opening this PR. Want me to review it?

Based on the diff (251 lines, 7 files), I've queued these reviewers:

  • Correctness — logic errors, race conditions, lifecycle issues
  • Security — vulnerabilities, secret exposure, injection
  • API Compatibility — breaking changes, release tags, type design
  • Performance — algorithmic regressions, memory leaks
  • Testing — coverage gaps, hollow tests

How this works

  • Adjust the reviewer set by ticking/unticking boxes above. Reviewer toggles alone don't trigger anything.

  • Tick Start review below to dispatch the review fleet.

  • After review finishes, tick Start review again to request another run — it auto-resets after each dispatch.

  • This comment updates as new commits land; your reviewer selections are preserved.

  • Start review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds telemetry for DataStore entry point initialization failures so component load issues include Fluid-specific context like DataStore identity and package path.

Changes:

  • Adds a shared helper for code-artifact-tagged DataStore telemetry properties.
  • Logs and annotates errors thrown while resolving FluidDataStoreRuntime.entryPoint.
  • Reuses the shared telemetry helper for DataStore realization errors and adds tests for entry point failure logging.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
packages/utils/telemetry-utils/src/error.ts Aligns DataProcessingError wrapping factory usage.
packages/runtime/runtime-utils/src/index.ts Exports the new DataStore telemetry helper.
packages/runtime/runtime-utils/src/dataStoreHelpers.ts Adds shared DataStore identity telemetry property helper.
packages/runtime/datastore/src/test/dataStoreRuntime.spec.ts Adds coverage for lazy entry point initialization and failure telemetry.
packages/runtime/datastore/src/dataStoreRuntime.ts Wraps, annotates, logs, and rethrows entry point initialization failures.
packages/runtime/container-runtime/src/dataStoreContext.ts Reuses the shared DataStore telemetry helper for realization errors.

Comment thread packages/runtime/container-runtime/src/dataStoreContext.ts Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 16, 2026

Fleet Review — Clean

No issues found across the reviewer fleet for this run.

View run

error,
"entryPointInitialization",
);
errorWrapped.addTelemetryProperties(
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: In testing I found that these telemetry props get stripped off on the way up by roundtripping through exceptionToResponse and responseToException. A bit curious to look into that flow and if we could cut out those tranforms.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to do with privacy concerns? I know there can sometimes be policies to strip all error information away in case it contains user data or identifiable information.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it has to do with legacy architectural direction (URI-based routing within and across containers)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These properties are correctly tagged as "code artifacts", so hosts can decide if they're safe to log under different circumstances

Copy link
Copy Markdown
Contributor

@ChumpChief ChumpChief left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love wrapping at the exit to custom code! Few organizational/API suggestions.

Comment thread packages/runtime/container-runtime/src/dataStoreContext.ts Outdated
Comment thread packages/runtime/runtime-utils/src/dataStoreHelpers.ts Outdated
Comment thread packages/runtime/runtime-utils/src/dataStoreHelpers.ts Outdated
new LazyPromise(async () => provideEntryPoint(this)),
new LazyPromise(async () =>
provideEntryPoint(this).catch((error) => {
const errorWrapped = DataProcessingError.wrapIfUnrecognized(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

new LazyPromise(async () => provideEntryPoint(this)),
new LazyPromise(async () =>
provideEntryPoint(this).catch((error) => {
const errorWrapped = DataProcessingError.wrapIfUnrecognized(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if DataProcessingError is appropriate here. This runs application entry point logic and has nothing to do with Fluid right? So, maybe this is a UsageError or just a generic error?

Copy link
Copy Markdown
Contributor

@agarwal-navin agarwal-navin May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about this - we should separate out the path that loads the data store / DDS data from the path that loads the application logic. And we should be able to tell these apart from telemetry. I would suggest to not use data process error for the latter because that should not cause any problems with the data (it does fail summaries today but that is an unexpected scenario and will be fixed soon)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have different impression of what DPE means. I think it's any error that happens on a critical load or process path, regardless of the source. Asking Copilot to audit usage, it buckets about 70% as FF-root-cause and 30% as app callback or hook code.

Practically speaking it doesn't really matter, because we'd expect an application to want to monitor these closely. Root cause is a second step.

PErhaps we can subdivide the errorType like you're thinking, but I think that's a bigger effort.

As for the feasibility of doing so here (capturing DDS load v. app code), I don't think the entryPoint abstraction affords us that. That really depends on the implementation. Even PureDataObject doesn't know. It happens that we typically do DDS (at least root) loading in a shared base class like DataObject or MultiFormatDataObject (moved to another repo), but even that is not a hard requirement.

Let's keep talking and we can refine this, I'm probably glossing over something you're seeing, but I'm confident that this is a reasonable use of DPE for now.

Comment thread packages/test/test-end-to-end-tests/src/test/summarization/summaries.spec.ts Outdated
Comment thread packages/test/test-end-to-end-tests/src/test/summarization/summaries.spec.ts Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants