Document azd tracing taxonomy and emit ext.install span attributes on failure #8041
Agent-Logs-Url: https://github.com/Azure/azure-dev/sessions/5600dc3d-051c-4bd0-873b-ee9711b3d919 Co-authored-by: JeffreyCA <9157833+JeffreyCA@users.noreply.github.com>
Pull request overview
Updates the azd tracing reference documentation to include the existing extension and hook lifecycle span taxonomy, along with guidance on which attributes and error conventions to use so contributors instrument telemetry consistently.
Changes:
- Adds a new “Existing Event Taxonomy” section documenting `ext.*` and `hooks.exec` events with expected attributes and sample telemetry rows.
- Documents extension and hook attribute keys already defined in `internal/tracing/fields`.
- Adds an error attribute/status conventions section that points contributors to `internal/cmd.MapError` for consistent span error mapping.
- ext.install: set extension.id at span start and extension.version once resolved, so failure spans always emit the attributes that the doc promises (previously only set on success, after config save).
- Doc (ext.install row): describe the actual lifecycle of the extension.id/extension.version attributes and the OpenTelemetry error status used on failure (no MapError is currently called from this span).
- Doc (Error Attribute Conventions): drop the misleading error.inner / error.frame rows. error.inner is not emitted anywhere; for ARM deployment failures, error.code/error.frame are JSON keys inside the array stored on error.service.errorCode, not standalone span attributes. Document the JSON shape with an example.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
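To make the JSON shape mentioned in the last bullet concrete, here is a minimal Go sketch. It assumes the array is serialized to a JSON string before being stored on error.service.errorCode; the type name `serviceErrorEntry`, the helper `encodeServiceErrors`, and the error codes are hypothetical placeholders, not identifiers or values from azd.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceErrorEntry is a hypothetical type mirroring one element of the
// array stored on error.service.errorCode. Only the two JSON keys come
// from the PR text; the Go names are assumptions for illustration.
type serviceErrorEntry struct {
	Code  string `json:"error.code"`
	Frame int    `json:"error.frame"`
}

// encodeServiceErrors serializes the entries into one JSON array string,
// which is assumed here to be the value placed on the span attribute.
func encodeServiceErrors(entries []serviceErrorEntry) (string, error) {
	b, err := json.Marshal(entries)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Placeholder codes, not taken from real telemetry.
	value, err := encodeServiceErrors([]serviceErrorEntry{
		{Code: "DeploymentFailed", Frame: 0},
		{Code: "ResourceDeploymentFailure", Frame: 1},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(value)
}
```

The point of the sketch is that error.code and error.frame appear only as keys inside each array element, so a query against the span would have to parse the error.service.errorCode value rather than look for standalone attributes.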
jongio left a comment
The manager.go change is solid - setting extension.id immediately and extension.version after resolution ensures failure spans carry both attributes. Clean fix.
The doc has an accuracy problem: the "Existing Event Taxonomy" heading says these events "already exist in events.go" but three of the five don't exist anywhere in the codebase:
- `ext.upgrade` - no constant in events.go, no `tracing.Start` call in cmd/extension.go
- `ext.promote` - same; doesn't exist
- `hooks.exec` - hooks use `tracing.SetUsageAttributes` on the command span (cmd/hooks.go:170), there's no separate `hooks.exec` span
Several documented attributes also don't exist in fields.go: hooks.kind, extension.upgrade.duration_ms, extension.upgrade.outcome, extension.source, extension.source.from, extension.source.to.
Either create these events/attributes first (separate PR), or rewrite the section to document only what actually exists today (ext.run and ext.install). Documenting aspirational telemetry as if it already exists will confuse contributors who try to use it.
/azp run azure-dev - cli
Azure Pipelines successfully started running 1 pipeline(s).
`ExtensionRunEvent` and related tracing events existed in code but were missing from the tracing reference. This made it unclear which existing spans and attributes contributors should use for extension runs, extension management, hook execution, and errors.

Event taxonomy
- `ext.run`, `ext.install`, `ext.upgrade`, `ext.promote`, and `hooks.exec`.

Attribute references
- `extension.id`, `extension.version`, `extension.source.*`, and `extension.upgrade.*`.
- `hooks.name`, `hooks.type`, and `hooks.kind`.

Error conventions
- `error.*` attributes and status descriptions emitted through `MapError`.
- `error.service.*` fields.
- ARM deployment failure entries (`{"error.code": ..., "error.frame": N}`) emitted as an array on `error.service.errorCode`, rather than treating `error.code` / `error.frame` as standalone span attributes.

Code fix to keep `ext.install` honest
- `pkg/extensions/manager.go`: `extension.id` is now set on the install span as soon as the install begins, and `extension.version` is set as soon as the version is resolved. Previously both were only set after a successful config save, so failure spans did not carry the attributes the doc promises.

Example documented row: