Promote high-volume bare errors to azderrors based on Kusto data #8079

@JeffreyCA

Description

Phase 5 of #8011. Blocked on #8015 having shipped and 1–2 releases of Kusto data being available.

Summary

After error.chain.types (#8015) has been in production for a release or two, pull Kusto data on:

  1. The top type chains appearing inside the residual internal.<...> and internal.unclassified ResultCode buckets.
  2. The top originating packages, once error.origin.func (#8077, "Add an internal/azderrors package to capture error origin frames") is also live.

Use that data to identify the highest-volume bare errors.New / fmt.Errorf call sites. Hand-migrate the top N to azderrors.New(cls, ...) with proper classification codes, then iterate.

This is intentionally a data-driven activity, not a speculative sweep. The original #8018 draft listed candidate sites (az CLI "not authenticated", PowerShell "not found", confirmDestroy decline, etc.); those remain plausible starting points but should be validated against current telemetry before committing migration effort.

Process per migration round

  1. Run a Kusto query that summarizes count() by error.chain.types, error.origin.func over the last release window.
  2. Pick the top ~10 sites by volume within the residual buckets.
  3. Open one PR per package (or per cohesive group) introducing azderrors.New(cls, ...) with a stable cls.Code.
  4. Update any sentinel-based errors.Is callers to use the new classification path if needed.
  5. Wait for the next release cycle's data; repeat.
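Steps 1 and 2 above amount to a group-by followed by a top-N cut. The sketch below shows that ranking in Go over rows already exported from a query. It is not the Kusto query itself, and the Row shape, the function names, and the sample values in main are made up for illustration; the actual format of error.chain.types and error.origin.func values is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical shape for one exported telemetry row.
type Row struct {
	ChainTypes string // value of error.chain.types
	OriginFunc string // value of error.origin.func
	Count      int
}

// Site identifies one candidate call site by its grouping key.
type Site struct {
	ChainTypes string
	OriginFunc string
}

// Ranked is a site with its total count and its share of all rows.
type Ranked struct {
	Site  Site
	Count int
	Share float64
}

// TopSites sums counts per (chain types, origin func) pair and returns the
// n largest groups, ordered by descending count with a stable tie-break.
func TopSites(rows []Row, n int) []Ranked {
	totals := map[Site]int{}
	sum := 0
	for _, r := range rows {
		totals[Site{ChainTypes: r.ChainTypes, OriginFunc: r.OriginFunc}] += r.Count
		sum += r.Count
	}
	ranked := make([]Ranked, 0, len(totals))
	for s, c := range totals {
		share := 0.0
		if sum > 0 {
			share = float64(c) / float64(sum)
		}
		ranked = append(ranked, Ranked{Site: s, Count: c, Share: share})
	}
	sort.Slice(ranked, func(i, j int) bool {
		if ranked[i].Count != ranked[j].Count {
			return ranked[i].Count > ranked[j].Count
		}
		if ranked[i].Site.OriginFunc != ranked[j].Site.OriginFunc {
			return ranked[i].Site.OriginFunc < ranked[j].Site.OriginFunc
		}
		return ranked[i].Site.ChainTypes < ranked[j].Site.ChainTypes
	})
	if n >= 0 && n < len(ranked) {
		ranked = ranked[:n]
	}
	return ranked
}

func main() {
	// Illustrative rows only; these are not real telemetry values.
	rows := []Row{
		{ChainTypes: "*errors.errorString", OriginFunc: "example/pkg.FuncA", Count: 60},
		{ChainTypes: "*fmt.wrapError,*errors.errorString", OriginFunc: "example/pkg.FuncB", Count: 30},
		{ChainTypes: "*errors.errorString", OriginFunc: "example/pkg.FuncA", Count: 10},
	}
	for _, r := range TopSites(rows, 10) {
		fmt.Printf("%s [%s]: %d (%.0f%%)\n", r.Site.OriginFunc, r.Site.ChainTypes, r.Count, r.Share*100)
	}
}
```

Reporting the share next to the raw count makes it easier to judge when a further migration round has stopped paying off.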

Out of scope

  • Any migration before #8015 ("Add error.chain.types span attribute") has shipped and produced Kusto data; it would be speculative.
  • Wholesale rewrite of all 4,800+ fmt.Errorf callsites — explicitly not the goal.
  • A static analyzer that flags non-%w fmt.Errorf — could be added later if data shows lossy-wrap dominates the catch-all.

References

  • The original #8018 draft ("Use matched YAML rule IDs as the error ResultCode") listed candidate bare-error sites; those remain plausible but should be data-validated before migration.
  • The candidates speculated in that draft: pkg/tools/az/az.go "not authenticated", pkg/tools/powershell/powershell.go "not found", and the confirmDestroy decline in pkg/infra/provisioning/bicep/bicep_provider.go.

Metadata

Assignees

No one assigned

    Labels

    area/error-handling (Error suggestions, error framework), area/telemetry (Telemetry, tracing, observability), bug (Something isn't working)
