Skip to content

Move telemetry start to earliest point in DeploymentManager public APIs#6319

Draft
lauren-ciha wants to merge 1 commit intomainfrom
user/laurenciha/deployment-manager-investigation
Draft

Move telemetry start to earliest point in DeploymentManager public APIs#6319
lauren-ciha wants to merge 1 commit intomainfrom
user/laurenciha/deployment-manager-investigation

Conversation

@lauren-ciha
Copy link
Member

Fix telemetry gaps where errors could occur before logging began:

Problem:

  • GetStatus() had zero telemetry coverage - FAIL_FAST terminated without logging
  • Initialize()/Repair() called GetCurrentFrameworkPackageFullName() before telemetry started, missing framework discovery errors

Solution:
Start telemetry as the first action in each public API method, ensuring all error paths are captured by the WIL callback mechanism.

Changes:

DeploymentTraceLogging.h:

  • Add isGetStatus parameter to StartActivity() with default value false
  • Add TraceLoggingValue for isGetStatusAPI field in telemetry output

DeploymentManager.cpp:

  • Add GetStatus_StopSuccessActivity() helper for GetStatus telemetry
  • GetStatus(): Start telemetry first, log errors before FAIL_FAST, wrap operations in try/catch for exception capture
  • Initialize(options): Start telemetry before calling GetCurrentFrameworkPackageFullName(), catch and log discovery errors
  • Repair(): Same early-telemetry pattern as Initialize
  • Initialize(packageFullName, options, isRepair): Add IsRunning() check to avoid double-starting telemetry when called from public APIs

Result:
All error paths in DeploymentManager public APIs now have telemetry coverage. Framework discovery errors and unpackaged process failures are captured before termination.

A microsoft employee must use /azp run to validate using the pipelines below.

WARNING:
Comments made by azure-pipelines bot maybe inaccurate.
Please see pipeline link to verify that the build is being ran.

For status checks on the main branch, please use TransportPackage-Foundation-PR
(https://microsoft.visualstudio.com/ProjectReunion/_build?definitionId=81063&_a=summary)
and run the build against your PR branch with the default parameters.

Fix telemetry gaps where errors could occur before logging began:

Problem:
- GetStatus() had zero telemetry coverage - FAIL_FAST terminated without logging
- Initialize()/Repair() called GetCurrentFrameworkPackageFullName() before
  telemetry started, missing framework discovery errors

Solution:
Start telemetry as the first action in each public API method, ensuring
all error paths are captured by the WIL callback mechanism.

Changes:

DeploymentTraceLogging.h:
- Add isGetStatus parameter to StartActivity() with default value false
- Add TraceLoggingValue for isGetStatusAPI field in telemetry output

DeploymentManager.cpp:
- Add GetStatus_StopSuccessActivity() helper for GetStatus telemetry
- GetStatus() [PUBLIC]: Start telemetry first, log errors before FAIL_FAST,
  wrap operations in try/catch for exception capture
- Initialize(options) [PUBLIC]: Start telemetry before calling
  GetCurrentFrameworkPackageFullName(), catch and log discovery errors
- Repair() [PUBLIC]: Same early-telemetry pattern as Initialize
- Initialize(packageFullName, options, isRepair) [PRIVATE]: Add IsRunning()
  check to avoid double-starting telemetry when called from public APIs

Result:
All error paths in DeploymentManager public APIs now have telemetry coverage.
Framework discovery errors and unpackaged process failures are captured
before termination.
Comment on lines +23 to 24
void StartActivity(bool forceDeployment, bool isElevated, bool isPackagedProcess, bool isFullTrustPackage, DWORD integrityLevel, bool isRepair, bool isGetStatus = false)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of passing these two bools, we could just use an enum here.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR moves DeploymentManager public API telemetry startup to the earliest possible point to close coverage gaps (notably before potential fail-fast or framework discovery failures), and adds an isGetStatusAPI flag to distinguish GetStatus() calls in the activity start event.

Changes:

  • Extend deployment activity start telemetry to include an isGetStatusAPI field.
  • Start deployment telemetry at the beginning of GetStatus(), Initialize(options), and Repair(), and add explicit stop paths for some failure cases.
  • Add a GetStatus_StopSuccessActivity() helper and guard internal Initialize(...) from double-starting the activity.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

File Description
dev/Deployment/DeploymentTraceLogging.h Adds an isGetStatus parameter to activity start and logs it as isGetStatusAPI.
dev/Deployment/DeploymentManager.cpp Starts telemetry earlier in public APIs, adds stop helpers/paths, and avoids double-starting the activity from internal Initialize.

You can also share your feedback on Copilot code review. Take the survey.

static_cast<PCWSTR>(nullptr),
static_cast<PCSTR>(nullptr),
static_cast<UINT32>(winrt::Microsoft::Windows::ApplicationModel::WindowsAppRuntime::DeploymentStatus::PackageInstallRequired),
static_cast<UINT32>(::WindowsAppRuntime::Deployment::Activity::DeploymentStage::DiscoverFramework),
Comment on lines +53 to +66
activityContext.GetActivity().StopWithResult(
S_OK,
static_cast<UINT32>(0),
static_cast<PCSTR>(nullptr),
static_cast<unsigned int>(0),
static_cast<PCWSTR>(nullptr),
static_cast<PCSTR>(nullptr),
static_cast<UINT32>(deploymentStatus),
static_cast<UINT32>(::WindowsAppRuntime::Deployment::Activity::DeploymentStage::None),
static_cast<PCWSTR>(nullptr),
S_OK,
static_cast<PCWSTR>(nullptr),
GUID{},
false /*useExistingPackageIfHigherVersion*/);
Comment on lines +79 to +90
// Start activity immediately with safe defaults - the WIL callback will capture any errors.
activityContext.GetActivity().Start(
false /*forceDeployment*/,
Security::IntegrityLevel::IsElevated(),
true /*isPackagedProcess - assume true, we check below*/,
false /*isFullTrustPackage*/,
0 /*integrityLevel*/,
false /*isRepair*/,
true /*isGetStatus*/);

// Now check if packaged - errors are captured by telemetry
const bool isPackagedProcess{ AppModel::Identity::IsPackagedProcess() };
Comment on lines +148 to +170
// Start telemetry FIRST so we capture all failures, including framework discovery errors.
auto& initializeActivityContext{ ::WindowsAppRuntime::Deployment::Activity::Context::Get() };

// Start activity immediately with safe defaults - the WIL callback will capture any errors
// that occur during GetCurrentFrameworkPackageFullName() or subsequent operations.
initializeActivityContext.GetActivity().Start(
deploymentInitializeOptions.ForceDeployment(),
false /*isElevated - actual value determined later*/,
true /*isPackagedProcess - assume true as safe default*/,
false /*isFullTrustPackage - determined later*/,
0 /*integrityLevel - determined later*/,
false /*isRepair*/,
false /*isGetStatus*/);

try
{
return Initialize(GetCurrentFrameworkPackageFullName(), deploymentInitializeOptions);
}
catch (...)
{
// Capture exception in telemetry before re-throwing
const HRESULT hr = wil::ResultFromCaughtException();
initializeActivityContext.GetActivity().StopWithResult(
Comment on lines +162 to +165
try
{
return Initialize(GetCurrentFrameworkPackageFullName(), deploymentInitializeOptions);
}
Comment on lines +197 to +204
initializeActivityContext.GetActivity().Start(
options.ForceDeployment(),
false /*isElevated - actual value determined later*/,
true /*isPackagedProcess - assume true as safe default*/,
false /*isFullTrustPackage - determined later*/,
0 /*integrityLevel - determined later*/,
true /*isRepair*/,
false /*isGetStatus*/);
Comment on lines +206 to +209
try
{
return Initialize(GetCurrentFrameworkPackageFullName(), options, true);
}
Comment on lines +177 to +179
static_cast<UINT32>(winrt::Microsoft::Windows::ApplicationModel::WindowsAppRuntime::DeploymentStatus::PackageInstallRequired),
static_cast<UINT32>(::WindowsAppRuntime::Deployment::Activity::DeploymentStage::DiscoverFramework),
L"GetCurrentFrameworkPackageFullName",
Comment on lines +374 to +379
// Only start telemetry if not already running (public APIs start it before calling us).
if (!initializeActivityContext.GetActivity().IsRunning())
{
initializeActivityContext.GetActivity().Start(deploymentInitializeOptions.ForceDeployment(), Security::IntegrityLevel::IsElevated(),
isPackagedProcess, initializeActivityContext.GetIsFullTrustPackage(), integrityLevel, isRepair);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? Can't we just know for sure that telemetry is already started?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RAII should guarantee initilaization

Comment on lines +162 to +164
try
{
return Initialize(GetCurrentFrameworkPackageFullName(), deploymentInitializeOptions);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this new try catch, do we still need the one inside the Initialize method?

auto& activityContext{ ::WindowsAppRuntime::Deployment::Activity::Context::Get() };

// Start activity immediately with safe defaults - the WIL callback will capture any errors.
activityContext.GetActivity().Start(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did the original code even work? The Start method constructs and returns the static singleton, which should then be captured via SetActivity for subsequent tracing via GetActivity. See the InstallActivityContext for comparison. In any case, we're foregoing the benefits of using TraceLoggingProvider's nested RAII TraceLoggingActivity with this pattern, instead requiring explicit Stop calls. We could create an RAII wrapper to ensure Stop is called on scope exit, but much simpler to just use TraceLoggingActivity as designed:

{
// activity scope
auto& activity = WindowsAppRuntimeDeployment_TraceLogger::::Start(...);
try
{
// do stuff ...
}
catch
{
activity.StopWithResult(...);
}
// activity goes out of scope, conditionally calls Stop
}

The same goes for WindowsAppRuntimeInstaller_TraceLogger::Install, etc.

This PR is also far too intrusive and obfuscates the real functionality with all the tracing calls. GetStatus_StopSuccessActivity is a good concept to shrink that cognitive overhead. I'd ask the bot to redesign the code to minimize the intrusion, and to leverage RAII.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. This was a first pass attempt at resolving the issue. @agniuks took another try at the issue in this PR: #6323. I'll be reviewing it soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants