Merged
16 changes: 6 additions & 10 deletions .github/workflows/on-push-master.yml
@@ -65,20 +65,16 @@ jobs:
version_directory: v20111101
secrets: inherit

# Gate job to handle serial ordering when v20111101 is modified
# Depends on release-v20111101 to enforce serial ordering (waits for v20111101 to complete)
# Uses always() to continue even when release-v20111101 is skipped (when v20111101 wasn't modified)
# This ensures: v20111101 publishes first (serially) → then v20250224 publishes (serially)
gate-v20111101-complete:
delay-for-v20250224:
runs-on: ubuntu-latest
needs: [check-skip-publish, detect-changes, release-v20111101]
if: always() && needs.check-skip-publish.outputs.skip_publish == 'false'
needs: [check-skip-publish, detect-changes]
if: needs.check-skip-publish.outputs.skip_publish == 'false'
steps:
- name: Gate reached - v20111101 release complete (or skipped)
run: echo "Ready to proceed with v20250224 publication"
- name: Brief delay to stagger v20250224 publish
run: sleep 2

publish-v20250224:
needs: [check-skip-publish, detect-changes, gate-v20111101-complete]
needs: [check-skip-publish, detect-changes, delay-for-v20250224]
if: needs.check-skip-publish.outputs.skip_publish == 'false' && needs.detect-changes.outputs.v20250224 == 'true'
uses: ./.github/workflows/publish.yml
with:
48 changes: 25 additions & 23 deletions docs/Adding-a-New-API-Version.md
@@ -2,7 +2,7 @@

**Document Purpose**: Step-by-step guide for adding support for a new API version (e.g., `v20300101`) to the mx-platform-node repository.

**Last Updated**: January 28, 2026
**Last Updated**: January 29, 2026
**Time to Complete**: 30-45 minutes
**Prerequisites**: Familiarity with the multi-version architecture (see [Multi-Version-SDK-Flow.md](Multi-Version-SDK-Flow.md))

@@ -238,15 +238,15 @@ Add a new publish job for your version (copy and modify the existing v20250224 j

```yaml
publish-v20300101:
needs: [check-skip-publish, detect-changes, gate-v20250224-complete]
needs: [check-skip-publish, detect-changes, delay-for-v20300101]
if: needs.check-skip-publish.outputs.skip_publish == 'false' && needs.detect-changes.outputs.v20300101 == 'true'
uses: ./.github/workflows/publish.yml
with:
version_directory: v20300101
secrets: inherit
```

**Important**: The `needs` array must include the **previous version's gate job** to enforce serial ordering. This ensures v20250224 finishes before v20300101 starts publishing.
**Important**: The `needs` array must include the **delay job for this version** so that publishing is staggered. The delay job holds your version back briefly, giving previous versions a head start at the npm registry.

**Location 4: Add release job for new version**

@@ -262,38 +262,40 @@ release-v20300101:
secrets: inherit
```

**Location 5: Add gate job for previous version**
**Location 5: Add delay job for new version**

Add a new gate job after the previous version's release to handle serial ordering:
Add a new delay job before the publish job to create staggered publishing:

```yaml
gate-v20250224-complete:
delay-for-v20300101:
runs-on: ubuntu-latest
needs: [check-skip-publish, detect-changes, release-v20250224]
if: always() && needs.check-skip-publish.outputs.skip_publish == 'false'
needs: [check-skip-publish, detect-changes]
if: needs.check-skip-publish.outputs.skip_publish == 'false'
steps:
- name: Gate reached - v20250224 release complete (or skipped)
run: echo "Ready to proceed with v20300101 publication"
- name: Brief delay to stagger v20300101 publish
run: sleep 2
```

**Critical implementation details**:

1. **Each publish job** depends on the **previous version's gate job** (not the previous release directly)
- This prevents race conditions when multiple versions are modified
- Ensures strict serial ordering at the npm registry level
1. **Each delay job** is independent and depends only on safety checks
- Does NOT depend on the previous version
- Always runs unless the `[skip-publish]` flag is set
- Provides a 2-second window for previous versions to start publishing

2. **Each release job** depends on its corresponding publish job
- Ensures publication completes before creating release
2. **Each publish job** depends on its corresponding delay job
- This naturally staggers version publishes without complex dependencies
- When only one version is modified, its delay still runs (no blocking)
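- When multiple versions are modified, the first version starts immediately and each delayed version starts about 2 seconds later; because every delay job sleeps for the same 2 seconds, delayed versions are staggered relative to the first version, not relative to each other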

3. **Each gate job** uses `needs: [check-skip-publish, detect-changes, release-v<VERSION>]`
- Waits for the previous version's release to complete
- The `if: always()` condition ensures the gate continues running even when the release job is **skipped**
- This is crucial: when the previous version isn't modified, its release is skipped, but the gate still runs and unblocks the next version
3. **Each release job** depends on its corresponding publish job
- Ensures publication completes before creating release

4. **Each publish/release if condition** uses `needs.detect-changes.outputs.v<VERSION> == 'true'`
- This is more reliable than the older `contains()` pattern
- Uses the path-filter outputs to determine which versions changed
- Prevents false publishes when only docs change
4. **Simple, non-blocking design**:
- No `always()` conditions needed
- No dependencies on other versions' jobs
- Delay job always runs independently
- Reduces the chance of race conditions through simple timing rather than job dependencies
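
The delay job described above can be generated rather than copied by hand. The following is a minimal sketch, assuming two-space indentation under the top-level `jobs:` key; `render_delay_job` is a hypothetical helper for illustration and is not part of the repository.

```bash
# render_delay_job VERSION
# Hypothetical helper: prints a delay job for VERSION, indented for
# placement under the top-level `jobs:` key (two-space indent is an assumption).
render_delay_job() {
  local version=$1
  cat <<EOF
  delay-for-${version}:
    runs-on: ubuntu-latest
    needs: [check-skip-publish, detect-changes]
    if: needs.check-skip-publish.outputs.skip_publish == 'false'
    steps:
      - name: Brief delay to stagger ${version} publish
        run: sleep 2
EOF
}

# Example: print the delay job for the new version used in this guide.
render_delay_job v20300101
```

The printed block can be pasted into `on-push-master.yml` ahead of the matching `publish-v20300101` job; the job body mirrors the delay job shown in Location 5.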

### 2.5 Verify Workflow Syntax

31 changes: 18 additions & 13 deletions docs/Troubleshooting-Guide.md
@@ -2,7 +2,7 @@

**Document Purpose**: Quick reference for diagnosing and fixing issues in the multi-version SDK generation, publishing, and release workflows.

**Last Updated**: January 28, 2026
**Last Updated**: January 29, 2026
**Audience**: Developers debugging workflow failures

---
@@ -234,32 +234,37 @@ fatal: A release with this tag already exists

**Expected Behavior**: `publish-v20250224` should run when only v20250224 is modified

**Root Cause**: Previous versions of the workflow had a dependency chain that broke when intermediate jobs were skipped. This has been fixed with the gate job pattern.
**Root Cause**: Previous versions of the workflow had dependencies that broke when intermediate jobs were skipped. This has been fixed with the delay job pattern.

**Current Implementation** (uses gate job pattern):
- `gate-v20111101-complete` uses GitHub Actions `always()` condition
- This job runs even when v20111101 jobs are skipped
- It unblocks downstream v20250224 jobs
**Current Implementation** (uses delay job pattern):
- `delay-for-v20250224` runs independently of other versions
- This delay job always runs (depends only on safety checks, not other versions)
- It provides a 2-second window for previous versions to start publishing first
- v20250224 publish depends on this delay (not on v20111101's release)
- Result: Publishing works correctly whether one or both versions are modified

**If You're Still Seeing This Issue**:
1. Verify you have the latest `on-push-master.yml`:
```bash
grep -A 3 "gate-v20111101-complete" .github/workflows/on-push-master.yml
grep -A 5 "delay-for-v20250224" .github/workflows/on-push-master.yml
```
2. Confirm the gate job uses `always()` condition:
2. Confirm the delay job is independent:
```yaml
gate-v20111101-complete:
if: always() && needs.check-skip-publish.outputs.skip_publish == 'false'
delay-for-v20250224:
needs: [check-skip-publish, detect-changes]
if: needs.check-skip-publish.outputs.skip_publish == 'false'
steps:
- name: Brief delay to stagger v20250224 publish
run: sleep 2
```
3. Ensure `publish-v20250224` depends on the gate job:
3. Ensure `publish-v20250224` depends on the delay job:
```yaml
publish-v20250224:
needs: [check-skip-publish, gate-v20111101-complete]
needs: [check-skip-publish, detect-changes, delay-for-v20250224]
```
4. If not present, update workflow from latest template
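
The checks in steps 1 to 3 can be combined into one script. This is a rough sketch, not a repository tool: `check_delay_wiring` is a hypothetical helper, and it assumes the `needs:` and `if:` lines sit within three lines of the job name, as in the snippets above.

```bash
# check_delay_wiring FILE VERSION
# Hypothetical helper: verifies that VERSION uses the delay job pattern.
# Assumes `needs:` and `if:` appear within 3 lines after the job name.
check_delay_wiring() {
  local file=$1 version=$2

  # 1. The delay job must exist.
  if ! grep -q "^ *delay-for-${version}:" "$file"; then
    echo "missing job: delay-for-${version}"
    return 1
  fi

  # 2. The delay job must be independent: no always(), no release dependency.
  if grep -A 3 "^ *delay-for-${version}:" "$file" | grep -Eq "always\(\)|release-v"; then
    echo "delay-for-${version} still looks like a gate job"
    return 1
  fi

  # 3. The publish job must list the delay job in its needs array.
  if ! grep -A 3 "^ *publish-${version}:" "$file" | grep -q "needs:.*delay-for-${version}"; then
    echo "publish-${version} does not need delay-for-${version}"
    return 1
  fi

  echo "ok: ${version}"
}

# Usage (run from the repository root):
#   check_delay_wiring .github/workflows/on-push-master.yml v20250224
```

A non-zero exit status with a one-line message indicates which of the three conditions is not met.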

**Technical Details**: See [Workflow-and-Configuration-Reference.md](Workflow-and-Configuration-Reference.md#step-3-gate-job---unblock-v20250224-publishing) in the "Publishing via on-push-master.yml" section for full gate job implementation details.
**Technical Details**: See [Workflow-and-Configuration-Reference.md](Workflow-and-Configuration-Reference.md#step-3-delay-job---stagger-v20250224-publishing) in the "Publishing via on-push-master.yml" section for full delay job implementation details.

---

99 changes: 56 additions & 43 deletions docs/Workflow-and-Configuration-Reference.md
@@ -2,7 +2,7 @@

**Document Purpose**: Detailed technical reference for the multi-version SDK generation, publishing, and release workflows. Covers implementation details, configuration files, and system architecture.

**Last Updated**: January 28, 2026
**Last Updated**: January 29, 2026
**Audience**: Developers who need to understand or modify the implementation

---
@@ -145,16 +145,17 @@ strategy:
4. Path-based filtering ensures only modified versions are published, never in parallel

**Serialization Chain** (for race condition prevention):
- v20111101 publish runs first (depends on check-skip-publish)
- v20111101 publish runs immediately (depends on check-skip-publish)
- v20111101 release runs second (depends on publish) - waits for npm registry confirmation
- **gate-v20111101-complete** runs (uses `always()`, runs even if v20111101 jobs are skipped) ⭐ **Critical: Enables single-version publishing**
- v20250224 publish runs third (depends on gate job) ← **Serial ordering enforced**
- **delay-for-v20250224** sleeps for 2 seconds and does not depend on any v20111101 job (staggers the v20250224 publish to reduce the chance of a race) ⭐ **Critical: Enables single-version publishing**
- v20250224 publish runs once the delay completes (depends on delay) ← **Staggered start**
- v20250224 release runs fourth (depends on v20250224 publish) - waits for npm registry confirmation

**Why This Order Matters**:
- Each version publishes to npm sequentially, never in parallel
- npm registry expects sequential API calls; parallel publishes can cause conflicts
- Gate job ensures this ordering works correctly whether 1 or 2 versions are modified
- Delay job keeps v20250224 from starting immediately, giving v20111101 a 2-second head start
- This ordering works correctly whether 1 or 2 versions are modified
- Release jobs complete before the next version starts publishing
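
The staggering can be imitated locally with two background shell jobs. This is only an illustration of the timing, under the assumption that both jobs become eligible at the same moment; `simulate_start` is a hypothetical stand-in for a publish job and does not contact npm.

```bash
# simulate_start LABEL DELAY_SECONDS
# Hypothetical stand-in for a publish job: waits, then reports its start.
simulate_start() {
  sleep "$2"                 # models the delay job; 0 means no delay job
  echo "start publish-$1"
}

simulate_start v20111101 0 &   # no delay job: eligible right away
simulate_start v20250224 2 &   # held back by delay-for-v20250224
wait                           # block until both background jobs finish
```

The sketch models start times only: the delay postpones when v20250224 begins and does not wait for v20111101 to finish.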

---
@@ -289,42 +290,52 @@ publish:
- ❌ **Harder to understand**: New developers see one job with matrix logic; harder to reason about sequence
- ❌ **Less flexible**: Adding safety checks per version becomes complicated with matrix expansion

#### Why Serial Conditionals (Our Choice)
#### Why Serial Conditionals with Delay (Our Choice)

**Serial Approach** (Explicit, safe, maintainable):
```yaml
publish-v20111101:
needs: [check-skip-publish, detect-changes]
if: needs.check-skip-publish.outputs.skip_publish == 'false' && needs.detect-changes.outputs.v20111101 == 'true'

delay-for-v20250224:
needs: [check-skip-publish, detect-changes]
if: needs.check-skip-publish.outputs.skip_publish == 'false'
steps:
- run: sleep 2

publish-v20250224:
needs: [check-skip-publish, detect-changes, gate-v20111101-complete] # Must wait for gate
needs: [check-skip-publish, detect-changes, delay-for-v20250224] # Wait for delay
if: needs.check-skip-publish.outputs.skip_publish == 'false' && needs.detect-changes.outputs.v20250224 == 'true'
```

**Advantages**:
- ✅ **Safe**: v20250224 cannot start publishing until v20111101 finishes
- Gate job ensures serial ordering at job level, not just workflow level
- ✅ **Safe**: v20250224 cannot start publishing until delay completes
- 2-second delay gives v20111101 a head start at the npm registry
- npm registry sees sequential requests, no conflicts
- Clear happens-before relationship in GitHub Actions UI
- Works whether 1 or 2 versions are modified
- ✅ **Simple**: No complex gate job logic with `always()` conditions
- Just a straightforward 2-second delay between version publishes
- Delay job always runs (depends only on safety checks)
- Less mental overhead for future developers
- ✅ **Visible**: Each version has individual jobs that are easy to identify
- GitHub Actions shows separate rows for each version
- Failures are obvious: "publish-v20250224 failed" vs "publish[v20250224] in matrix"
- Failures are obvious
- Each job can have version-specific comments and documentation
- ✅ **Debuggable**: Clear dependencies make it obvious what blocks what
- When only v20250224 is modified, you see: `publish-v20111101 (skipped)` → `gate (runs)` → `publish-v20250224 (runs)`
- Matrix approach would be harder to understand why certain jobs run/skip
- ✅ **Maintainable**: Adding a new version requires adding 3 explicit jobs (publish, release, gate)
- More code, but each job is self-documenting
- No complex matrix expansion logic to understand
- Future developers can see the pattern easily: "oh, each version gets 3 jobs"
- v20250224 waits for delay, which doesn't depend on v20111101
- When only v20250224 is modified, you see: `delay (runs)` → `publish-v20250224 (runs)`
- When both are modified, you see: `publish-v20111101` and `delay` start in parallel, then `publish-v20250224` runs once `delay` finishes
- ✅ **Maintainable**: Minimal code addition (one simple delay job)
- More explicit than matrix approach
- Future developers immediately understand: "oh, there's a delay between version publishes"
- ✅ **Future-proof**: When you lock master, this structure stays the same
- Matrix would need version list hardcoded; serial jobs just live alongside each other
- Simple delay job that can be extended if needed

**Tradeoff we accepted**:
- We have more code (repetition): `publish-v20111101`, `publish-v20250224`, etc.
- BUT: The repetition is worth it for safety, clarity, and debuggability
- This is a conscious choice: **explicitness over DRY** for critical infrastructure
- Slight overhead: 2-second delay added to every publish flow (negligible)
- BUT: Worth it for simplicity, clarity, and the ability to publish single versions without gates
- This is a conscious choice: **simplicity over clever infrastructure** for critical workflows



@@ -340,7 +351,7 @@ Include `[skip-publish]` in commit message to prevent publish/release for this p

**Workflow**: `.github/workflows/on-push-master.yml`

**Architectural Approach**: Serial job chaining with gate job pattern ensures single-version and multi-version publishing both work correctly while preventing npm race conditions.
**Architectural Approach**: Serial job chaining with a delay job ensures single-version and multi-version publishing both work correctly while staggering publishes to avoid npm race conditions.

#### Step 1: Check Skip-Publish Flag

@@ -377,47 +388,49 @@ Include `[skip-publish]` in commit message to prevent publish/release for this p
1. Publish job calls `publish.yml` with `version_directory: v20111101`
2. Release job calls `release.yml` after publish completes

#### Step 3: Gate Job - Unblock v20250224 Publishing
#### Step 3: Delay Job - Stagger v20250224 Publishing

**Job**: `gate-v20111101-complete`
**Job**: `delay-for-v20250224`

```yaml
gate-v20111101-complete:
delay-for-v20250224:
runs-on: ubuntu-latest
needs: [check-skip-publish, detect-changes, release-v20111101]
if: always() && needs.check-skip-publish.outputs.skip_publish == 'false'
needs: [check-skip-publish, detect-changes]
if: needs.check-skip-publish.outputs.skip_publish == 'false'
steps:
- name: Gate complete - ready for v20250224
run: echo "v20111101 release workflow complete (or skipped)"
- name: Brief delay to stagger v20250224 publish
run: sleep 2
```

**Key Feature**: Uses `always()` condition - runs even when `release-v20111101` is skipped
**Key Feature**: Simple 2-second delay between version publishes

**Why This Pattern Exists**:

The gate job solves a critical dependency problem in serial publishing:
The delay job solves the dependency problem while keeping things simple:

1. **The Problem**:
- If v20250224 publish job depends on `release-v20111101`, it fails when v20111101 is skipped (not modified)
- If v20250224 publish depends directly on `publish-v20111101`, it is skipped whenever v20111101 is skipped (not modified), because GitHub Actions skips jobs whose `needs` were skipped
- When only v20250224 is modified, we want it to publish, but it's blocked by skipped v20111101 job
- This would cause the workflow to hang/fail when only one version is modified
- A gate job with `always()` is complex and hard to understand

2. **The Solution**:
- Gate job uses `always()` so it runs whether v20111101 succeeds, fails, or is skipped
- v20250224 jobs depend on the gate job (which always runs), not on v20111101 (which might be skipped)
- This unblocks v20250224 while maintaining serial ordering when both versions are modified
- Simple delay job that doesn't depend on v20111101
- v20250224 publish depends on the delay (which always runs)
- 2-second delay gives v20111101 time to start publishing before v20250224 does
- This staggering prevents npm registry race conditions without complex job logic

3. **The Behavior**:
- **Both versions modified**: publish v20111101 → release v20111101 → gate (runs)publish v20250224 → release v20250224
- **Only v20250224 modified**: (v20111101 jobs skipped)gate (always runs, unblocks) → publish v20250224 → release v20250224
- **Only v20111101 modified**: publish v20111101 → release v20111101 → gate (always runs) → publish v20250224 (skipped) → release v20250224 (skipped)
- **Both versions modified**: publish v20111101 starts immediately, delay job starts immediatelyafter 2s, publish v20250224 runs
- **Only v20250224 modified**: delay job runsafter 2s, publish v20250224 runs (v20111101 jobs skipped, don't block)
- **Only v20111101 modified**: publish v20111101 runs; delay runs, but publish v20250224 is skipped because no v20250224 files changed (no harm)
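
The three scenarios above follow directly from the jobs' `if` conditions. The sketch below models those conditions in shell so the outcomes can be checked by hand; `plan_jobs` is a hypothetical helper, and release jobs are left out because their conditions are not fully shown here.

```bash
# plan_jobs SKIP_PUBLISH V20111101_CHANGED V20250224_CHANGED
# Hypothetical model of the `if` conditions on the publish and delay jobs.
# Each argument is the string "true" or "false".
plan_jobs() {
  local skip_publish=$1 old_changed=$2 new_changed=$3

  if [ "$skip_publish" = "false" ] && [ "$old_changed" = "true" ]; then
    echo "publish-v20111101: runs"
  else
    echo "publish-v20111101: skipped"
  fi

  # The delay job depends only on the skip-publish check.
  if [ "$skip_publish" = "false" ]; then
    echo "delay-for-v20250224: runs"
  else
    echo "delay-for-v20250224: skipped"
  fi

  # publish-v20250224 needs the delay job and its own path filter.
  if [ "$skip_publish" = "false" ] && [ "$new_changed" = "true" ]; then
    echo "publish-v20250224: runs after delay"
  else
    echo "publish-v20250224: skipped"
  fi
}

echo "Both versions modified:";   plan_jobs false true true
echo "Only v20250224 modified:";  plan_jobs false false true
echo "Only v20111101 modified:";  plan_jobs false true false
```

In every scenario without `[skip-publish]` the delay job runs, which is why `publish-v20250224` is never blocked by a skipped v20111101 job.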

**Why Not Use Direct Dependencies?**
If v20250224 jobs depended directly on v20111101's release job, the workflow would fail whenever v20111101 was skipped (not modified). The gate job pattern enables:
If v20250224 jobs depended directly on v20111101's publish job, they would be skipped whenever v20111101 was skipped (not modified). The delay job pattern enables:
- ✅ Correct behavior in single-version and multi-version scenarios
- ✅ Maintains serial ordering when both versions change
- ✅ Staggers version publishes when both versions change
- ✅ Prevents race conditions at npm registry level
- ✅ Clear, explicit dependency chain in GitHub Actions UI
- ✅ Simple, easy-to-understand logic

#### Step 4: Publish and Release v20250224 (Second in Serial Chain)

@@ -426,7 +439,7 @@ If v20250224 jobs depended directly on v20111101's release job, the workflow wou
**publish-v20250224 executes when**:
- No `[skip-publish]` flag
- Files in `v20250224/**` were changed
- **AND** `gate-v20111101-complete` completes (ensures serial ordering)
- **AND** `delay-for-v20250224` completes (ensures staggered publishing)

**release-v20250224 executes when**:
- No `[skip-publish]` flag
@@ -437,7 +450,7 @@ If v20250224 jobs depended directly on v20111101's release job, the workflow wou
1. Publish job calls `publish.yml` with `version_directory: v20250224`
2. Release job calls `release.yml` after publish completes

**Serial Chain Benefit**: Even though both versions could publish in parallel, the gate job ensures v20250224 waits for v20111101 release, preventing npm registry race conditions when both versions are modified.
**Serial Chain Benefit**: The 2-second delay before v20250224 starts publishing gives v20111101 a head start at the npm registry, reducing the chance of race conditions when both versions are modified.

---
