xpk: add team-based PoC quota routing with ConfigMap discovery #1180
ultrons wants to merge 19 commits into
Conversation
Adds three flags to `xpk workload create`:

- `--team`: routes the job to `<namespace>/<lq>/<priorityClass>`
- `--value-class`: job-class label for audit and priority ordering
- `--declared-duration-minutes`: honest p90 estimate, used by a time-limit controller
When --team is set:
- Pod template is placed in the team's namespace (poc-<team>) and labeled with
the team's LocalQueue and PriorityClass.
- declared-duration-minutes is propagated into the pod template's metadata
labels (Kueue does not copy arbitrary JobSet metadata to the Workload, but
it does copy pod-template metadata into spec.podSets[*].template).
When --team is unset, behavior is identical to upstream (default namespace,
multislice-queue).
Team routing initially uses a hardcoded POC_TEAM_CONFIG dict in
core/kueue_manager.py; a follow-up commit replaces this with discovery from a
cluster-side ConfigMap.
Adds `xpk workload status --team=<t> --workload=<name>` (or omit --workload to list all). With one diagnostic, the command tells the user whether their workload is:

- QUEUED (and what's ahead of it)
- STUCK (quota reserved but admission failing)
- RUNNING
- FINISHED

For STUCK workloads the command parses recent Warning events and surfaces the most common causes (e.g. a slice name exceeding the super-slice 49-char limit), with a copy-pasteable shorter-name suggestion. Workloads are looked up by the xpk.google.com/workload label, so the user can pass their full display name regardless of any internal name shortening. Cluster credentials are fetched at runtime; the user no longer needs compute/zone in their gcloud config for this command to work.
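The four-state diagnostic above can be sketched as a small classifier over the Kueue Workload's status conditions. This is an illustrative sketch, not xpk's actual implementation: the condition names (QuotaReserved, Admitted, Finished) follow Kueue's Workload status conventions, and `classify` / `_cond_true` are hypothetical helper names.

```python
def _cond_true(workload: dict, ctype: str) -> bool:
  """Return True if the named condition exists and its status is 'True'."""
  for cond in workload.get('status', {}).get('conditions', []):
    if cond.get('type') == ctype and cond.get('status') == 'True':
      return True
  return False


def classify(workload: dict) -> str:
  """Map a Kueue Workload object to one of the four user-facing states."""
  if _cond_true(workload, 'Finished'):
    return 'FINISHED'
  if _cond_true(workload, 'Admitted'):
    return 'RUNNING'
  if _cond_true(workload, 'QuotaReserved'):
    # Quota is held but the pods never became schedulable:
    # admission is failing somewhere downstream of Kueue.
    return 'STUCK'
  return 'QUEUED'
```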
Three related changes that move team configuration from hardcoded dicts to
runtime discovery, eliminating xpk releases as the gating step for cluster
admin changes:
1. Decouple xpk --workload from the K8s JobSet name. xpk now derives a short,
deterministic K8s name (`{ldap_prefix}-{hex4}`) that fits the super-slice
admission controller's 49-char limit, while keeping the user-facing display
name on the JobSet via the xpk.google.com/workload label. Users pass the
same display name to `workload status / delete / list` regardless of length.
2. Replace POC_TEAM_CONFIG and POC_TEAM_MAX_WORKLOAD_NAME dicts (and the two
argparse choices=[...] lists) with discovery from a kueue-system/poc-team-config
ConfigMap. New module core/poc_discovery.py: fetch_poc_config(),
resolve_team(), max_k8s_workload_name_len(), available_teams(),
available_value_classes(). When --team is unset there is no cluster call —
upstream behavior is preserved bit-for-bit. When --team is set with a value
not on the cluster, xpk now prints the live available list as the error.
3. Add an opportunistic local cache under ~/.xpk/poc-cache/<context>.json so
neither tab completion (argcomplete completers on --cluster, --team,
--value-class) nor did-you-mean suggestions (`Did you mean: ml-perf?`) need
to call the cluster on every keystroke. The cache is refreshed any time a
live ConfigMap fetch succeeds; if it is missing or stale the worst case is a
slightly less useful error message — never a wrong-routing decision.
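The short-name derivation in (1) can be sketched as follows. This is a hedged illustration of the `{ldap_prefix}-{hex4}` shape only; the real budget math lives in core/kueue_manager.py and quota_discovery.max_k8s_workload_name_len, and `derive_short_name` with its default max_len is hypothetical.

```python
import hashlib


def derive_short_name(display_name: str, max_len: int = 12) -> str:
  """Deterministic short K8s name: {ldap_prefix}-{hex4} (illustrative)."""
  # Reserve 5 chars for '-' plus 4 hex digits of a stable hash.
  prefix = display_name.split('-')[0][: max_len - 5]
  hex4 = hashlib.sha256(display_name.encode()).hexdigest()[:4]
  return f'{prefix}-{hex4}'
```

Because the suffix is a hash of the full display name, the mapping is stable across invocations, so `workload status / delete / list` can resolve the same display name to the same JobSet every time.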
When a PoC team is in use, args.workload is the user-friendly display name (e.g. mlperf-conv-13) and the actual K8s JobSet name is a derived short name (e.g. mlperf-d6cc). The "Follow your workload here:" and "Follow your worker 0, slice 0 logs here:" URLs were hardcoding namespace="default" and using args.workload as the pod prefix, so they pointed nowhere: the workload lives in the team namespace and its pods are named after the JobSet, not args.workload.

This commit:
- Initializes poc_namespace/k8s_name early in workload_create so the PoC values are available where the URLs are emitted.
- Substitutes those values into both the dashboard URL and the Cloud Logging filter (namespace_name and pod_name prefix).
- Falls back to the upstream behavior (namespace=default, prefix=args.workload) when no PoC team is set, so non-PoC users are unaffected.
jamOne-
left a comment
Thanks for the PR!
- Please also resolve the failing presubmits.
- Please add unit tests to your changes.
    _SUPER_SLICING_WORKLOAD_NAME_LIMIT = 28
    ...
    def _load_poc_cfg(args) -> dict | None:
What is PoC and why do we name our code after it?
Fair point — "PoC" was an internal project name and shouldn't leak into upstream code. Renamed throughout: poc_discovery.py → quota_discovery.py, _resolve_poc_team → _resolve_quota_team, _load_poc_cfg → _load_quota_cfg, _poc_cfg → _quota_cfg, _poc_labels → _team_labels, etc. All user-facing strings (parser help text, error messages, log lines) now use neutral wording like "team-quota" / "team-routing". The cluster-side ConfigMap name poc-team-config is kept as-is for now since renaming it is a chart-side change that needs separate coordination — happy to do that as a follow-up.
Nice, thanks!
The poc-team-config name, let's change that as well please. We can make the name configurable in the config like: xpk config set team-configmap-name poc-team-config if you do not want to update your cluster.
You can read the name like here:
Line 39 in 20e3da5
And prepare the config key like here:
Lines 59 to 67 in 20e3da5
Done in 3a9ac68. Default is now team-quota-config (no poc- literal anywhere). Operators with the legacy ConfigMap can override without touching the cluster: `xpk config set team-configmap-name poc-team-config`. Wired exactly per your references: added TEAM_CONFIGMAP_NAME_KEY to core/config.py's DEFAULT_KEYS, and quota_discovery._configmap_name() reads it via get_config().get(...) with the default as fallback. +2 tests covering both the default and the override path.
    # For PoC teams, inject the user's full display name as XPK_WORKLOAD_NAME so
    # training scripts can reference it for GCS artifact paths regardless of the
    # short K8s name xpk derives for the JobSet.
    if getattr(args, 'team', None):
      args.env.setdefault('XPK_WORKLOAD_NAME', args.workload)
This looks like a hack. Do we need this name shortening?
The shortening exists because the super-slice admission controller has a 49-char hard limit on the slice name it creates from each JobSet (format: -jobset---slice-job-). Subtracting the fixed framing (~26 chars) and the namespace length leaves a small budget for the user-facing workload name — e.g. for poc-ml-perf (11 chars), only 12 chars are available. User-friendly names like mlperf-conv-13 (14 chars) blow past this, the JobSet/Slice fails admission, and the user sees a confusing low-level error. Without shortening, this is a real foot-gun.
Updated this commit to:
- Add an inline comment explaining the constraint and pointing at quota_discovery.max_k8s_workload_name_len for the budget math.
- Add --no-shorten-jobset-name so users can opt out when their workload name is short enough already.
- Only print the "workload → JobSet name" mapping when shortening actually triggered (no-op cases stay silent).
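The budget arithmetic above, as a worked example. The 49-char limit and ~26-char framing are the figures quoted in this thread; the constant names and helper are otherwise illustrative.

```python
SLICE_NAME_LIMIT = 49   # hard limit enforced by the super-slice admission controller
FIXED_FRAMING = 26      # approximate fixed '-jobset-...-slice-job-' overhead


def workload_name_budget(namespace: str) -> int:
  """Chars left for the user-facing workload name after framing + namespace."""
  return SLICE_NAME_LIMIT - FIXED_FRAMING - len(namespace)
```

For the poc-ml-perf namespace (11 chars) this leaves 12 chars, so a 14-char name like mlperf-conv-13 fails admission without shortening.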
@FIoannides I need your review on that one. IMO this decision can be breaking for some users.
    )

    poc_namespace, poc_local_queue, poc_priority = _resolve_poc_team(args)
    # Derive a short K8s-safe JobSet name for PoC teams; use display name as-is otherwise.
But why? This can be unexpected from a user's perspective. We also have other commands that take workload as an argument, e.g. xpk inspector.
So is it actually required for the --team changes?
Yes — same root cause as the previous comment. With --team, the workload routes to a per-team namespace, which both forces the JobSet to be created in that namespace AND eats into the slice-name char budget. Both behaviors only kick in when --team is set; the upstream non-team path is unchanged. Added a docstring + inline comment explaining the conditional, plus the --no-shorten-jobset-name opt-out flag for users who'd rather see the create fail loudly than have their name silently shortened.
Renamings:
- core/poc_discovery.py -> core/quota_discovery.py
- _resolve_poc_team -> _resolve_quota_team, _load_poc_cfg -> _load_quota_cfg,
fetch_poc_config -> fetch_quota_config
- args._poc_cfg -> args._quota_cfg
- _build_poc_labels -> _build_team_labels (and pod-template variant)
- {poc_labels} / {poc_pod_template_labels} YAML keys -> {team_labels} /
{team_pod_template_labels}
- Local cache dir: ~/.xpk/poc-cache -> ~/.xpk/quota-cache
- All "PoC" references in user-facing strings, docstrings and parser help
text replaced with neutral team-quota wording.
Structure:
- Replace the ambiguous (str, str, str) tuple return of resolve_team() with
  a @dataclass TeamRouting{namespace, local_queue, priority_class}.
- Move workload_status() and its helpers from commands/workload.py to a
dedicated commands/workload_status.py.
- In core/kueue_manager.py, move derive_k8s_workload_name() below the
module-level constants block so the constants stay together at the top.
- Tightened "team is set" check to a real non-empty-string test (still
passes existing MagicMock-based tests).
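The TeamRouting dataclass described above can be sketched as below (field names come from the commit message; the frozen flag and the example values are assumptions):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamRouting:
  """Named routing target, replacing the ambiguous (str, str, str) tuple."""
  namespace: str
  local_queue: str
  priority_class: str
```

Call sites then read `routing.namespace` instead of unpacking positionally, so adding a field later cannot silently shift tuple positions.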
UX:
- Add --no-shorten-jobset-name flag so the JobSet-name shortening is
opt-out for users whose workload names already fit the super-slice
admission charLimit budget. Justify the shortening with an inline
comment that points at quota_discovery.max_k8s_workload_name_len for
the budget math.
- Only print the "workload -> JobSet name" message when the name was
actually shortened.
Tests:
- New core/quota_discovery_test.py (17 tests): resolve_team /
available_teams / available_value_classes / max_k8s_workload_name_len /
suggest / fetch_quota_config (mocked).
- Append derive_k8s_workload_name tests to core/kueue_manager_test.py
(5 tests): determinism, length cap, ldap-prefix shape, distinct inputs,
short-input behavior.
- All 502 tests pass (+23 new).
- Goldens regenerated for the renamed YAML template variables.
Added 23 new tests; all 502 tests pass locally. Goldens regenerated for the renamed YAML template variables.
Update: review feedback addressed (commit 4e284ca)

Thanks for the thorough review @jamOne-! Here's the round-trip on each comment:
- Renamings (response to "What is PoC and why...")
- Cluster-side ConfigMap name
- Structure / code quality
- Justifications added (
| File | Change |
|---|---|
| core/poc_discovery.py → core/quota_discovery.py | renamed (with git mv) + dataclass + cleaned docstring |
| core/quota_discovery_test.py | new, 17 tests |
| core/local_cache.py | docstring + cache dir name |
| core/kueue_manager.py | moved derive_k8s_workload_name below CONSTS |
| core/kueue_manager_test.py | +6 tests for derive_k8s_workload_name |
| commands/workload.py | rename + dataclass call sites + opt-out flag handling |
| commands/workload_status.py | new (split from workload.py) |
| parser/workload.py | import from new module + neutral help text + new flag |
| recipes/Workload_create*.md | regenerated goldens for renamed YAML template variables |
- workload.py: drop the unused run_command_for_value import; add explicit protected-access pylint pragmas for args._quota_cfg (using args as a namespace bag is intentional); hoist the TeamRouting import to module level instead of importing it inside _resolve_quota_team.
- workload_status.py: add check=False to subprocess.run; drop the unused re import and unused regex match; the deferred .workload import (kept to avoid a circular import) gets an explicit import-outside-toplevel pragma.
- local_cache.py: add check=False to both subprocess.run calls.
- kueue_manager_test.py: hoist derive_k8s_workload_name to the grouped imports block and use rsplit(maxsplit=1) for the suffix-length checks.
- quota_discovery_test.py: replace `== []` empty-list comparisons with `not <list>` to match pylint's use-implicit-booleaness preference.

Local pylint on the touched files now scores 10.00/10. All 502 tests still pass. Goldens unchanged.
- Narrow the json.loads return to dict in local_cache.read and quota_discovery.fetch_quota_config to satisfy strict no-any-return.
- Tighten the _load_quota_cfg cache-hit type check to isinstance(cached, dict).
- Suppress 'Action has no attribute completer' on argcomplete .completer assignments in parser/workload.py (existing pattern from common.py).
Pyink split the prior single-line assignment across three lines, which moved the '# type: ignore[attr-defined]' off the actual attribute access and left mypy flagging both the original error and the now-misplaced ignore comment. Hoist the Action into a local so the assignment fits on one line.
      help='The name of the cluster to delete the job on.',
      required=True,
    )
    ).completer = _cluster_completer  # type: ignore[attr-defined]
Please do not bypass the type-checker like this.
- If it's needed, add a comment above explaining why.
- But please prefer to resolve the issue so it does not introduce unexpected errors in the future.
Fair callout: point 2 isn't fully resolved by my change. Here is where things stand:
What's done in 7d77cc5 + 383b3a3:
- Every # type: ignore[attr-defined] removed at the call sites.
- New _set_completer(action, completer) helper centralises the dynamic-attribute assignment.
- Helper docstring (expanded in 383b3a3) explicitly acknowledges that setattr is itself a bypass that mypy can't introspect, and explains why no purely-static fix is currently available: argcomplete monkey-patches argparse.Action.completer at runtime, and there is no upstream type stub that declares it.
Two options for actually resolving the bypass:
- Wait for argcomplete to ship updated stubs declaring Action.completer. Out of our hands.
- Shadow Action with a Protocol that declares completer: Any and cast at the helper boundary. This is a few extra lines but makes the "trust me, this attribute exists" contract explicit to mypy. I haven't done this because it'd diverge from the existing pattern at parser/common.py:71, which uses a different escape hatch (untyped add_argument returning Any); I wanted to check before introducing a new pattern just for these new flags.
Happy to switch to (2) if you'd prefer it here, or to tackle it as a cross-cutting cleanup (consistent across both new code and parser/common.py) in a follow-up. Let me know which you'd prefer.
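Option (2) above can be sketched as follows. This is a hedged illustration, not the code in the PR: `_CompletableAction` and `set_completer` are hypothetical names, and the Protocol merely declares the `completer` attribute that argcomplete monkey-patches onto argparse.Action at runtime.

```python
import argparse
from typing import Any, Protocol, cast


class _CompletableAction(Protocol):
  """Shape of an argparse Action after argcomplete patches it."""
  completer: Any


def set_completer(action: argparse.Action, completer: Any) -> None:
  # The cast makes the "trust me, this attribute exists" contract explicit
  # to mypy instead of a blanket type-ignore at every call site.
  cast(_CompletableAction, action).completer = completer
```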
Add argument types in those helper functions, otherwise they are hard to follow
Done in 7d77cc5 — every inner helper in workload_status.py is now fully annotated.
Note: this file is moving to its own PR (see your :1151 comment) so the annotated version lives on the workload-status-pr branch.
    # ---------------------------------------------------------------------------
    # derive_k8s_workload_name
    # ---------------------------------------------------------------------------
Please remove; we do not use this kind of comment.
    # Fill project from gcloud config if not provided.
    if not getattr(args, 'project', None):
      r = _subprocess.run(
Why not use run_command_for_value?
Makes sense! Fix applied. This whole file is removed from PR #1180 per the :1151 split, so it's no longer in the diff here. The change is on the workload-status-pr branch (lines 57-62) and will land with the follow-up PR:
was:

    subprocess.run(['gcloud', 'config', 'get', 'project'], ...)

now:

    rc, out = run_command_for_value(
        'gcloud config get project',
        task='gcloud config get project',
        quiet=True,
    )
    args.project = out.strip().splitlines()[-1] if rc == 0 and out else ''
    # Imported lazily to avoid a circular import: commands/workload.py imports
    # this module's parser, and we'd otherwise close the loop.
    from .workload import _resolve_quota_team  # pylint: disable=import-outside-toplevel
Move the quota function to another file then. Do not disable linter checks...
Done in 7d77cc5. The two quota helpers _load_quota_cfg and _resolve_quota_team (now load_quota_cfg and resolve_team_for_args) moved from commands/workload.py to core/quota_discovery.py, their natural home. Both call sites import from there at module top: no import-outside-toplevel, no pylint disable. Side benefit: args._quota_cfg → args.quota_cfg drops the matching protected-access disable too.
        return f'{s // 60}m'
      return f'{s // 3600}h{(s % 3600) // 60}m'


    def _cond(wl, ctype):
Maybe we could use
xpk/src/xpk/core/kubectl_common.py
Line 38 in 20e3da5
Done in 7d77cc5. workload_status._cond / _is_true now use kubectl_common.parse_kubernetes_status for typed condition access. Added a reason field to KubernetesCondition (with one new test in kubectl_common_test.py) so the typed access covers all four fields workload_status reads. (Also moving to the separate PR.)
    # core.kueue_manager.derive_k8s_workload_name and
    # core.quota_discovery.max_k8s_workload_name_len for the budget math.
    # The user can disable shortening with --no-shorten-jobset-name.
    if team_namespace:
Why only when team_namespace is used? Seems inconsistent to me.
Intentional. This PR is meant to be purely additive: workloads submitted without --team keep their original K8s name and don't go through any of the new code paths, so existing xpk users see zero behavior change. The shortening solves a problem specific to the team-routing path (because namespace prefix eats most of the super-slice charLimit budget), so it's gated on that path. Happy to extend it to non-team workloads in a follow-up PR if you'd prefer that — but doing so here would change naming behavior for every existing xpk user, which felt riskier than warranted for this PR.
    # For PoC teams, inject the user's full display name as XPK_WORKLOAD_NAME so
    # training scripts can reference it for GCS artifact paths regardless of the
    # short K8s name xpk derives for the JobSet.
    if getattr(args, 'team', None):
      args.env.setdefault('XPK_WORKLOAD_NAME', args.workload)
@FIoannides I need your review on that one. IMO this decision can be breaking for some users.
Bucket 1 — small, mechanical fixes:
- kueue_manager_test.py: drop the decorative banner comments.
- workload_status.py: type-annotate every inner helper.
- workload_status.py: replace the direct subprocess.run for the gcloud-project lookup with run_command_for_value, matching the rest of the file.
- workload_status.py: condition handling now goes through kubectl_common.parse_kubernetes_status (typed access for type/status/lastTransitionTime/message/reason).
- parser/workload.py: drop every '# type: ignore[attr-defined]' on argcomplete .completer assignments. A new _set_completer helper uses setattr so the assignment is explicit and the type-checker is not bypassed.

Bucket 2 — refactor to eliminate the workload_status -> workload lazy import:
- Move load_quota_cfg + resolve_team_for_args from commands/workload.py to core/quota_discovery.py (their natural home). resolve_team_for_args now returns Optional[TeamRouting]; callers handle the unset case according to their own semantics.
- workload.py and workload_status.py both import from quota_discovery at module top — no more 'pylint: disable=import-outside-toplevel'.
- Rename args._quota_cfg -> args.quota_cfg (drops the protected-access pylint disable too).

Supporting change:
- kubectl_common.KubernetesCondition gains a 'reason' field (parse_kubernetes_status populates it), letting workload_status use the shared helper end-to-end without falling back to dict access.

Verified locally: 502 tests pass, pyink/mypy/pylint clean (mypy 0 errors; pylint score unchanged from main), goldens unchanged.
…n PR

Per review feedback (jamOne-, workload.py:1151): the workload status diagnostic is unrelated to the team-routing changes that motivate this PR and should land on its own. Removing the file and the parser hookup here — the workload_status work is preserved on a stacked branch and will be opened as a separate PR once AI-Hypercomputer#1180 merges.
Per review feedback (jamOne-, workload.py:119): with the variable/file
renames done in round 1, the literal string "poc-team-config" was the
last `poc-` reference left in upstream code. Two changes:
- Default ConfigMap name is now `team-quota-config` (neutral). xpk's
fetch_quota_config queries `kueue-system/team-quota-config` by default.
- A new xpk config key, `team-configmap-name`, lets operators override the
name without touching the cluster. Existing deployments running the
legacy `poc-team-config` ConfigMap can stay on it via:
xpk config set team-configmap-name poc-team-config
Implementation follows the pattern jamOne- referenced (telemetry.py /
config.py): the override is a key in DEFAULT_KEYS, read via
`get_config().get(TEAM_CONFIGMAP_NAME_KEY)` with the default name as
fallback. The error message in load_quota_cfg references both the
resolved name and the override mechanism so users can self-serve when
they hit a missing-ConfigMap error.
Tests: +2 (default name used when no override; override is honored when
set). 504 tests pass total.
Per review feedback (jamOne-, parser/workload.py:1). Adds a new section to docs/usage/workloads.md covering the team-routing flags introduced by this PR: --team, --value-class, --declared-duration-minutes, and --no-shorten-jobset-name. Explains the auto-shortening (super-slice charLimit) and the team-configmap-name override for operators with legacy ConfigMap names.
Per round-2 follow-up review: the prior docstring described the helper
as 'avoiding a per-call-site type-ignore' but did not name what it
actually is — setattr is itself a dynamic-attribute bypass that mypy
cannot statically check. Updated the docstring to:
- acknowledge setattr is a bypass, just centralised
- explain why no purely-static fix is available without external
changes (argcomplete monkey-patches Action.completer at runtime;
no upstream stub declares it)
- list the two options for resolving (upstream stubs / Protocol+cast)
- offer to switch to Protocol+cast on request
No code change — docstring only.
jamOne-
left a comment
Gemini review below:
The reviewer agent has completed the review of the pull request. Here is the verdict:
Code Review Verdict
Status: 🔴 Changes Requested
Overall Impression: The introduction of team-based routing and a runtime configurable quota discovery mechanism is a fantastic architectural improvement, successfully decoupling xpk releases from cluster administration. The code is well-structured, well-tested, and provides robust caching for responsive autocompletion. However, there are a few edge cases related to input validation, logic constraints, and shell injection hygiene that must be addressed before merging.
Finding Summary
| ID | Severity | Category | File/Location | Finding (Concise) |
|---|---|---|---|---|
| F1 | 🔴 Critical | Security | src/xpk/core/quota_discovery.py (fetch_quota_config) | The customizable team-configmap-name is directly interpolated into a shell command, creating a shell injection vulnerability. |
| F2 | 🟡 Important | Logic | src/xpk/core/kueue_manager.py (derive_k8s_workload_name) | derive_k8s_workload_name incorrectly uses negative slicing if max_len < 5, violating the max length constraint. |
| F3 | 🟡 Important | Validation | src/xpk/core/quota_discovery.py (load_quota_cfg) | The --declared-duration-minutes flag is not validated as required when --team is set, despite documentation stating otherwise. |
Detailed Findings
F1 - Prevent shell injection in kubectl command execution
- Location: src/xpk/core/quota_discovery.py around line 68 (fetch_quota_config)
- Why it matters: The cm_name variable is retrieved from the user's config via get_config().get(TEAM_CONFIGMAP_NAME_KEY) and directly interpolated into the cmd string, which is executed via run_command_for_value. While the user controls their own config, this still permits arbitrary local code execution if a malformed or malicious ConfigMap name (e.g. test; rm -rf /) is set in the XPK config.
- How to fix: Use shlex.quote to properly escape the ConfigMap name before string interpolation.

    import shlex

    def fetch_quota_config() -> dict | None:
        # ...
        cm_name = shlex.quote(_configmap_name())
        cmd = f'kubectl get configmap -n {CONFIG_CM_NAMESPACE} {cm_name} -o json'

F2 - Fix negative slicing bug in derive_k8s_workload_name
- Location: src/xpk/core/kueue_manager.py around line 84
- Why it matters: The function computes max_prefix = max_len - 5 and then slices the string using ldap[:max_prefix]. If max_len is less than 5 (which is possible if a cluster admin misconfigures the sliceName character limits or uses an exceptionally long namespace), max_prefix becomes negative. In Python, a negative slice index counts from the end of the string, which causes the function to return a string much longer than max_len, thus silently violating the super-slice Kubernetes character limits and causing downstream admission check failures.
- How to fix: Ensure max_len is at least 5 before attempting to generate the prefix and hash, raising an explicit error if it's too small.

    def derive_k8s_workload_name(display_name: str, max_len: int) -> str:
        # ...
        if max_len < 5:
            raise ValueError(
                f"Calculated max_len ({max_len}) is too small to accommodate the"
                " required short name format. Please check the 'sliceName'"
                " character limits in the team-quota ConfigMap."
            )
        # Reserve 5 chars for '-' + 4 hex digits
        max_prefix = max_len - 5
        ldap = display_name.split("-")[0]
        prefix = ldap[:max_prefix]
        hex4 = hashlib.sha256(display_name.encode()).hexdigest()[:4]
        return f"{prefix}-{hex4}"

F3 - Enforce --declared-duration-minutes when --team is specified
- Location: src/xpk/core/quota_discovery.py inside load_quota_cfg
- Why it matters: The argparse help text explicitly states that --declared-duration-minutes is "Required when --team is set." However, load_quota_cfg does not enforce this. If a user omits the flag, the workload might be submitted without the duration label, preventing the time-limit controller from stopping overrunning jobs or functioning correctly.
- How to fix: Add explicit validation to ensure the duration value is provided whenever the team flag is utilized.

    def load_quota_cfg(args) -> dict | None:
        # ...
        if args.team not in (cfg.get('teams') or {}):
            # ... existing team-not-found error handling ...
            xpk_exit(1)
        if getattr(args, 'declared_duration_minutes', None) is None:
            xpk_print(
                'ERROR: --declared-duration-minutes is required when --team is set.'
            )
            xpk_exit(1)

Mechanical cleanups from jamOne-'s round-3 review:
- docs/usage/workloads.md: drop the filler sentence above the flag table (the table is self-explanatory).
- commands/workload.py: remove the empty-value comment above team_namespace init; consolidate the XPK_WORKLOAD_NAME env-var setdefault under the existing `if team_routing is not None:` branch instead of a separate getattr/isinstance check earlier in the function.
- core/local_cache.py: route current_context() and gke_contexts_from_kubeconfig() through run_command_for_value (with quiet=True so tab completion isn't corrupted by xpk_print). Removes the last raw subprocess.run sites in this file, matching the codebase's command-helper pattern.
- parser/workload.py: add explicit argument types to _cluster_completer and the inner _complete in _cached_field_completer.
Round-3 review AI-Hypercomputer#6/AI-Hypercomputer#7. Match parser/common.py:71's pattern: each add_argument(...).completer = X is now a single chain expression at the call site instead of going through a helper that wrapped setattr. Mypy stays clean by relaxing the parameter type on the two enclosing parser-setup functions (set_workload_delete_parser and set_workload_list_parser) to match this file's existing convention in add_shared_workload_create_required_arguments — untyped parser param so Any.add_argument().completer = X works for mypy. Removes the _set_completer wrapper, its docstring, and the typing.Any need at that boundary.
…uting recipe

Round-3 review AI-Hypercomputer#2. Lets reviewers (and downstream users) exercise the team-routing path without deploying the cluster-side ConfigMap, and gives the PR a concrete entry point that shows the rendered JobSet with the team's namespace, priorityClass, and Kueue/team labels. Mirrors the existing DRY_RUN_RESERVATION_SUB_BLOCKS env-var pattern in core/reservation.py (modeled on the reviewer's pointer). When set, the env var's JSON payload is treated as the contents of data["config.json"] from the live team-quota ConfigMap; otherwise xpk falls through to the live `kubectl get configmap` path (so live behavior is unchanged when the env var is unset). The recipe is golden-tested via tools/recipes.py and produces a 190-line xpk dry-run output that highlights the team-routing fields (namespace, priorityClass, queue-name, team/value-class/declared-duration-minutes labels, the workload-name shortening for the super-slice charLimit, and the XPK_WORKLOAD_NAME env injection).
Pushed three commits addressing round-3 on top of 383b3a3:
Local CI: mypy, pyink, pylint, pytest (504), and tools/recipes.py golden all green. Ready for re-review.
…#1180 review 4219697499)

Three findings from the automated Gemini review jamOne- ran on top of round-3 (review 4219697499):

F1 — shell injection in fetch_quota_config. cm_name comes from xpk config (xpk config set team-configmap-name <name>) and was interpolated into a shell command run via run_command_for_value (shell=True). A malicious or malformed config value (e.g. 'name; rm -rf /') would have been executed verbatim. Shell-quote with shlex.quote.

F2 — negative-slicing bug in derive_k8s_workload_name. With max_len < 5, max_prefix = max_len - 5 went negative, ldap[:max_prefix] removed characters from the end of the prefix instead, and the resulting {prefix}-{hex4} silently exceeded max_len. Triggered when the team-quota ConfigMap's sliceName.charLimit is set too tightly relative to a long namespace. Now raises ValueError with a clear message pointing at the ConfigMap field, so the misconfiguration is caught up-front instead of at admission time.

F3 — --declared-duration-minutes not enforced when --team is set. The flag's help text says "Required when --team is set"; load_quota_cfg didn't actually check, so a workload could be submitted with no duration label and the cluster-side time-limit controller would have nothing to gate on. Validation now matches the help text.

Tests: +4 new (shell-injection-via-quoted-cm-name, derive raises for small max_len, declared-duration-minutes required/accepted with --team). 508 pass.
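The F2 negative-slicing behavior can be reproduced in isolation. A toy sketch (a hypothetical function, not xpk's actual code) showing how a negative max_prefix trims from the end of the prefix instead of truncating it:

```python
def buggy_prefix(name: str, max_len: int) -> str:
  """Reproduces the pre-fix slicing: negative index counts from the END."""
  max_prefix = max_len - 5  # reserve '-' + 4 hex digits
  return name.split('-')[0][:max_prefix]
```

With max_len = 3, max_prefix is -2, so 'mlperf'[:-2] yields a 4-char prefix; appending the '-hex4' suffix then exceeds max_len instead of raising.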
Pushed 65c4ec4 addressing the three Gemini findings:
+4 unit tests covering each case. mypy, pyink, pylint, pytest (508), and tools/recipes.py golden all ok locally.
Two CI-only failures from the previous push:

1. linter (W0404/C0415): `test_derive_k8s_workload_name_raises_when_max_len_too_small` redundantly imported pytest inside the function. pytest is already imported at module level (line 19); use that.
2. verify-goldens: the recipe captured my local working directory (/home/sivaibhav_google_com/xpk-fork) into the dry-run output's "Adding ... to container image archive" line, which CI prints as /home/runner/work/xpk/xpk. Added --script-dir=/tmp to the recipe command to normalize the path, matching the convention in Workload_create_super-slicing.md and others.

Regenerated the golden. Verified locally with the same env CI uses (XPK_VERSION_OVERRIDE=v0.0.0 only, no TELEMETRY_TRASH_EXECUTION).
jamOne-
left a comment
Gemini review below. Please resolve F1 and F2.
Here is the review of the pull request #1180:
Code Review Verdict
Status: 🔴 Changes Requested
Overall Impression: The PR introduces a robust and flexible team-based Kueue routing mechanism via ConfigMaps and implements an elegant local caching strategy for sub-second CLI tab completions. The codebase is well-structured and follows clean abstraction principles. However, there are critical logic errors in the JobSet name derivation algorithm that will cause the Kubernetes API to reject workloads under specific constraints, as well as a shell injection vulnerability when executing commands with user-provided config values.
Finding Summary
| ID | Severity | Category | File/Location | Finding (Concise) |
|---|---|---|---|---|
| F1 | 🔴 Critical | Logic | src/xpk/core/kueue_manager.py | Negative string slicing in `derive_k8s_workload_name` produces invalid or excessively long names when `max_len < 6`. |
| F2 | 🟡 Important | Security | src/xpk/core/quota_discovery.py | Missing shell escaping (`shlex.quote`) for the ConfigMap name risks arbitrary command execution. |
| F3 | 🔵 Suggestion | Error Handling | src/xpk/core/local_cache.py | Overly broad exception suppression (`except Exception`) masks legitimate filesystem or permission errors. |
Detailed Findings
F1 - Logic Error in Workload Name Derivation
- Location: `src/xpk/core/kueue_manager.py:86` (in `derive_k8s_workload_name`)
- Why it matters: The slice-length math relies on `max_prefix = max_len - 5`. If a team has a long namespace (e.g., > 18 characters), `max_len` can drop below 5, making `max_prefix` negative.
  - Negative slicing bug: In Python, `ldap[:negative_number]` removes characters from the end of the string rather than capping the front (e.g., `"amandaliang"[:-1]` becomes `"amandalian"`). This creates a name that is longer than `max_len`, violating the strict 49-character super-slice controller limit and causing loud admission failures.
  - RFC 1123 violation: If `max_len == 5`, then `max_prefix == 0` and `ldap[:0]` evaluates to `""`, resulting in a derived name like `"-abcd"`. The Kubernetes API enforces RFC 1123, which strictly requires object names to start with an alphanumeric character, not a hyphen.
- How to fix: Ensure `max_len` is sufficiently large (at least 6) and safely handle edge cases before attempting to slice the string.
```python
import hashlib


def derive_k8s_workload_name(display_name: str, max_len: int) -> str:
  if max_len < 6:
    raise ValueError(
        f"max_len ({max_len}) is too small to accommodate a valid "
        "Kubernetes name with the deterministic hash suffix. Check the "
        "namespace length and sliceName limits."
    )
  # Reserve 5 chars for '-' + 4 hex digits
  max_prefix = max_len - 5
  ldap = display_name.split("-")[0]
  prefix = ldap[:max_prefix]
  hex4 = hashlib.sha256(display_name.encode()).hexdigest()[:4]
  return f"{prefix}-{hex4}"
```

F2 - Command Injection Vulnerability via Missing Escaping
- Location: `src/xpk/core/quota_discovery.py:75` (in `fetch_quota_config`)
- Why it matters: The ConfigMap name is resolved from local configuration (`get_config().get(TEAM_CONFIGMAP_NAME_KEY)`) and immediately interpolated into a `kubectl` command executed with `shell=True` (via `run_command_for_value`). If a user configures their `team-configmap-name` with shell metacharacters (e.g., `xpk config set team-configmap-name "foo; rm -rf /"`), it will execute arbitrary commands on their machine. While xpk relies heavily on `shell=True`, configuration-driven injection is an avoidable risk.
- How to fix: Use `shlex.quote()` to safely escape the dynamically retrieved ConfigMap name.
```python
import shlex


def fetch_quota_config() -> dict | None:
  # ...
  cm_name = shlex.quote(_configmap_name())
  cmd = f'kubectl get configmap -n {CONFIG_CM_NAMESPACE} {cm_name} -o json'
  rc, out = run_command_for_value(cmd, task=cmd, quiet=True)
  # ...
```

F3 - Overly Broad Exception Suppression in Local Cache
- Location: `src/xpk/core/local_cache.py` (in `write`, `read`, `all_contexts`)
- Why it matters: The cache operations wrap their execution in a broad `except Exception: pass` (or return `None`). While making the cache "best effort" is a good architectural choice, silently catching all `Exception`s suppresses logical bugs (like `NameError`, `TypeError`) alongside legitimate `OSError`s (like full disks or permission issues).
- How to fix: Narrow the exception scope to catch only expected runtime failure conditions, making debugging significantly easier if the code breaks in the future.
```python
def read(context: str) -> dict | None:
  # ...
  try:
    p = _path_for(context)
    if not p.exists():
      return None
    payload = json.loads(p.read_text())
    return payload if isinstance(payload, dict) else None
  except (OSError, json.JSONDecodeError):
    return None
```

> `workload_create_pathways_parser.set_defaults(func=workload_create_pathways)`
> `def set_workload_delete_parser(workload_delete_parser: ArgumentParser):`
Oh, I see now why it worked for `ParserOrArgumentGroup` — it just types the object as `Any` after `add_argument`...
Sorry for the confusion. Could you please revert removing the types and go back to the original solution (add mypy-bypassing comments)? That was the cleanest approach of all that we have available.
scaliby
left a comment
This change is too big to be reviewed and carries a significant risk. Please work on splitting it into small, incremental PRs that are easy to reason about.
> `' --priority. Valid teams are discovered at runtime from'`
> `" the cluster's team-quota ConfigMap."`
> `),`
> `).completer = _cached_field_completer('teams')`
Please move field caching here and below to a separate PR and make it consistent with other arguments. Otherwise drop this:

`.completer = _cached_field_completer('teams')`
> `from .validators import directory_path_type, name_type`
> `def _cached_field_completer(field: str) -> Callable[..., list[str]]:`
Please move this to a separate PR. This is not related to team-based routing.
> `cfg = _sample_cfg()`
> `mocker.patch('xpk.core.quota_discovery.fetch_quota_config', return_value=cfg)`
> `mocker.patch('xpk.core.quota_discovery.xpk_print')`
> `mocker.patch('xpk.core.quota_discovery.xpk_exit', side_effect=SystemExit(1))`
xpk_exit by default raises SystemExit. Is this mock necessary?
> `must fail loud, not silently emit a workload missing the duration label."""`
> `cfg = _sample_cfg()`
> `mocker.patch('xpk.core.quota_discovery.fetch_quota_config', return_value=cfg)`
> `mocker.patch('xpk.core.quota_discovery.xpk_print')`
Why? I think it's fine to just print some stuff as a part of the test.
Add team-based queue routing to `xpk workload create`

Summary
This adds three flags to `xpk workload create` and one new sub-command `xpk workload status`, enabling xpk to route a workload to a team-specific Kueue ClusterQueue / LocalQueue / PriorityClass on clusters that have been set up with multi-team quota sharing:

- `--team=<name>` — routes the JobSet to a per-team namespace + LocalQueue + PriorityClass, overriding `--priority`.
- `--value-class=<name>` — adds a job-class label for downstream audit and ordering.
- `--declared-duration-minutes=<int>` — added to the pod-template metadata so a cluster-side time-limit controller can stop overrunning jobs.
- `xpk workload status --team=<t> [--workload=<name>]` — diagnoses whether a workload is QUEUED / STUCK / RUNNING / FINISHED, and on STUCK cases parses recent Warning events to surface common admission failures (with a copy-pasteable shorter-name suggestion when the super-slice 49-char limit is hit).
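To make the four-state triage concrete, here is a hedged sketch of how such a classifier might read Kueue-style workload conditions. The condition names mirror Kueue's `QuotaReserved` / `Admitted` / `Finished` condition types, but this is illustrative only, not the PR's actual implementation:

```python
def classify_workload(conditions: dict[str, bool]) -> str:
  """Map Kueue-style condition flags to the four user-facing states.

  `conditions` is a simplified {type: status} view of a Workload's
  status.conditions; a real command would inspect the full objects
  and recent Warning events for STUCK diagnoses.
  """
  if conditions.get("Finished"):
    return "FINISHED"
  if conditions.get("Admitted"):
    return "RUNNING"
  if conditions.get("QuotaReserved"):
    # Quota held but admission still failing -> parse Warning events.
    return "STUCK"
  return "QUEUED"


print(classify_workload({"QuotaReserved": True}))  # STUCK
print(classify_workload({}))  # QUEUED
```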
When `--team` is not set, behavior is bit-identical to upstream — no cluster calls, no new error paths, no new dependencies.

When `--team` is set, xpk reads a `kueue-system/poc-team-config` ConfigMap on the target cluster to discover the available teams, namespaces, value classes, and the super-slice name-length budget. This means adding, removing, or retuning a team is a cluster-side config change with no xpk release needed.
Why a ConfigMap, not a hardcoded dict
An earlier iteration of this work (visible in the commit history of the fork) shipped the team list as a Python dict in `core/kueue_manager.py`. That made every cluster-side change (adding a team, retuning a quota, fixing a typo in a PriorityClass name) require an xpk release plus every user reinstalling. The current shape moves the source of truth to the cluster: a small ConfigMap rendered by the cluster admin's tooling. xpk's view of "what teams exist" is always whatever `kubectl get cm -n kueue-system poc-team-config -o json` says right now.
User-visible UX additions
Did-you-mean
Tab completion (via the existing `argcomplete` integration)

- `--cluster` completes from `kubectl config get-contexts` (GKE prefix).
- `--team` and `--value-class` complete from a small per-cluster cache at `~/.xpk/poc-cache/<context>.json` that is refreshed every time a `--team` invocation succeeds — so completion never blocks on a cluster call.
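A best-effort read of that per-context cache might look like the sketch below. The `~/.xpk/poc-cache/<context>.json` path convention comes from this PR; the function shape is illustrative (a temp directory stands in for the home directory so the sketch is self-contained):

```python
import json
import tempfile
from pathlib import Path

# Illustrative stand-in for ~/.xpk/poc-cache/; a temp dir keeps the
# sketch runnable without touching the real home directory.
CACHE_DIR = Path(tempfile.mkdtemp()) / "poc-cache"
CACHE_DIR.mkdir(parents=True)


def cached_fields(context: str, field: str) -> list[str]:
  """Return cached completions for one kubectl context, best-effort."""
  try:
    payload = json.loads((CACHE_DIR / f"{context}.json").read_text())
    values = payload.get(field, [])
    return values if isinstance(values, list) else []
  except (OSError, json.JSONDecodeError):
    return []  # never block or crash tab completion on a bad/missing cache


# A successful --team invocation would refresh cluster A's cache:
(CACHE_DIR / "cluster-a.json").write_text(json.dumps({"teams": ["alpha", "beta"]}))

print(cached_fields("cluster-a", "teams"))  # ['alpha', 'beta']
print(cached_fields("cluster-b", "teams"))  # [] -- per-cluster isolation
```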
Workload-name decoupling
Users can now pass any-length `--workload` names. xpk derives a short, K8s-safe JobSet name (`{ldap_prefix}-{hex4}`, deterministic from `sha256(display_name)`) that fits the super-slice admission controller's 49-char ceiling, and stashes the original on the JobSet via the `xpk.google.com/workload` label. `xpk workload status / delete / list` then use the label as the lookup key, so the user never needs to know the truncated form.
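The derivation described above can be sketched as follows (this follows the `{ldap_prefix}-{hex4}` scheme stated in this PR; validation of overly small limits is omitted for brevity):

```python
import hashlib


def short_name(display_name: str, max_len: int = 49) -> str:
  """Sketch of the deterministic short-name scheme described above."""
  prefix = display_name.split("-")[0][: max_len - 5]  # reserve '-' + 4 hex
  hex4 = hashlib.sha256(display_name.encode()).hexdigest()[:4]
  return f"{prefix}-{hex4}"


long_name = "sivaibhav-my-very-long-experiment-name-2024-run-17"
# Deterministic: the same display name always maps to the same JobSet
# name; lookups go through the xpk.google.com/workload label instead.
print(short_name(long_name))
```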
Commits
1. `--team` / `--value-class` / `--declared-duration-minutes` flags, plumbed into the JobSet template. Initial implementation has the team table hardcoded.
2. `workload status` PoC queue diagnostic command — the new sub-command and its parser.
3. ConfigMap discovery: replaces the hardcoded table with runtime discovery, decouples the user-facing workload name from the K8s JobSet name, and adds the local cache that powers tab completion + did-you-mean. The end state of this commit is the actual feature the user-facing surface should be reviewed against; the earlier two are kept separate for reviewability of the layered design rather than because they ship a useful intermediate state.
Backward compatibility
- `xpk workload create` without `--team` is unchanged: same default namespace, same `multislice-queue`, same priority handling, no extra cluster calls.
- `xpk workload create --help` no longer enumerates a fixed `choices=[...]` for `--team` / `--value-class` (because the set is cluster-dependent now); the help text points to runtime discovery. If maintainers prefer argparse-time validation, an alternative is to keep a non-strict default list in help text only.
Note on naming
These flags and the `poc-team-config` ConfigMap are PoC-named throughout because they were developed for a specific TPU quota PoC. The mechanism itself is general-purpose ("team-aware queue routing for Kueue-enabled clusters") and could be renamed if upstream prefers — happy to follow up with a rename once we agree on a final name. Wanted to land the implementation first to get the design discussion grounded.
Test plan
- `xpk workload create` (no `--team`): unchanged behavior, no cluster call for team discovery.
- `xpk workload create --team=<valid>`: routes correctly to `<namespace>/lq/<priorityClass>`; pod-template carries the three labels.
- `xpk workload create --team=<typo>`: errors with did-you-mean + available list, no JobSet created.
- `xpk workload create --team=<valid>` against a cluster with no `poc-team-config` ConfigMap: errors with deployment hint, no JobSet created.
- `xpk workload status --team=<t>`: lists all workloads in the team's namespace with QUEUED/STUCK/RUNNING/FINISHED diagnoses and quota percentages.
- `xpk workload status --team=<t> --workload=<long-display-name>`: resolves to the K8s JobSet via the `xpk.google.com/workload` label.
- `register-python-argcomplete xpk` then `<TAB>` on `--cluster`, `--team`, `--value-class`: returns expected completions. `~/.xpk/poc-cache/<context>.json` is per-cluster; completing on cluster A doesn't suggest teams from cluster B.
Out of scope
The `poc-team-config` ConfigMap and the cluster-side pieces that consume it (Kueue ClusterQueues, namespaces, time-limit controller, the chart that renders them) live outside this PR. Reviewers don't need to deploy them to assess the xpk side; the discovery code is a no-op for any cluster that doesn't have the ConfigMap, provided `--team` is unset.