
feat: add deploy-kernel action and Dockerfile validation#22

Open
rryoung98 wants to merge 15 commits into main from feature/custom-kernel-deployment

Conversation

@rryoung98
Member

Summary

  • Add deploy-kernel/ composite action for deploying custom Jupyter kernel images to qBraid
  • Add src/scripts/validate_dockerfile.py for static Dockerfile validation
  • Add kernel catalog validation to validate_course.py (checks kernelName exists in catalog)

Usage

Referenced as qBraid/upload-course-action/deploy-kernel@main in workflows.

Inputs: api-key, dockerfile-path, kernel-name, language, display-name, context-dir

Test plan

  • Merge to main
  • Run deploy-kernel workflow from qbraid-tutorial
  • Verify kernel appears in catalog at qbook-staging.k8s.qbraid.com/api/kernelspecs

🤖 Generated with Claude Code

rryoung98 and others added 6 commits March 23, 2026 14:02
Static validation of kernel Dockerfiles before submission to API.
Checks: required qbraid.kernel.* LABELs, EXPOSE 8888, CMD/ENTRYPOINT
referencing jupyter kernelgateway, no root USER, no privileged flags,
kernel.json presence, and kernel name format validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Course validator now fetches the kernel catalog from
qbook-staging.k8s.qbraid.com/api/kernelspecs and verifies that
every kernelName in chapters/sections exists. Gracefully skips
validation if catalog is unreachable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New composite action at deploy-kernel/ for deploying custom Jupyter
kernel images to qBraid. Called as qBraid/upload-course-action/deploy-kernel@main.

Inputs: api-key, dockerfile-path, kernel-name, language, display-name,
context-dir, api-base-url.

Flow: validate Dockerfile structure -> base64 encode + submit to API ->
poll Cloud Build status until active/failed.

Outputs: kernel-name, image-uri, status, build-id.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rryoung98 rryoung98 marked this pull request as ready for review April 8, 2026 19:02
@rryoung98 rryoung98 marked this pull request as draft April 8, 2026 19:09
@rryoung98 rryoung98 marked this pull request as ready for review April 8, 2026 19:19
rryoung98 and others added 3 commits April 8, 2026 15:13
Lint job was failing on a stray blank line between the import block
and the module-level constants. Auto-fixed via isort.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

This PR introduces a new deploy-kernel/ composite GitHub Action for deploying custom Jupyter kernel images to qBraid, alongside a shared static Dockerfile validator and additional course validation to ensure referenced kernels exist in a catalog.

Changes:

  • Added a shared validate_dockerfile.py implementation plus a deploy-kernel wrapper that delegates to it.
  • Added deploy-kernel composite action + deployment script with polling and GitHub Actions outputs.
  • Extended validate_course.py to validate kernelName references against a kernel catalog, and updated tests/examples accordingly.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
src/scripts/validate_dockerfile.py Adds shared static Dockerfile validation logic used by the action.
deploy-kernel/src/scripts/validate_dockerfile.py Wrapper that reuses the shared validator from src/scripts.
deploy-kernel/src/scripts/deploy_kernel.py Implements kernel deploy API call + polling and writes action outputs.
deploy-kernel/action.yml Defines the composite action steps (setup python, install deps, validate, deploy).
deploy-kernel/requirements.txt Declares Python deps for the deploy-kernel action runtime.
src/scripts/validate_course.py Adds kernel catalog lookup to validate kernelName references.
test/unit/test_validate_dockerfile.py New unit tests covering Dockerfile validator behavior and helpers.
test/unit/test_deploy_kernel.py New unit tests for deploy script API handling, polling, and outputs.
test/unit/test_deploy_kernel_validate_dockerfile_wrapper.py Verifies wrapper validator matches shared validator behavior.
test/unit/test_validate_course.py Updates course fixture values to align with kernel catalog validation.
test/e2e/test_action_flow.py Updates kernel IDs/names used in end-to-end action flow test data.
test/e2e/test_nasty_inputs.py Updates kernel IDs/names used in nasty-input E2E test data.
test/conftest.py Updates shared course fixture kernel name/id used across tests.
examples/Dockerfile.kernel Adds an example Dockerfile for custom kernel deployment.
examples/requirements.kernel.txt Adds example requirements file for kernel image builds.
examples/deploy-kernel-workflow.yml Adds example workflow demonstrating how to call the new action.
CHANGELOG.md Notes new behavior/fixes related to kernel deployment and validator reuse.


@rryoung98
Member Author

Code Review: PR #22 — Deploy Kernel GitHub Action

CRITICAL

C1. API key visible in process listing and potentially in logs
deploy-kernel/action.yml:73 — The API key is passed as a command-line argument: --api-key "${{ inputs.api-key }}". On Linux, all CLI args are visible via /proc/<pid>/cmdline. If the step fails and GitHub prints the failing command, the key may be logged. Composite action inputs are NOT automatically masked by GitHub Actions (only secrets.* at the workflow level).

Fix: Pass via environment variable instead:

env:
  QBRAID_API_KEY: ${{ inputs.api-key }}

Have the script read from os.environ["QBRAID_API_KEY"], and add ::add-mask:: in the action.
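A minimal sketch of the script side of this fix, assuming the QBRAID_API_KEY variable name from the snippet above (read_api_key is a hypothetical helper, not code from the PR):

```python
import os
import sys


def read_api_key() -> str:
    """Read the API key from the environment rather than argv.

    Environment variables do not appear in /proc/<pid>/cmdline, and the
    ::add-mask:: workflow command tells the GitHub Actions runner to
    redact the value from any subsequent log output.
    """
    api_key = os.environ.get("QBRAID_API_KEY", "")
    if not api_key:
        sys.exit("QBRAID_API_KEY environment variable is not set")
    # Mask the secret in Actions logs before anything else is printed.
    print(f"::add-mask::{api_key}")
    return api_key
```
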

C2. write_github_output is vulnerable to output injection
deploy-kernel/src/scripts/deploy_kernel.py:38 — Uses simple {key}={value}\n format. If value contains a newline (e.g., from an API error message), subsequent lines are parsed as separate key=value pairs by the GHA runner. The existing common.py:write_github_output already handles multiline safely with delimiter syntax, but this new copy does not.

Fix: Reuse common.write_github_output or replicate its delimiter-based multiline handling.
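For reference, a delimiter-based writer along the lines of what common.py reportedly does might look like this (a sketch; the random delimiter is what prevents multiline values from injecting extra key=value pairs):

```python
import os
import uuid


def write_github_output(key: str, value: str) -> None:
    """Write a step output using the heredoc-style delimiter syntax.

    The runner reads everything between `key<<DELIM` and the matching
    `DELIM` line, so embedded newlines in `value` cannot be parsed as
    separate outputs.
    """
    output_path = os.environ.get("GITHUB_OUTPUT")
    if not output_path:
        return
    delimiter = f"ghadelimiter_{uuid.uuid4()}"
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(f"{key}<<{delimiter}\n{value}\n{delimiter}\n")
```
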


HIGH

H1. _collect_context_files uploads all files including potential secrets
deploy_kernel.py:85-94 — Only .gitignore is excluded. Files like .env, credentials.json, *.pem, SSH keys will be base64-encoded and sent to the API. The default context-dir is . (repo root), so a misconfigured workflow uploads the entire checkout.

Fix: Add a deny-list (.env, *.key, *.pem, .git/, credentials*) or require an explicit allowlist of extensions.
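A deny-list filter for the context collection could be sketched as follows (is_denied, collect_context_files, and DENY_PATTERNS are hypothetical names; the patterns would need tuning for real repos):

```python
import fnmatch
from pathlib import Path

# Illustrative deny-list; a real one should be reviewed carefully.
DENY_PATTERNS = [".env", "*.key", "*.pem", "credentials*", ".git", ".gitignore"]


def is_denied(relative_path: str) -> bool:
    """Return True if any path component matches a deny-list pattern."""
    return any(
        fnmatch.fnmatch(part, pattern)
        for part in Path(relative_path).parts
        for pattern in DENY_PATTERNS
    )


def collect_context_files(context_dir: str) -> list[Path]:
    """Collect build-context files, skipping secret-like paths."""
    root = Path(context_dir)
    return [
        p
        for p in sorted(root.rglob("*"))
        if p.is_file() and not is_denied(str(p.relative_to(root)))
    ]
```
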

H2. No input validation on kernel-name or language
deploy_kernel.py:99-107 — Any string is accepted and POSTed directly to the API. A user could pass kernel-name: "../../etc".

Fix: Validate kernel_name against ^[a-z][a-z0-9_]{2,63}$ and language against the supported list at the start of deploy_kernel().
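The suggested validation, as a sketch (SUPPORTED_LANGUAGES is a placeholder; the real list would come from the API or action docs):

```python
import re

KERNEL_NAME_RE = re.compile(r"^[a-z][a-z0-9_]{2,63}$")
# Hypothetical supported-language list, for illustration only.
SUPPORTED_LANGUAGES = {"python", "julia", "r"}


def validate_inputs(kernel_name: str, language: str) -> list[str]:
    """Return a list of validation errors (empty when inputs are OK)."""
    errors = []
    if not KERNEL_NAME_RE.match(kernel_name):
        errors.append(
            f"invalid kernel-name {kernel_name!r}: "
            "must match ^[a-z][a-z0-9_]{2,63}$"
        )
    if language not in SUPPORTED_LANGUAGES:
        errors.append(f"unsupported language {language!r}")
    return errors
```

Running this at the top of deploy_kernel() rejects path-traversal-style names before anything reaches the API.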

H3. Dockerfile labels and action inputs can diverge silently
action.yml:62-79 — The Dockerfile is validated for correct LABELs, but the deploy step sends --kernel-name, --language, --display-name from action inputs. The validated labels may not match what's actually deployed, giving false validation confidence.

Fix: Cross-validate that action inputs match Dockerfile labels, or extract metadata from the labels.

H4. _has_kernel_json_copy misses ipykernel install pattern
validate_dockerfile.py:86-103 — Checks for the string "kernel.json" in COPY/ADD/RUN lines. But python -m ipykernel install --name my_kernel creates kernel.json implicitly without the string appearing. Users who only use ipykernel install get a confusing validation error.

Fix: Add "ipykernel" in stripped.lower() as a condition in the RUN check.
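A version of the check that also accepts the ipykernel install pattern might look like this (has_kernel_spec is a hypothetical stand-in for _has_kernel_json_copy, not the PR's actual code):

```python
def has_kernel_spec(lines: list[str]) -> bool:
    """Heuristic check that the image installs a Jupyter kernel spec.

    Accepts either an explicit kernel.json in COPY/ADD/RUN lines, or a
    RUN line invoking ipykernel, which writes kernel.json implicitly.
    """
    for raw in lines:
        stripped = raw.strip().lower()
        if stripped.startswith(("copy", "add", "run")) and "kernel.json" in stripped:
            return True
        if stripped.startswith("run") and "ipykernel" in stripped:
            return True
    return False
```
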


MEDIUM

M1. Polling sleeps 30s before the first check
deploy_kernel.py:177 — time.sleep(POLL_INTERVAL_SECONDS) runs at the top of the loop. For builds that complete quickly, or the KERNEL_ALREADY_EXISTS edge case, this adds unnecessary latency.

Fix: Move sleep to after the poll response check.

M2. Hardcoded 30-minute timeout with no user override
deploy_kernel.py:24-25 — MAX_POLL_ATTEMPTS = 60 and POLL_INTERVAL_SECONDS = 30, i.e. a fixed 1800 s ceiling. There is no action input to adjust it, so stuck builds burn Actions runner time, and there is no backoff on consecutive server errors.

Fix: Add an optional timeout-minutes input. Consider exponential backoff.
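A deadline-driven polling loop that checks before sleeping and backs off between attempts could be sketched like this (poll_until_done is a hypothetical helper, not the PR's code; the status strings are assumptions):

```python
import time


def poll_until_done(
    check_status,
    timeout_seconds: float = 1800.0,
    base_interval: float = 5.0,
    max_interval: float = 60.0,
) -> str:
    """Poll check_status() until it returns a terminal state.

    Checks immediately (no up-front sleep), loops against a wall-clock
    deadline instead of a fixed attempt count, and backs off
    exponentially between attempts.
    """
    deadline = time.monotonic() + timeout_seconds
    interval = base_interval
    while True:
        status = check_status()  # e.g. "building", "active", "failed"
        if status in ("active", "failed"):
            return status
        if time.monotonic() >= deadline:
            return "timeout"
        # Never sleep past the deadline.
        time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
        interval = min(interval * 2, max_interval)
```
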

M3. _parse_labels test gives false confidence for continuation lines
test at diff:1937-1939 — test_parse_labels_with_backslash_continuation passes raw lines (not pre-joined), so it only verifies that key1 is found on the first line. The continuation isn't actually tested.

Fix: Test with pre-joined lines (matching real usage) and assert both keys.

M4. repo_root = Path(__file__).resolve().parents[3] is fragile
validate_dockerfile.py:11 — Assumes fixed directory depth. If files are moved or symlinked, this breaks silently at import time.


LOW

  • Missing newline at end of examples/Dockerfile.kernel
  • Inconsistent tuple[...] (3.9+) vs from typing import Dict import style
  • Example workflow pins to @kernel-deployment-v0.0-beta.0.1 — tag likely doesn't exist yet
  • No FROM instruction validation — a Dockerfile without FROM passes through to Docker build


Copilot AI left a comment


Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.



Comment on lines +288 to +307
try:
    poll_resp = requests.get(
        poll_url, headers=headers, timeout=REQUEST_TIMEOUT_SECONDS
    )
except requests.RequestException as e:
    logger.warning(f"Poll request failed: {e}")
    continue

if poll_resp.status_code != 200:
    error_code, error_message = _parse_error_response(poll_resp)
    log_message = (
        f"Poll returned {poll_resp.status_code}"
        f"{f' ({error_code})' if error_code else ''}: {error_message}"
    )
    if poll_resp.status_code in FATAL_POLL_STATUS_CODES:
        logger.error(log_message)
        write_github_output("status", "failed")
        sys.exit(1)
    logger.warning(log_message)
    continue

Copilot AI Apr 10, 2026


The polling loop does continue on request exceptions and non-fatal non-200 responses without sleeping, so it can rapidly hammer the API and burn through MAX_POLL_ATTEMPTS almost instantly. Consider ensuring time.sleep(POLL_INTERVAL_SECONDS) (or a backoff) happens between attempts even on transient failures.

Comment on lines +282 to +286
# Poll for completion
poll_url = f"{api_base_url}/kernels/deploy/{build_id}"
deadline = time.monotonic() + timeout_seconds
for attempt in range(1, MAX_POLL_ATTEMPTS + 1):
    logger.info(f"Polling build status (attempt {attempt}/{MAX_POLL_ATTEMPTS})...")

Copilot AI Apr 10, 2026


timeout_seconds is used to compute deadline, but the polling loop is also capped by MAX_POLL_ATTEMPTS. If a caller sets timeout-seconds > (MAX_POLL_ATTEMPTS * POLL_INTERVAL_SECONDS), the action will still time out early, making the input misleading. Consider deriving max attempts from timeout_seconds (or looping until deadline only).

shell: bash
env:
  QBRAID_API_KEY: ${{ inputs.api-key }}
  QBRAID_DEPLOY_TIMEOUT_SECONDS: ${{ inputs.timeout-seconds }}

Copilot AI Apr 10, 2026


The composite action exports QBRAID_DEPLOY_TIMEOUT_SECONDS, but deploy_kernel.py doesn't read this environment variable (timeout is passed via --timeout-seconds). This extra env var can confuse users—either remove it or wire the script to honor it when the CLI arg isn't provided.

Suggested change
QBRAID_DEPLOY_TIMEOUT_SECONDS: ${{ inputs.timeout-seconds }}

Comment on lines +41 to +44
- name: Deploy kernel to qBraid
  id: deploy-kernel
  uses: qBraid/upload-course-action/deploy-kernel@kernel-deployment-v0.0-beta.0.1 # ✅ repo/subdir@tag
  with:

Copilot AI Apr 10, 2026


The PR description says workflows should reference qBraid/upload-course-action/deploy-kernel@main, but this example pins a pre-release tag. Please align the example with the intended guidance (either update the example ref or adjust the PR description).


# 1. Check required LABEL directives
labels = _parse_labels(lines)
for required_label in REQUIRED_LABELS:

Copilot AI Apr 10, 2026


REQUIRED_LABELS is a frozenset, so iterating it can produce a non-deterministic error ordering across runs/Python versions. For more stable logs (and easier testing/debugging), consider iterating over sorted(REQUIRED_LABELS) when emitting errors.

Suggested change
for required_label in REQUIRED_LABELS:
for required_label in sorted(REQUIRED_LABELS):

