Skip to content

Saving artifacts to a folder instead of overwriting#6

Open
connerhughes-dg wants to merge 1 commit into
mainfrom
feat/folder-creation-for-artifacts
Open

Saving artifacts to a folder instead of overwriting#6
connerhughes-dg wants to merge 1 commit into
mainfrom
feat/folder-creation-for-artifacts

Conversation

@connerhughes-dg
Copy link
Copy Markdown

No description provided.

@entelligence-ai-pr-reviews
Copy link
Copy Markdown

EntelligenceAI PR Summary

Refactors Kubernetes AWS artifact handling to use per-deployment named subfolders and renames the helm values file, with supporting documentation, validation, and test updates.

  • paths.py: New artifacts_dir_for(name) function constructs KUBERNETES_AWS_ARTIFACTS_DIR/<name>/ paths
  • kubernetes_aws.py: Replaces hardcoded artifact dir with dynamic artifacts_dir_for(name); adds _prompt_for_artifact_folder() with overwrite confirmation logic
  • kubernetes_aws_workflow.py: Extracts CLUSTER_CONFIG_FILENAME and HELM_VALUES_FILENAME constants; renames my-values.yamlhelm-values.yaml
  • cli.py: --output-dir now optional, resolved dynamically from cluster name
  • wizard.py: Adds NAME_PATTERN, NAME_HINT, and validate_name() for alphanumeric cluster/folder name enforcement
  • README.md / kubernetes/aws/README.md: Docs updated for new folder structure, renamed file, and new wizard prompt
  • deployment.txt: New self-hosting reference with S3 model links and release prerequisites
  • tests/test_kubernetes_aws_workflow.py: Refactored tests for new structure; added 5 new test cases

Confidence Score: 2/5 - Changes Needed

Not safe to merge — this PR introduces two meaningful vulnerabilities: a path traversal risk via validate_name in wizard.py and an unhandled exception path in cli.py. The PR's goal of organizing artifacts into named subfolders via artifacts_dir_for(name) is well-conceived and the structural refactoring is clean, but the NAME_PATTERN regex ^[A-Za-z0-9._-]+$ permits .. as a valid deployment name, which would allow artifacts_dir_for('..') to escape the intended KUBERNETES_AWS_ARTIFACTS_DIR boundary entirely. Additionally, load_config(config) in cli.py sits outside the try/except (RuntimeError, ValueError) block, meaning errors from config loading will propagate as unhandled exceptions rather than being caught and surfaced cleanly to the user.

Key Findings:

  • In wizard.py, NAME_PATTERN = '^[A-Za-z0-9._-]+$' accepts .. and . as valid names; since artifacts_dir_for(name) constructs KUBERNETES_AWS_ARTIFACTS_DIR/<name>/ without sanitization, a user-supplied name of .. would resolve to a directory outside the intended artifacts root, enabling path traversal.
  • In cli.py, the output_dir derivation depends on load_config(config) which is called before the try/except (RuntimeError, ValueError) block begins, meaning any RuntimeError or ValueError raised during config loading bypasses the error handler and surfaces as an unhandled traceback to the user.
  • The core refactoring — introducing artifacts_dir_for(name) in paths.py and _prompt_for_artifact_folder() with overwrite confirmation in kubernetes_aws.py — is logically sound and improves artifact isolation per deployment, but these two issues must be addressed before the security and reliability guarantees hold.
Files requiring special attention
  • src/deepgram_self_hosted/wizard.py
  • src/deepgram_self_hosted/cli.py
  • src/deepgram_self_hosted/paths.py

Comment on lines +129 to +138
if output_dir is None:
cluster_name = str(
get_path(
load_config(config),
"cluster",
"name",
default="deepgram-self-hosted-cluster",
)
)
output_dir = artifacts_dir_for(cluster_name)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAJOR BUG Move output_dir derivation inside the try block to catch load_config errors

load_config(config) at line 132 is outside the try/except (RuntimeError, ValueError) block. When --output-dir is omitted (the new default) and the config is not a YAML mapping, ValueError escapes unhandled as a raw traceback instead of printing [red]Could not render Kubernetes/AWS plan:[/red].

Suggested change
if output_dir is None:
cluster_name = str(
get_path(
load_config(config),
"cluster",
"name",
default="deepgram-self-hosted-cluster",
)
)
output_dir = artifacts_dir_for(cluster_name)
try:
if output_dir is None:
cluster_name = str(
get_path(
load_config(config),
"cluster",
"name",
default="deepgram-self-hosted-cluster",
)
)
output_dir = artifacts_dir_for(cluster_name)
Prompt to fix with AI

Copy this prompt into your AI coding assistant to fix this issue.

In src/deepgram_self_hosted/cli.py, the `if output_dir is None:` block (lines 129–138) that calls `load_config(config)` sits BEFORE the `try` block at line 139. If the config file is not a valid YAML mapping, `load_config` raises `ValueError` which is not caught, producing a raw traceback instead of the friendly error message. Fix: move the entire `if output_dir is None:` block to be the FIRST statement inside the `try` block (i.e., indent it one level further so it sits between `try:` and the existing `cluster_config, values = ...` call). The `try` block already catches `ValueError`, so the error will then be handled correctly.

Comment on lines +19 to +28
# Used for cluster name and artifact folder name. A subset of AWS EKS naming
# (alphanumeric + `-`) that is also filename-safe on Linux/macOS/Windows.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9._-]+$")
NAME_HINT = "Use letters, digits, '.', '-', or '_' only."


def validate_name(value: str) -> bool | str:
"""Questionary validator for cluster/folder names."""
stripped = value.strip()
if not stripped:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAJOR SECURITY validate_name allows '..' enabling path traversal into artifacts_dir_for

wizard.py's NAME_PATTERN (^[A-Za-z0-9._-]+$) accepts .. and . as valid names. kubernetes_aws.py passes the validated name directly to artifacts_dir_for(name) (paths.py), so artifacts_dir_for('..') resolves to kubernetes/aws/ — one level above the intended artifacts root — silently writing cluster-config, helm-values, and session files there.

Suggested change
# Used for cluster name and artifact folder name. A subset of AWS EKS naming
# (alphanumeric + `-`) that is also filename-safe on Linux/macOS/Windows.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9._-]+$")
NAME_HINT = "Use letters, digits, '.', '-', or '_' only."
def validate_name(value: str) -> bool | str:
"""Questionary validator for cluster/folder names."""
stripped = value.strip()
if not stripped:
NAME_PATTERN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_-]*(?:\.[A-Za-z0-9_-]+)*$")
Prompt to fix with AI

Copy this prompt into your AI coding assistant to fix this issue.

In `src/deepgram_self_hosted/wizard.py`, tighten `NAME_PATTERN` so that `.` and `..` (and names starting/ending with a dot) are rejected. A minimal fix: require the name to start and end with an alphanumeric character, e.g. `re.compile(r'^[A-Za-z0-9][A-Za-z0-9._-]*[A-Za-z0-9]$|^[A-Za-z0-9]$')`. Add a test in `test_validate_name_rejects_unsafe_characters` for `validate_name('..')` and `validate_name('.')`. The validator is consumed by `_prompt_for_artifact_folder` in `kubernetes_aws.py` which passes the result straight to `artifacts_dir_for(name)` with no further sanitization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants