
Conversation

@cjac (Contributor) commented on Jun 23, 2025

refactor(gcloud): Introduce a test framework for Critical User Journeys

This commit refactors the project's testing and automation scripts, moving them from a monolithic bin/ directory into a structured framework centered around Critical User Journeys (CUJs). The goal is to create a more maintainable and scalable way to exercise the use cases that are essential for our customers.

This new structure is designed for better organization, code reuse, and CI/CD integration.

Key changes include:

  • CUJ Directory Structure:

    • gcloud/cuj/: Contains individual journey scripts. The first CUJ, standard-cluster-management, is introduced to handle the creation, deletion, and rebuilding of a standard Dataproc cluster.

    • gcloud/onboarding/: Holds scripts for setting up shared, prerequisite infrastructure (e.g., GCS buckets) that CUJs depend on.

    • gcloud/lib/: A new directory for shared bash libraries. common.sh is introduced to centralize functions for configuration loading, network management, and logging.

    • gcloud/ci/: Includes scripts specifically for CI/CD pipelines.

  • CI Environment Management:

    • A new pristine_check.sh script is added to ensure the GCP project is in a clean state before a test run. It can operate in a strict validation mode (fail if resources exist) or a cleanup mode (actively delete resources).

  • Configuration and Code Improvements:

    • Configuration is centralized and simplified in env.json, which is now loaded by the load_config function in common.sh (see the sketch below).

    • Scripts are refactored to be more robust, with clear up, down, and validate commands, and checks for pre-existing resources.

    • The .gitignore is updated to exclude temporary files and local configuration.

This change lays the foundation for adding more complex user journeys, ensuring our core services are consistently tested from a customer perspective.

Resolves: GoogleCloudDataproc/initialization-actions#1281
Relates: go/sf/59134938
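
To make the configuration flow concrete, here is a minimal sketch of how env.json and the load_config function in common.sh could fit together. The jq-based parsing and the key normalization are illustrative assumptions; only the CONFIG associative array and the key names referenced elsewhere in this PR are taken from the source.

#!/usr/bin/env bash
# Illustrative sketch of gcloud/lib/common.sh::load_config -- the jq parsing
# and key normalization are assumptions, not the PR's actual implementation.

declare -A CONFIG

function load_config() {
  local config_file="${1:-env.json}"
  if [[ ! -f "${config_file}" ]]; then
    echo "ERROR: ${config_file} not found" >&2
    return 1
  fi
  # Flatten the top-level string values of env.json into the CONFIG array,
  # upper-casing the keys so callers can use CONFIG[REGION], CONFIG[SUBNET], etc.
  while IFS=$'\t' read -r key value; do
    CONFIG["${key^^}"]="${value}"
  done < <(jq -r 'to_entries[]
                  | select(.value | type == "string")
                  | [.key, .value] | @tsv' "${config_file}")
}

# Example usage:
#   load_config env.json
#   echo "Cluster ${CONFIG[CLUSTER_NAME]} in ${CONFIG[REGION]}"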

@cjac self-assigned this on Jun 23, 2025
@gemini-code-assist (bot) left a comment

Summary of Changes

Hello @cjac, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the project's testing and automation infrastructure by transitioning from a monolithic bin/ directory to a structured framework based on Critical User Journeys (CUJs). This change aims to enhance the maintainability, scalability, and reusability of scripts, ensuring robust and consistent testing of core customer use cases within CI/CD pipelines.

Highlights

  • New Directory Structure for CUJs: Introduced gcloud/cuj/, gcloud/onboarding/, gcloud/lib/, and gcloud/ci/ to organize testing and automation scripts around Critical User Journeys, promoting modularity and reuse.
  • CI Environment Management: Added gcloud/ci/pristine_check.sh to validate and clean up GCP project resources, ensuring a pristine environment before test runs, with both strict validation and cleanup modes.
  • Centralized Configuration and Shared Utilities: Refactored configuration to env.json and introduced gcloud/lib/common.sh to centralize functions for configuration loading, network management (creation/deletion), and logging, improving script robustness and maintainability.
  • First Critical User Journey (CUJ): Implemented the gcloud/cuj/standard-cluster-management/manage.sh script, providing comprehensive lifecycle management (create, delete, rebuild) for a standard Dataproc cluster and serving as a template for future CUJs (see the sketch below).
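
The following is a hedged sketch of how such a CUJ entry point might dispatch its lifecycle commands; the helper function names, the library path, and the use of up/down/validate subcommands (mirroring the commands described in the PR) are illustrative assumptions rather than the script's actual contents.

#!/usr/bin/env bash
# Hypothetical sketch of gcloud/cuj/standard-cluster-management/manage.sh.
# Helper names (create_cluster, delete_cluster, delete_network_and_subnet)
# are assumptions; create_network_and_subnet appears in the review below.
set -euo pipefail

source "$(dirname "$0")/../../lib/common.sh"   # assumed path to the shared library
load_config "env.json"

case "${1:-}" in
  up)
    create_network_and_subnet
    create_cluster
    ;;
  down)
    delete_cluster
    delete_network_and_subnet
    ;;
  validate)
    gcloud dataproc clusters describe "${CONFIG[CLUSTER_NAME]}" \
      --region="${CONFIG[REGION]}" >/dev/null
    echo "Cluster '${CONFIG[CLUSTER_NAME]}' exists."
    ;;
  *)
    echo "Usage: $0 {up|down|validate}" >&2
    exit 1
    ;;
esac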

@gemini-code-assist (bot) left a comment

Code Review

The code changes introduce a well-structured testing framework centered around Critical User Journeys (CUJs). The new structure and the use of common.sh and env.json are major improvements. The review focuses on improving the robustness and security of the new scripts, in particular idempotent resource creation and the scope of a firewall rule.

Comment on lines 64 to 84
function create_network_and_subnet() {
  # ... (logic to check if network exists)
  gcloud compute networks create "${CONFIG[NETWORK]}" \
    --subnet-mode=custom \
    --description="Network for CUJ testing" \
    --bgp-routing-mode="regional"

  # ... (logic to check if subnet exists)
  gcloud compute networks subnets create "${CONFIG[SUBNET]}" \
    --network="${CONFIG[NETWORK]}" \
    --range="${CONFIG[SUBNET_CIDR]}" \
    --region="${CONFIG[REGION]}"

  # Add firewall rule with the tag to allow SSH
  gcloud compute firewall-rules create "${CONFIG[CUJ_TAG]}-allow-ssh" \
    --network="${CONFIG[NETWORK]}" \
    --allow=tcp:22 \
    --source-ranges="0.0.0.0/0" \
    --description="Allow SSH for CUJ test" \
    --target-tags="${CONFIG[CUJ_TAG]}"
}


Severity: critical

The create_network_and_subnet function is not idempotent. If the network, subnet, or firewall rule already exists, the gcloud ... create commands will fail. Check for the existence of each resource before attempting to create it.

function create_network_and_subnet() {
  header "Creating Network and Subnet"

  # Create network if it doesn't exist
  if ! gcloud compute networks describe "${CONFIG[NETWORK]}" &>/dev/null; then
    echo "Creating network '${CONFIG[NETWORK]}'..."
    gcloud compute networks create "${CONFIG[NETWORK]}" \
        --subnet-mode=custom \
        --description="Network for CUJ testing" \
        --bgp-routing-mode="regional"
  else
    echo "Network '${CONFIG[NETWORK]}' already exists."
  fi

  # Create subnet if it doesn't exist
  if ! gcloud compute networks subnets describe "${CONFIG[SUBNET]}" --region="${CONFIG[REGION]}" &>/dev/null; then
    echo "Creating subnet '${CONFIG[SUBNET]}'..."
    gcloud compute networks subnets create "${CONFIG[SUBNET]}" \
         --network="${CONFIG[NETWORK]}" \
         --range="${CONFIG[SUBNET_CIDR]}" \
         --region="${CONFIG[REGION]}"
  else
    echo "Subnet '${CONFIG[SUBNET]}' already exists."
  fi
  
  # Create firewall rule if it doesn't exist
  local firewall_rule_name="${CONFIG[CUJ_TAG]}-allow-ssh"
  if ! gcloud compute firewall-rules describe "${firewall_rule_name}" &>/dev/null; then
    echo "Creating firewall rule '${firewall_rule_name}'..."
    gcloud compute firewall-rules create "${firewall_rule_name}" \
      --network="${CONFIG[NETWORK]}" \
      --allow=tcp:22 \
      --source-ranges="0.0.0.0/0" \
      --description="Allow SSH for CUJ test" \
      --target-tags="${CONFIG[CUJ_TAG]}"
  else
    echo "Firewall rule '${firewall_rule_name}' already exists."
  fi
}

  gcloud compute firewall-rules create "${CONFIG[CUJ_TAG]}-allow-ssh" \
    --network="${CONFIG[NETWORK]}" \
    --allow=tcp:22 \
    --source-ranges="0.0.0.0/0" \


Severity: high

Opening SSH (tcp:22) to the entire internet (0.0.0.0/0) is a significant security risk. Restrict the source-ranges to a safer alternative like the IP range for Google Cloud's Identity-Aware Proxy (IAP) for TCP forwarding (35.235.240.0/20).

Suggested change:
-    --source-ranges="0.0.0.0/0" \
+    --source-ranges="35.235.240.0/20" \

Comment on lines 34 to 41
echo "Found leftover Dataproc clusters: ${CLUSTERS}" | tee -a "${LEFTOVERS_FILE}"
if [[ "$STRICT_MODE" == false ]]; then
echo "Cleaning up..."
# Run deletions in the background for speed.
for cluster in ${CLUSTERS}; do
gcloud dataproc clusters delete --quiet "${cluster}" --region="${CONFIG[REGION]}" &
done
fi


Severity: medium

Consider using a while-read loop when iterating over lines of command output so that resource names containing spaces or special characters are handled correctly. Also, echo the quoted ${CLUSTERS} variable into tee on its own line to preserve the line breaks from the gcloud command's output.

Suggested change:

-  echo "Found leftover Dataproc clusters: ${CLUSTERS}" | tee -a "${LEFTOVERS_FILE}"
-  if [[ "$STRICT_MODE" == false ]]; then
-    echo "Cleaning up..."
-    # Run deletions in the background for speed.
-    for cluster in ${CLUSTERS}; do
-      gcloud dataproc clusters delete --quiet "${cluster}" --region="${CONFIG[REGION]}" &
-    done
-  fi
+  echo "Found leftover Dataproc clusters:" | tee -a "${LEFTOVERS_FILE}"
+  echo "${CLUSTERS}" | tee -a "${LEFTOVERS_FILE}"
+  if [[ "$STRICT_MODE" == false ]]; then
+    echo "Cleaning up..."
+    # Use a while-read loop for robustness.
+    echo "${CLUSTERS}" | while IFS= read -r cluster; do
+      if [[ -n "${cluster}" ]]; then
+        gcloud dataproc clusters delete --quiet "${cluster}" --region="${CONFIG[REGION]}" &
+      fi
+    done
+  fi
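
One caveat worth adding to this suggestion: piping into while runs the loop in a subshell, so a later wait in the parent script cannot see the backgrounded deletions. A process-substitution variant (sketched below, using the same variable names as the suggestion) keeps the loop in the current shell and lets the script wait for all deletions to finish:

# Keep the loop in the current shell so 'wait' can collect the background jobs.
while IFS= read -r cluster; do
  [[ -n "${cluster}" ]] || continue
  gcloud dataproc clusters delete --quiet "${cluster}" --region="${CONFIG[REGION]}" &
done < <(echo "${CLUSTERS}")
wait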

Comment on lines +23 to +29
  # ---FIX---
  # Add defensive check for required configuration.
  if [[ -z "${CONFIG[CLUSTER_NAME]}" || -z "${CONFIG[REGION]}" || -z "${CONFIG[SUBNET]}" || -z "${CONFIG[CUJ_TAG]}" ]]; then
    echo "ERROR: One or more required keys (CLUSTER_NAME, REGION, SUBNET, CUJ_TAG) are missing from env.json" >&2
    exit 1
  fi
  # ---END FIX---


Severity: medium

The configuration validation logic is repeated in the delete_cluster function (lines 42-45). Extract this logic into a shared helper function to improve maintainability and reduce code duplication.

Suggested change:

-# ---FIX---
-# Add defensive check for required configuration.
-if [[ -z "${CONFIG[CLUSTER_NAME]}" || -z "${CONFIG[REGION]}" || -z "${CONFIG[SUBNET]}" || -z "${CONFIG[CUJ_TAG]}" ]]; then
-  echo "ERROR: One or more required keys (CLUSTER_NAME, REGION, SUBNET, CUJ_TAG) are missing from env.json" >&2
-  exit 1
-fi
-# ---END FIX---
+function require_config_keys() {
+  local missing_keys=()
+  for key in "$@"; do
+    if [[ -z "${CONFIG[$key]}" ]]; then
+      missing_keys+=("$key")
+    fi
+  done
+  if [[ ${#missing_keys[@]} -gt 0 ]]; then
+    ( # Use a subshell to localize the IFS change.
+      IFS=,
+      echo "ERROR: One or more required keys (${missing_keys[*]}) are missing from env.json" >&2
+    )
+    exit 1
+  fi
+}
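
If this helper is adopted in common.sh, the duplicated inline checks in create_cluster and delete_cluster could each collapse to a single call; a minimal usage sketch:

# Hypothetical call site, replacing the inline checks shown above.
require_config_keys CLUSTER_NAME REGION SUBNET CUJ_TAG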

@cjac changed the title from "[gcloud] refactor(scripts): Organize tests into Critical User Journeys (CUJs)" to "refactor(gcloud): Introduce a test framework for Critical User Journeys" on Jun 24, 2025
@cjac marked this pull request as draft on June 24, 2025 00:57
@cjac (Contributor, Author) commented on Jun 24, 2025

This PR cannot resolve GoogleCloudDataproc/initialization-actions#1281 until it implements the proxy-only CUJ.
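
For context on what that proxy-only journey would eventually exercise, here is a hedged illustration of the kind of cluster creation it might drive: a cluster with no external IPs whose nodes reach the internet only through an HTTP proxy. The metadata key, proxy address, and cluster name are placeholders, not values from this PR:

# Placeholder names and proxy address -- illustrative only.
gcloud dataproc clusters create proxy-egress-test-cluster \
    --region="${CONFIG[REGION]}" \
    --subnet="${CONFIG[SUBNET]}" \
    --no-address \
    --metadata="http-proxy=http://proxy-host:3128"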

This commit refactors the project's testing and automation scripts,
moving them from a monolithic `bin/` directory into a structured
framework centered around Critical User Journeys (CUJs). The goal is
to create a more maintainable and scalable way to exercise the use
cases that are essential for our customers.

This new structure is designed for better organization, code reuse,
and CI/CD integration.

Key changes include:

* **CUJ Directory Structure:**

    * `gcloud/cuj/`: Contains individual journey scripts. The first
      CUJ, `gce/standard`, is introduced to handle the creation,
      deletion, and rebuilding of a standard Dataproc cluster.

    * `gcloud/onboarding/`: Holds scripts for setting up shared,
      prerequisite infrastructure (e.g., GCS buckets) that CUJs depend
      on.

    * `gcloud/lib/`: A new directory for shared bash
      libraries. `common.sh` is introduced to centralize functions for
      configuration loading, network management, and logging.

    * `gcloud/ci/`: Includes scripts specifically for CI/CD pipelines.

* **CI Environment Management:**

    * A new `pristine_check.sh` script is added to ensure the GCP
      project is in a clean state before a test run. It can operate in
      a strict validation mode (fail if resources exist) or a cleanup
      mode (actively delete resources).

* **Configuration and Code Improvements:**

    * Configuration is centralized and simplified in `env.json`, which
      is now loaded by the `load_config` function in `common.sh`.

    * Scripts are refactored to be more robust, with clear `up`,
      `down`, and `validate` commands, and checks for pre-existing
      resources.

    * The `.gitignore` is updated to exclude temporary files and local
      configuration.

This change lays the foundation for adding more complex user journeys,
ensuring our core services are consistently tested from a customer
perspective.
@cjac force-pushed the gcloud-cuj-framework branch from 8c4d7ca to ecb07ab on June 24, 2025 04:04
@cjac force-pushed the gcloud-cuj-framework branch from ecb07ab to d790d25 on August 8, 2025 09:43
This commit builds upon the foundational CUJ framework by ingesting
battle-tested logic from numerous sources and implementing the initial set
of comprehensive, production-like Critical User Journeys.

The framework is now enhanced with a powerful, modular library and the
first advanced CUJs, making it a robust tool for end-to-end testing.

Key Enhancements:

* **Modular Library (`lib/`)**:
    The monolithic `common.sh` is refactored into a modular library
    with components organized by function (`_core.sh`, `_network.sh`,
    `_dataproc.sh`, `_database.sh`, `_security.sh`). This incorporates
    advanced, parameterized, and idempotent functions for managing a
    wide range of GCP resources.

* **Advanced Onboarding (`onboarding/`)**:
    New scripts are added to provision persistent, shared infrastructure,
    including a High-Availability Cloud SQL instance with VPC Peering and
    a dual-NIC Squid Proxy VM, following GCP best practices.

* **New Critical User Journeys (`cuj/`)**:
    * `gce/standard`: This CUJ is enhanced to provision a full,
        NAT-based network environment.
    * `gce/proxy-egress`: A new CUJ is added to test Dataproc
        clusters that use a proxy for all outbound traffic.
    * `gke/standard`: A new CUJ is added for the standard Dataproc
        on GKE use case.

* **Enhanced CI/CD (`ci/`)**:
    `pristine_check.sh` is upgraded to use a robust, tag-based cleanup
    strategy, making it scalable to any number of CUJs without
    modification (see the sketch after this list).

* **Finalized Configuration (`env.json`)**:
    The `env.json.sample` file is finalized with a simplified structure
    that defines the shared test environment and a `cuj_set` for test
    orchestration, abstracting implementation details from the user.

* **Comprehensive Documentation (`README.md`)**:
    The README is updated to be a complete guide for the new framework,
    explaining its philosophy and providing a clear "Getting Started"
    workflow for new users.
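
As a rough illustration of the tag-based cleanup strategy mentioned above, such a sweep might enumerate resources by a shared label or network tag before deleting them. The label key `cuj` and the exact filter expressions below are assumptions, not the PR's code:

# Assumed convention: CUJ-created clusters carry a 'cuj' label and
# CUJ-created VMs carry the CUJ network tag.
gcloud dataproc clusters list \
    --region="${CONFIG[REGION]}" \
    --filter="labels.cuj = *" \
    --format="value(clusterName)"

gcloud compute instances list \
    --filter="tags.items=${CONFIG[CUJ_TAG]}" \
    --format="value(name,zone)"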
@cjac force-pushed the gcloud-cuj-framework branch from d790d25 to 9475017 on August 8, 2025 10:01


Development

Successfully merging this pull request may close these issues.

[initialization-actions] Please add support for http proxy along with tests which exercise the use case
