refactor(gcloud): Introduce a test framework for Critical User Journeys #176
Conversation
This commit refactors the project's testing and automation scripts,
moving them from a monolithic `bin/` directory into a structured
framework centered around Critical User Journeys (CUJs). The goal is
to create a more maintainable and scalable way to exercise the use
cases that are essential for our customers.
This new structure is designed for better organization, code reuse,
and CI/CD integration.
Key changes include:
* **CUJ Directory Structure:**
* `gcloud/cuj/`: Contains individual journey scripts. The first
CUJ, `standard-cluster-management`, is introduced to handle the
creation, deletion, and rebuilding of a standard Dataproc
cluster.
* `gcloud/onboarding/`: Holds scripts for setting up shared,
prerequisite infrastructure (e.g., GCS buckets) that CUJs depend
on.
* `gcloud/lib/`: A new directory for shared bash
libraries. `common.sh` is introduced to centralize functions for
configuration loading, network management, and logging.
* `gcloud/ci/`: Includes scripts specifically for CI/CD pipelines.
* **CI Environment Management:**
* A new `pristine_check.sh` script is added to ensure the GCP
project is in a clean state before a test run. It can operate in
a strict validation mode (fail if resources exist) or a cleanup
mode (actively delete resources).
* **Configuration and Code Improvements:**
* Configuration is centralized and simplified in `env.json`, which
is now loaded by the `load_config` function in `common.sh`.
* Scripts are refactored to be more robust, with clear `up`,
`down`, and `validate` commands, and checks for pre-existing
resources.
* The `.gitignore` is updated to exclude temporary files and local
configuration.
This change lays the foundation for adding more complex user journeys,
ensuring our core services are consistently tested from a customer
perspective.
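To make the `up`/`down`/`validate` contract described above concrete, here is a minimal, self-contained dispatch skeleton for a CUJ script. The function bodies are placeholders and the names are assumptions for illustration, not the actual `manage.sh` implementation:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder implementations; the real CUJ script would call gcloud here.
function up()       { echo "creating cluster"; }
function down()     { echo "deleting cluster"; }
function validate() { echo "checking cluster state"; }

# Dispatch on the first argument, rejecting unknown commands.
function main() {
  case "${1:-}" in
    up|down|validate) "$1" ;;
    *) echo "Usage: $0 {up|down|validate}" >&2; return 1 ;;
  esac
}
```

Keeping every CUJ script to this same three-command surface is what lets a CI pipeline drive any journey generically.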
Summary of Changes
Hello @cjac, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly refactors the project's testing and automation infrastructure by transitioning from a monolithic `bin/` directory to a structured framework based on Critical User Journeys (CUJs). This change aims to enhance the maintainability, scalability, and reusability of scripts, ensuring robust and consistent testing of core customer use cases within CI/CD pipelines.
Highlights
- New Directory Structure for CUJs: Introduced `gcloud/cuj/`, `gcloud/onboarding/`, `gcloud/lib/`, and `gcloud/ci/` to organize testing and automation scripts around Critical User Journeys, promoting modularity and reuse.
- CI Environment Management: Added `gcloud/ci/pristine_check.sh` to validate and clean up GCP project resources, ensuring a pristine environment before test runs, with both strict validation and cleanup modes.
- Centralized Configuration and Shared Utilities: Refactored configuration into `env.json` and introduced `gcloud/lib/common.sh` to centralize functions for configuration loading, network management (creation/deletion), and logging, improving script robustness and maintainability.
- First Critical User Journey (CUJ): Implemented the `gcloud/cuj/standard-cluster-management/manage.sh` script, providing comprehensive lifecycle management (create, delete, rebuild) for a standard Dataproc cluster and serving as a template for future CUJs.
Code Review
The code changes introduce a well-structured testing framework centered around Critical User Journeys (CUJs). The new structure and the use of `common.sh` and `env.json` are major improvements. The review focuses on improving the robustness and security of the new scripts, in particular idempotency in resource creation and the scope of a firewall rule.
gcloud/lib/common.sh
Outdated
```shell
function create_network_and_subnet() {
  # ... (logic to check if network exists)
  gcloud compute networks create "${CONFIG[NETWORK]}" \
    --subnet-mode=custom \
    --description="Network for CUJ testing" \
    --bgp-routing-mode="regional"

  # ... (logic to check if subnet exists)
  gcloud compute networks subnets create "${CONFIG[SUBNET]}" \
    --network="${CONFIG[NETWORK]}" \
    --range="${CONFIG[SUBNET_CIDR]}" \
    --region="${CONFIG[REGION]}"

  # Add firewall rule with the tag to allow SSH
  gcloud compute firewall-rules create "${CONFIG[CUJ_TAG]}-allow-ssh" \
    --network="${CONFIG[NETWORK]}" \
    --allow=tcp:22 \
    --source-ranges="0.0.0.0/0" \
    --description="Allow SSH for CUJ test" \
    --target-tags="${CONFIG[CUJ_TAG]}"
}
```
The create_network_and_subnet function is not idempotent. If the network, subnet, or firewall rule already exists, the gcloud ... create commands will fail. Check for the existence of each resource before attempting to create it.
```shell
function create_network_and_subnet() {
  header "Creating Network and Subnet"

  # Create network if it doesn't exist
  if ! gcloud compute networks describe "${CONFIG[NETWORK]}" &>/dev/null; then
    echo "Creating network '${CONFIG[NETWORK]}'..."
    gcloud compute networks create "${CONFIG[NETWORK]}" \
      --subnet-mode=custom \
      --description="Network for CUJ testing" \
      --bgp-routing-mode="regional"
  else
    echo "Network '${CONFIG[NETWORK]}' already exists."
  fi

  # Create subnet if it doesn't exist
  if ! gcloud compute networks subnets describe "${CONFIG[SUBNET]}" --region="${CONFIG[REGION]}" &>/dev/null; then
    echo "Creating subnet '${CONFIG[SUBNET]}'..."
    gcloud compute networks subnets create "${CONFIG[SUBNET]}" \
      --network="${CONFIG[NETWORK]}" \
      --range="${CONFIG[SUBNET_CIDR]}" \
      --region="${CONFIG[REGION]}"
  else
    echo "Subnet '${CONFIG[SUBNET]}' already exists."
  fi

  # Create firewall rule if it doesn't exist
  local firewall_rule_name="${CONFIG[CUJ_TAG]}-allow-ssh"
  if ! gcloud compute firewall-rules describe "${firewall_rule_name}" &>/dev/null; then
    echo "Creating firewall rule '${firewall_rule_name}'..."
    gcloud compute firewall-rules create "${firewall_rule_name}" \
      --network="${CONFIG[NETWORK]}" \
      --allow=tcp:22 \
      --source-ranges="0.0.0.0/0" \
      --description="Allow SSH for CUJ test" \
      --target-tags="${CONFIG[CUJ_TAG]}"
  else
    echo "Firewall rule '${firewall_rule_name}' already exists."
  fi
}
```
gcloud/lib/common.sh
Outdated
```shell
gcloud compute firewall-rules create "${CONFIG[CUJ_TAG]}-allow-ssh" \
  --network="${CONFIG[NETWORK]}" \
  --allow=tcp:22 \
  --source-ranges="0.0.0.0/0" \
```
Opening SSH (tcp:22) to the entire internet (0.0.0.0/0) is a significant security risk. Restrict the source-ranges to a safer alternative like the IP range for Google Cloud's Identity-Aware Proxy (IAP) for TCP forwarding (35.235.240.0/20).
```diff
-  --source-ranges="0.0.0.0/0" \
+  --source-ranges="35.235.240.0/20" \
```
gcloud/ci/pristine_check.sh
Outdated
```shell
echo "Found leftover Dataproc clusters: ${CLUSTERS}" | tee -a "${LEFTOVERS_FILE}"
if [[ "$STRICT_MODE" == false ]]; then
  echo "Cleaning up..."
  # Run deletions in the background for speed.
  for cluster in ${CLUSTERS}; do
    gcloud dataproc clusters delete --quiet "${cluster}" --region="${CONFIG[REGION]}" &
  done
fi
```
Consider using a while read loop for iterating over lines of command output to handle resource names with spaces or special characters. Also, pipe the CLUSTERS variable directly to tee to preserve line breaks from the gcloud command's output.
```diff
-echo "Found leftover Dataproc clusters: ${CLUSTERS}" | tee -a "${LEFTOVERS_FILE}"
-if [[ "$STRICT_MODE" == false ]]; then
-  echo "Cleaning up..."
-  # Run deletions in the background for speed.
-  for cluster in ${CLUSTERS}; do
-    gcloud dataproc clusters delete --quiet "${cluster}" --region="${CONFIG[REGION]}" &
-  done
-fi
+echo "Found leftover Dataproc clusters:" | tee -a "${LEFTOVERS_FILE}"
+echo "${CLUSTERS}" | tee -a "${LEFTOVERS_FILE}"
+if [[ "$STRICT_MODE" == false ]]; then
+  echo "Cleaning up..."
+  # Use a while-read loop for robustness.
+  echo "${CLUSTERS}" | while IFS= read -r cluster; do
+    if [[ -n "${cluster}" ]]; then
+      gcloud dataproc clusters delete --quiet "${cluster}" --region="${CONFIG[REGION]}" &
+    fi
+  done
+fi
```
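As a quick, self-contained illustration of why the `while IFS= read -r` form is more robust than word-splitting a variable, the sketch below collects names from multi-line input that includes a blank line and a name containing a space (`echo` stands in for the `gcloud` list/delete calls). A here-string is used instead of a pipe so the loop runs in the current shell and can populate an array; note that a piped `while` runs in a subshell, which is fine for fire-and-forget deletions but loses any variables set inside the loop:

```shell
#!/usr/bin/env bash
# Stand-in for `gcloud dataproc clusters list` output: one name per line.
CLUSTERS=$'cluster-a\ncluster-b\n\ncluster with space'

collected=()
while IFS= read -r cluster; do
  # Skip blank lines, as the suggestion does.
  if [[ -n "${cluster}" ]]; then
    collected+=("${cluster}")   # the real script would delete here
  fi
done <<< "${CLUSTERS}"
```

A plain `for cluster in ${CLUSTERS}` over the same input would split "cluster with space" into three separate words.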
```shell
# ---FIX---
# Add defensive check for required configuration.
if [[ -z "${CONFIG[CLUSTER_NAME]}" || -z "${CONFIG[REGION]}" || -z "${CONFIG[SUBNET]}" || -z "${CONFIG[CUJ_TAG]}" ]]; then
  echo "ERROR: One or more required keys (CLUSTER_NAME, REGION, SUBNET, CUJ_TAG) are missing from env.json" >&2
  exit 1
fi
# ---END FIX---
```
The configuration validation logic is repeated in the delete_cluster function (lines 42-45). Extract this logic into a shared helper function to improve maintainability and reduce code duplication.
```diff
-# ---FIX---
-# Add defensive check for required configuration.
-if [[ -z "${CONFIG[CLUSTER_NAME]}" || -z "${CONFIG[REGION]}" || -z "${CONFIG[SUBNET]}" || -z "${CONFIG[CUJ_TAG]}" ]]; then
-  echo "ERROR: One or more required keys (CLUSTER_NAME, REGION, SUBNET, CUJ_TAG) are missing from env.json" >&2
-  exit 1
-fi
-# ---END FIX---
+function require_config_keys() {
+  local missing_keys=()
+  for key in "$@"; do
+    if [[ -z "${CONFIG[$key]}" ]]; then
+      missing_keys+=("$key")
+    fi
+  done
+  if [[ ${#missing_keys[@]} -gt 0 ]]; then
+    ( # Use subshell to localize IFS change
+      IFS=,
+      echo "ERROR: One or more required keys (${missing_keys[*]}) are missing from env.json" >&2
+    )
+    exit 1
+  fi
+}
```
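For completeness, here is a self-contained sketch of how such a helper behaves, adapted with `return` instead of `exit` so it can be exercised inline (the `CONFIG` contents are illustrative):

```shell
#!/usr/bin/env bash
declare -A CONFIG=( [CLUSTER_NAME]="cuj-test" [REGION]="us-central1" )

# Fail (with a comma-joined list of the missing keys on stderr) unless
# every named key is present and non-empty in CONFIG.
function require_config_keys() {
  local missing_keys=()
  local key
  for key in "$@"; do
    if [[ -z "${CONFIG[$key]:-}" ]]; then
      missing_keys+=("$key")
    fi
  done
  if [[ ${#missing_keys[@]} -gt 0 ]]; then
    ( IFS=,; echo "ERROR: required keys (${missing_keys[*]}) missing from env.json" >&2 )
    return 1
  fi
}
```

With the array above, `require_config_keys CLUSTER_NAME REGION` succeeds while `require_config_keys SUBNET` fails, so each script can declare its own required keys in one line.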
This PR cannot resolve GoogleCloudDataproc/initialization-actions#1281 until it implements the proxy-only CUJ.
This commit refactors the project's testing and automation scripts,
moving them from a monolithic `bin/` directory into a structured
framework centered around Critical User Journeys (CUJs). The goal is
to create a more maintainable and scalable way to exercise the use
cases that are essential for our customers.
This new structure is designed for better organization, code reuse,
and CI/CD integration.
Key changes include:
* **CUJ Directory Structure:**
* `gcloud/cuj/`: Contains individual journey scripts. The first
CUJ, `gce/standard`, is introduced to handle the creation,
deletion, and rebuilding of a standard Dataproc cluster.
* `gcloud/onboarding/`: Holds scripts for setting up shared,
prerequisite infrastructure (e.g., GCS buckets) that CUJs depend
on.
* `gcloud/lib/`: A new directory for shared bash
libraries. `common.sh` is introduced to centralize functions for
configuration loading, network management, and logging.
* `gcloud/ci/`: Includes scripts specifically for CI/CD pipelines.
* **CI Environment Management:**
* A new `pristine_check.sh` script is added to ensure the GCP
project is in a clean state before a test run. It can operate in
a strict validation mode (fail if resources exist) or a cleanup
mode (actively delete resources).
* **Configuration and Code Improvements:**
* Configuration is centralized and simplified in `env.json`, which
is now loaded by the `load_config` function in `common.sh`.
* Scripts are refactored to be more robust, with clear `up`,
`down`, and `validate` commands, and checks for pre-existing
resources.
* The `.gitignore` is updated to exclude temporary files and local
configuration.
This change lays the foundation for adding more complex user journeys,
ensuring our core services are consistently tested from a customer
perspective.
Force-pushed: 8c4d7ca to ecb07ab
Force-pushed: ecb07ab to d790d25
This commit builds upon the foundational CUJ framework by ingesting
battle-tested logic from numerous sources and implementing the initial set
of comprehensive, production-like Critical User Journeys.
The framework is now enhanced with a powerful, modular library and the
first advanced CUJs, making it a robust tool for end-to-end testing.
Key Enhancements:
* **Modular Library (`lib/`)**:
The monolithic `common.sh` is refactored into a modular library
with components organized by function (`_core.sh`, `_network.sh`,
`_dataproc.sh`, `_database.sh`, `_security.sh`). This incorporates
advanced, parameterized, and idempotent functions for managing a
wide range of GCP resources.
* **Advanced Onboarding (`onboarding/`)**:
New scripts are added to provision persistent, shared infrastructure,
including a High-Availability Cloud SQL instance with VPC Peering and
a dual-NIC Squid Proxy VM, following GCP best practices.
* **New Critical User Journeys (`cuj/`)**:
* `gce/standard`: This CUJ is enhanced to provision a full,
NAT-based network environment.
* `gce/proxy-egress`: A new CUJ is added to test Dataproc
clusters that use a proxy for all outbound traffic.
* `gke/standard`: A new CUJ is added for the standard Dataproc
on GKE use case.
* **Enhanced CI/CD (`ci/`)**:
`pristine_check.sh` is upgraded to use a robust, tag-based cleanup
strategy, making it scalable to any number of CUJs without
modification.
* **Finalized Configuration (`env.json`)**:
The `env.json.sample` file is finalized with a simplified structure
that defines the shared test environment and a `cuj_set` for test
orchestration, abstracting implementation details from the user.
* **Comprehensive Documentation (`README.md`)**:
The README is updated to be a complete guide for the new framework,
explaining its philosophy and providing a clear "Getting Started"
workflow for new users.
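To make the configuration contract concrete, the sketch below shows one way an `env.json` could be loaded into a bash associative array. The flat JSON shape, the upper-cased key convention, and the use of python3 for parsing are all assumptions for illustration, not the actual `load_config` implementation:

```shell
#!/usr/bin/env bash
# Hypothetical env.json shape: { "region": "us-central1", "cluster_name": "cuj-test" }
declare -A CONFIG

function load_config() {
  local config_file="${1:-env.json}"
  local key value
  # Flatten top-level string values into CONFIG, upper-casing the keys
  # so scripts can read e.g. "${CONFIG[REGION]}".
  while IFS=$'\t' read -r key value; do
    CONFIG["$key"]="$value"
  done < <(python3 -c '
import json, sys
for k, v in json.load(open(sys.argv[1])).items():
    if isinstance(v, str):
        print(f"{k.upper()}\t{v}")
' "$config_file")
}
```

Centralizing this in the library means a missing or malformed `env.json` fails in exactly one place instead of in each CUJ script.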
Force-pushed: d790d25 to 9475017
refactor(gcloud): Introduce a test framework for Critical User Journeys
Resolves: GoogleCloudDataproc/initialization-actions#1281
Relates: go/sf/59134938