Izel/data-platform-module


About data-platform-module

A production-ready, reusable Terraform module that provisions a secure, well-architected GCP data platform following Google's enterprise best practices. It is designed for organisations migrating to GCP or scaling on it.

Platform Capabilities

Capability           GCP Service
Networking           VPC, Subnets, Private Google Access
Data Warehouse       BigQuery (CMEK, column security)
Object Storage       GCS (lifecycle policies, CMEK)
Secret Management    Secret Manager + KMS
Identity & Access    Service Accounts, IAM bindings
Observability        Cloud Logging, Monitoring, Alerts
CI/CD                Cloud Build

Module Structure

data-platform-module
├── scripts
├── modules
│ ├── apis
│ ├── networking
│ ├── iam
│ ├── storage
│ ├── bigquery
│ ├── security
│ └── monitoring
├── environments
│ ├── dev
│ ├── pre
│ └── prod
└── README.md


Architecture

environments/
  dev | pre | prod
        │
        ▼
  ┌─────────────────────────────────────────┐
  │           modules/                      │
  │  apis → networking → iam → security     │
  │         ↓         ↓        ↓            │
  │     storage   bigquery  monitoring      │
  └─────────────────────────────────────────┘

All modules are independently reusable and composable. Each environment (dev, pre, prod) references the same modules with environment-specific variable overrides via terraform.tfvars.
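
The diagram above can be read as a dependency graph. The sketch below encodes one possible reading of it and derives an order in which the modules could be provisioned. The dependency edges are an assumption based on the arrow positions in the diagram, not something taken from the module source, and the `DEPENDS_ON` and `provisioning_order` names are hypothetical.

```python
from graphlib import TopologicalSorter

# ASSUMPTION: one reading of the architecture diagram, not the module source.
# Each key maps a module to the modules it is assumed to depend on.
DEPENDS_ON = {
    "apis": set(),
    "networking": {"apis"},
    "iam": {"networking"},
    "security": {"iam"},
    "storage": {"networking"},
    "bigquery": {"iam"},
    "monitoring": {"security"},
}


def provisioning_order(graph=DEPENDS_ON):
    """Return the modules in an order where dependencies always come first."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(" -> ".join(provisioning_order()))
```

Terraform resolves this ordering itself from the references between module inputs and outputs, so a sketch like this is only useful for reasoning about the layout.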

For more details about the architecture, patterns and design decisions, see the architecture document.


Prerequisites


Usage

1. Clone the repository

git clone https://github.com/Izel/data-platform-module.git
cd data-platform-module

2. Authenticate with GCP

gcloud auth login
gcloud config set project <YOUR_PROJECT_ID>

3. Environment configuration

  1. Create a terraform.tfvars file for each environment:
environments
├── dev
│   └── terraform.tfvars
├── pre
│   └── terraform.tfvars
└── prod
    └── terraform.tfvars
  2. Edit the variable definitions below in each terraform.tfvars to match your values and environment:
# ID of an existing GCP project. It must be linked to a billing account.
project_id = "<YOUR_PROJECT_ID>" # e.g. "my-data-infra-project"

# Region where the project resources are located.
region = "<YOUR_SELECTED_REGION>" # e.g. "europe-west2"

# The current deployment environment.
environment = "<YOUR_ENVIRONMENT>" # e.g. "dev"

# The subnet IP address range in CIDR notation.
subnet_cidr = "<YOUR_SUBNET_IP_AND_MASK>" # e.g. "10.0.0.0/24"

# KMS key rotation period, in seconds. Use a shorter period in dev for testing.
key_rotation_period = "<YOUR_KEY_ROTATION_PERIOD>" # e.g. "2592000s", which is 30 days in seconds

# Monitoring — leave empty in dev if no notification channels set up
notification_channel_ids = []
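
Before deploying, it can help to check that every variable from the template above is present and that no `<PLACEHOLDER>` value was left behind. The following is a minimal sketch of such a check. It is not the repository's tfvars_validator.py, whose implementation is not shown here. The function names are hypothetical, and the parser only handles simple one-line `name = value` assignments with no `#` inside values.

```python
import re

# Variable names taken from the terraform.tfvars template above.
REQUIRED_KEYS = (
    "project_id",
    "region",
    "environment",
    "subnet_cidr",
    "key_rotation_period",
    "notification_channel_ids",
)

# Matches simple top-level `name = value` assignments only.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.+?)\s*$")


def parse_tfvars(text):
    """Return a dict of raw (unparsed) values for one-line assignments."""
    values = {}
    for line in text.splitlines():
        # Drop trailing comments; assumes no `#` appears inside a value.
        code = line.split("#", 1)[0]
        match = ASSIGNMENT.match(code)
        if match:
            values[match.group(1)] = match.group(2)
    return values


def find_problems(text):
    """List missing keys and values that still hold a <PLACEHOLDER>."""
    values = parse_tfvars(text)
    problems = [f"missing: {key}" for key in REQUIRED_KEYS if key not in values]
    problems += [
        f"placeholder not replaced: {key}"
        for key, raw in values.items()
        if re.search(r"<[A-Z_]+>", raw)
    ]
    return problems


def days_to_rotation_period(days):
    """Convert whole days to the seconds string used by key_rotation_period."""
    return f"{days * 24 * 60 * 60}s"
```

For example, `days_to_rotation_period(30)` returns `"2592000s"`, the 30-day value used in the template.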

Important

HashiCorp recommends against committing terraform.tfvars files to public repositories (GitHub, GitLab, Bitbucket, etc.).

4. Deploy

cd environments/dev  # or environments/pre, environments/prod
terraform init
terraform plan -out=tfplan
terraform apply tfplan

Environment Promotion

Environments share identical module composition. To promote from dev → pre → prod:

# Run these commands from the repository root.
# Validate dev
terraform -chdir=environments/dev plan

# Promote to pre
terraform -chdir=environments/pre plan && terraform -chdir=environments/pre apply

# Promote to prod
terraform -chdir=environments/prod plan && terraform -chdir=environments/prod apply
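
The promotion sequence above could also be scripted so that it stops at the first failing step. The sketch below is one way to do that, under some assumptions: it is run from the repository root, each environment has already been initialised with `terraform init`, and, unlike the commands above, it applies a saved plan file in every environment including dev. The `promotion_commands` and `promote` helpers are hypothetical and not part of the repository's /scripts directory.

```python
import subprocess

# Promotion order taken from the section above.
PROMOTION_ORDER = ("dev", "pre", "prod")


def promotion_commands(environments=PROMOTION_ORDER, root="environments"):
    """Build the plan and apply commands for each environment, in order."""
    commands = []
    for env in environments:
        # -chdir makes Terraform switch to the environment directory first,
        # so the plan file path is relative to that directory.
        chdir = f"-chdir={root}/{env}"
        commands.append(["terraform", chdir, "plan", "-out=tfplan"])
        commands.append(["terraform", chdir, "apply", "tfplan"])
    return commands


def promote(run=subprocess.run, environments=PROMOTION_ORDER):
    """Run each command in order; check=True raises on the first failure."""
    for command in promotion_commands(environments):
        run(command, check=True)


if __name__ == "__main__":
    promote()
```

Because `subprocess.run(..., check=True)` raises `CalledProcessError` on a non-zero exit code, a failed plan or apply in dev or pre prevents the later environments from being touched.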

Python Companion Scripts

The /scripts directory contains Python tooling for platform validation and auditing:

Script                          Purpose
iam_validator.py                Audits IAM bindings for least-privilege violations
bucket_compliance_checker.py    Validates GCS bucket security configuration
secret_rotation.py              Rotates secrets in Secret Manager, logs to BigQuery
tfvars_validator.py             CLI tool to validate tfvars completeness per env
platform_health_report.py       Queries Cloud Monitoring and writes health to BQ

See /scripts/README.md for usage instructions.


Future Improvements

  • Add Dataplex for data governance and cataloguing.
  • Add VPC Service Controls to prevent data exfiltration.
  • Add Terraform test framework (terraform test).
  • Integrate with Cloud Build for automated plan/apply on PR.

Related Projects

  • crypto-prices — Real-time streaming pipeline (Pub/Sub → Dataflow → BigQuery) that uses this platform module for infrastructure provisioning
  • duck-pipeline-dev — Batch ETL pipeline on Cloud Run, provisioned using patterns from this module
