Skip to content

fraware/cloudpilot

Repository files navigation

CloudPilot

ML-assisted signals for scaling, cost, and Kubernetes operations

Python 3.10+ MIT License Ruff Mypy pytest

Quick start  ·  Features  ·  Configuration  ·  Contributing  ·  License


CloudPilot connects Prometheus metrics, Kubernetes, and AWS pricing data to a small set of Python modules that recommend scaling actions, surface cost-oriented hints, tune deployment CPU limits, and flag anomalies. It is built for operators and engineers who want transparent defaults, testable behavior, and explicit guardrails when automation touches production clusters.

flowchart LR
  subgraph signals [Data sources]
    PR[Prometheus]
    K8[Kubernetes]
    AW[AWS Pricing]
  end
  subgraph core [CloudPilot]
    CP[Heuristics and ML]
  end
  subgraph out [Outcomes]
    SC[Scaling hints]
    CO[Cost hints]
    TU[Auto-tuning]
    AN[Anomaly signals]
  end
  PR --> CP
  K8 --> CP
  AW --> CP
  CP --> SC
  CP --> CO
  CP --> TU
  CP --> AN
Loading

Table of contents

Quick start Clone, environment, and first install
Features What the toolkit does
Tech stack Languages, libraries, and CI
Requirements What you need before running
Project layout Repository map
Installation Extras, uv, and Locust
Configuration Environment variables
Usage CLI and Locust
Machine learning artifacts Models and training
AWS and Kubernetes notes Integration details
Testing and quality Pytest, coverage, audits
Roadmap Planned direction
Contributing How to help
License Legal

Quick start

git clone https://github.com/<your-org-or-username>/cloudpilot.git
cd cloudpilot
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -e ".[dev,ml]" --extra-index-url https://download.pytorch.org/whl/cpu
pytest
cloudpilot --version

Runtime-only install (includes optional PyTorch for scripted scaling): pip install -e ".[ml]".


Features

Capability What you get
Scaling intelligence TorchScript inference when a model is available; otherwise a safe, deterministic fallback.
Cost awareness EC2 pricing lookups via the AWS Price List API, returned as concise guidance.
Kubernetes tuning Heuristic CPU limit adjustments with an optional dry-run that skips API patches.
Anomaly detection Isolation Forest over metric features; model training is lazy (not at import time).
Self-healing Pod restarts only when explicitly confirmed through configuration—never by default.
Load simulation Poisson-style request timing and a stress-test placeholder for experimentation.

Tech stack

Layer Details
Runtime Python 3.10+ (pyproject.toml)
Machine learning scikit-learn; PyTorch via optional ml extra
Cloud & orchestration boto3, official Kubernetes client
Observability prometheus-api-client
Optional load tests Locust (locustfile.py)
Quality gates Ruff, Mypy, Bandit, pip-audit, pytest, coverage
Continuous integration GitHub Actions on 3.10, 3.11, 3.12 (.github/workflows/ci.yml)

Requirements

  • Python 3.10 or newer.
  • AWS (for pricing): credentials or role available to boto3 (for example standard environment variables or instance metadata).
  • Kubernetes (for live tuning or self-heal): a valid kubeconfig and cluster reachability—omit if you only run the test suite with mocks.

Project layout

.
├── cloudpilot/                 # Main package (PEP 561: py.typed)
│   ├── config.py               # Central env-based settings
│   ├── scaling.py
│   ├── cost_optimizer.py
│   ├── k8s_autotuner.py
│   ├── anomaly_detector.py
│   ├── load_tester.py
│   └── training_rl_scaler.py
├── tests/
├── cli.py                      # Same entry as console script `cloudpilot`
├── locustfile.py
├── pyproject.toml
├── uv.lock
├── .github/workflows/ci.yml
├── CONTRIBUTING.md
├── LICENSE
└── README.md

Installation

1. Clone (use your fork or upstream URL).

git clone https://github.com/<your-org-or-username>/cloudpilot.git
cd cloudpilot

2. Virtual environment (recommended).

python -m venv .venv
source .venv/bin/activate          # Linux / macOS
# .venv\Scripts\activate.bat       # Windows cmd
# .venv\Scripts\Activate.ps1       # Windows PowerShell

3. Install one of the following.

Goal Command
Application + ML extra pip install -e ".[ml]"
Full developer + ML (matches CI toolset) pip install -e ".[dev,ml]" --extra-index-url https://download.pytorch.org/whl/cpu

The CPU PyTorch index keeps wheels smaller on Linux, macOS, and typical CI images. For CUDA builds, drop the extra index and install the wheel set that matches your platform.

Reproducible installs with uv

uv sync --all-extras

The first resolve may pull a large PyTorch artifact when ml is included.

Optional: Locust

pip install locust

Dependency extras (declared under [project.optional-dependencies] in pyproject.toml):

Extra Includes
ml torch>=2.0 for scripted scaling
dev pytest, coverage, Ruff, Mypy, Bandit, pip-audit, types-PyYAML

Combine with: pip install -e ".[dev,ml]".

Note. requirements.txt documents install patterns only; it does not pin versions. Prefer pyproject.toml and, when using uv, uv.lock.


Configuration

All settings are read from the environment. Source of truth: cloudpilot/config.py.

Variable Default Role
CLOUDPILOT_PROMETHEUS_URL http://localhost:9090 Prometheus base URL
CLOUDPILOT_PROMETHEUS_DISABLE_SSL 1 (truthy) Skip TLS verification for Prometheus
CLOUDPILOT_SELF_HEAL_CONFIRM unset Must be 1, true, yes, or on to allow destructive pod deletes in self_heal
CLOUDPILOT_AWS_PRICING_REGION us-east-1 Region for the Pricing API client
CLOUDPILOT_K8S_DRY_RUN unset If truthy, tuning runs without patching the cluster

Safety. Pod deletion is opt-in by design. Without CLOUDPILOT_SELF_HEAL_CONFIRM, self-heal reports a skip instead of mutating the cluster.


Usage

CLI

The cloudpilot command (or python cli.py) exposes:

Action Example
Scaling recommendation cloudpilot scale --cpu 80 --mem 70 --req 0.8 --latency 100 --demand 0.9
Cost hint cloudpilot cost --instance-type m5.large
Deployment tuning cloudpilot tune --deployment your-deployment --namespace default
Version cloudpilot --version

For scale, --demand must lie in [0, 1].

Locust

locust -f locustfile.py

Then open the Locust UI in your browser to control the scenario.


Machine learning artifacts

  • Inference: With torch installed, CloudPilot searches for rl_scaling_model.pt as packaged data under cloudpilot/, then on disk beside the package. Missing file or missing torch yields a stable heuristic outcome (Maintain).
  • Training a placeholder model: With the ml extra: python -m cloudpilot.training_rl_scaler writes rl_scaling_model.pt in the working directory. Package or mount that file where your runtime expects it.

AWS and Kubernetes notes

  • AWS: Pricing filters target common Linux / shared-tenancy / regional product rows. Extend or change filters in code if you need other operating systems or commercial terms.
  • Kubernetes: The client uses default kubeconfig discovery. Use CLOUDPILOT_K8S_DRY_RUN to exercise tuning logic without applying patch_namespaced_deployment.

Testing and quality

Default pytest options exclude @pytest.mark.integration tests (see pyproject.toml).

pytest

CI-style run (coverage + JUnit):

pytest --junitxml=junit.xml -q --cov=cloudpilot --cov=cli --cov-report=xml --cov-report=term

Coverage enforces fail_under = 45 when reporting is enabled.

pytest -m integration          # only integration-marked tests
py -m pytest                   # Windows launcher if `pytest` is not on PATH

Security and supply chain (also executed in CI):

bandit -r cloudpilot -c pyproject.toml
pip freeze > freeze.txt && pip-audit -r freeze.txt --desc on && rm freeze.txt

freeze.txt is gitignored—do not commit it.


Roadmap

  • RL training grounded in real workload history.
  • Broader cloud pricing (GCP, Azure).
  • Stronger anomaly models (for example sequence models or autoencoders).
  • Operator-focused dashboard.
  • Deeper integration with industrial load and stress tools.

Contributing

Guidelines, hooks, and review expectations live in CONTRIBUTING.md. Issues and pull requests are welcome.


License

Released under the MIT License.

Copyright (c) 2025 Matéo H. Petel.

About

CloudPilot is an AI-driven infrastructure optimizer designed to continuously learn and adapt, helping organizations reduce cloud costs while maintaining optimal performance, reliability, and scalability.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Contributors

Languages