Merged
6 changes: 6 additions & 0 deletions skypilot/README.md
@@ -22,6 +22,12 @@ A multi-node distributed training configuration using PyTorch's Distributed Data

**Use Case:** Large-scale distributed training across multiple nodes for computationally intensive models.

### 4. my-caios-devpod.yaml

A development environment configuration demonstrating CoreWeave Object Storage (CAIOS) integration with boto3 for reading, writing, and listing objects. If you have not yet configured CAIOS credentials, follow the guidance in [this section](https://github.com/coreweave/reference-architecture/tree/main/storage/caios-credentials) to set them up automatically.

**Use Case:** Testing and validating CAIOS bucket access with AWS-compatible tools in a GPU-accelerated development environment.
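For reference, the credentials file produced by that setup follows the standard AWS shared-credentials format. A hedged sketch is below; the profile name `cw` matches the `AWS_PROFILE` used in the example config, and the key values are placeholders, not real credentials:

```ini
; ~/.coreweave/cw.credentials -- placeholder values, not real keys
[cw]
aws_access_key_id = <YOUR_CAIOS_ACCESS_KEY>
aws_secret_access_key = <YOUR_CAIOS_SECRET_KEY>
```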

## Getting Started

To use any of these configurations:
89 changes: 89 additions & 0 deletions skypilot/config-examples/my-caios-devpod.yaml
@@ -0,0 +1,89 @@
name: mydevpod

resources:
  # Modify the fields below to request different resources
  accelerators: H200:1  # Use 1 H200 GPU
  image_id: docker:ghcr.io/coreweave/ml-containers/nightly-torch-extras:8b6c417-base-25110205-cuda12.9.1-ubuntu22.04-torch2.10.0a0-vision0.25.0a0-audio2.10.0a0
  memory: 32+  # Request at least 32 GB of RAM

file_mounts:
  /my_data:  # Mount the storage bucket at /my_data in the container
    source: cw://BUCKETNAME  # Change this to your bucket name
    mode: MOUNT  # MOUNT, COPY, or MOUNT_CACHED. Defaults to MOUNT. Optional.

# Sync data in my-code/ on the local machine to ~/sky_workdir in the container
# workdir: ./my-code

# Environment variables to set in the container.
# These are needed to access CoreWeave Object Storage using the AWS CLI.
envs:
  AWS_SHARED_CREDENTIALS_FILE: "~/.coreweave/cw.credentials"
  AWS_CONFIG_FILE: "~/.coreweave/cw.config"
Comment on lines +19 to +20
Contributor:
Is it worth adding sample AWS CONFIG file to show the format? Possibly in README.md or as a .dotfile? I find it useful when dealing with AI Object Storage. Something on the lines of

```ini
[default]
endpoint_url = https://cwobject.com
s3 =
    addressing_style = virtual
region = us-west-13b
output = json
```

Contributor Author:
@tmadhyastha-cw has a file which configures credentials for storage in https://github.com/coreweave/reference-architecture/blob/tmadhyastha/caios-credential-setup/storage/caios-credentials/configure_caios_credentials.sh perhaps I can add a pre-requisite to run this file first?

Contributor:
Cross-linking existing resource in the same repo works too 👍

Contributor:
fwiw this should be the preferred mechanism for getting CAIOS credentials once it's released, just requires a service account being specified for pods: https://github.com/coreweave/kabinet-charts/blob/main/charts/pod-identity-webhook/README.md
AWS SDK-based tools that can use creds files can also use the variables injected by the webhook to get short-lived, auto-rotated creds.

Contributor:
@radu-malliu I do not understand how this webhook mechanism helps. Can you help explain how you would envision this flow changing with this? CAIOS credentials are currently handled by skypilot in the AWS config style format, and are copied automatically to the pods created by skypilot.

Contributor:
Regarding cross-linking, yes I think one of us should definitely put a sample config file in the examples - I think it would make sense to go here. You can say, if you do not have a profile "cw" defined, you can run the other script to set that up.

Contributor:
@tmadhyastha-cw based on this SkyPilot page, it's possible to specify a service account for pods that SkyPilot launches.
The mutating webhook I mentioned is responsible for injecting an OIDC token issued by the cluster and env variables for an endpoint where such token can be exchanged for CAIOS credentials, assuming the setup described in the webhook docs exists. The mechanism for the exchange is built into the AWS SDK. In other words, if the env vars are there, there is a credential provider in the provider chain which can be called to obtain credentials.

Contributor Author:
@radu-malliu is this released? Would be happy to test it out with SP if so. It is neat, and would simplify the credential configuration process greatly.

  AWS_PROFILE: "cw"


# Any setup commands to run in the container before 'run'
# Here we install the AWS CLI to access storage
setup: |
  echo "Setting up test storage environment..."
  # Install the AWS CLI
  apt install -y python3.10-venv
  curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
  unzip awscli-bundle.zip
  sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
  echo "export AWS_CONFIG_FILE=$AWS_CONFIG_FILE" >> ~/.bashrc
  echo "export AWS_SHARED_CREDENTIALS_FILE=$AWS_SHARED_CREDENTIALS_FILE" >> ~/.bashrc
  echo "export AWS_PROFILE=$AWS_PROFILE" >> ~/.bashrc
  # Install boto3
  pip install boto3

run: |
  echo "Testing CAIOS bucket access with boto3..."
  python3 << 'EOF'
  import boto3
  from botocore.client import Config
  import os

  # LOTA endpoint for CoreWeave Object Storage
  ENDPOINT_URL = 'http://cwlota.com'
  BUCKET_NAME = 'BUCKET_NAME'

  # Read credentials from AWS config files
  session = boto3.Session(profile_name='cw')
  credentials = session.get_credentials()

  # Create S3 client with virtual-hosted style addressing
  s3_client = boto3.client(
      's3',
      endpoint_url=ENDPOINT_URL,
      aws_access_key_id=credentials.access_key,
      aws_secret_access_key=credentials.secret_key,
      config=Config(s3={'addressing_style': 'virtual'})
  )

  # Write test
  print(f"Writing test file to {BUCKET_NAME}...")
  test_content = "Hello from SkyPilot on CoreWeave!"
  s3_client.put_object(
      Bucket=BUCKET_NAME,
      Key='skypilot_test.txt',
      Body=test_content.encode('utf-8')
  )
  print("✓ Write successful!")

  # Read test
  print(f"Reading test file from {BUCKET_NAME}...")
  response = s3_client.get_object(Bucket=BUCKET_NAME, Key='skypilot_test.txt')
  content = response['Body'].read().decode('utf-8')
  print(f"✓ Read successful! Content: {content}")

  # List objects
  print(f"\nListing objects in {BUCKET_NAME}:")
  response = s3_client.list_objects_v2(Bucket=BUCKET_NAME, MaxKeys=10)
  if 'Contents' in response:
      for obj in response['Contents']:
          print(f"  - {obj['Key']} ({obj['Size']} bytes)")
  else:
      print("  Bucket is empty")

  print("\n✓ CAIOS test completed successfully!")
  EOF