---
weight: 20
---

# Model Storage
You can store a model in an S3 bucket, an Open Container Initiative (OCI) container image, or a Persistent Volume Claim (PVC).

In cloud-native inference scenarios, the choice of model storage affects the startup time, version management granularity, and scalability of inference services. KServe loads models through two main mechanisms:

- **Storage Initializer (init container)**: For S3 and PVC, makes the model files available before the main container starts, by downloading them (S3) or mounting them (PVC).
- **Sidecar (Modelcar)**: For OCI images, runs the model image alongside the main container, so model loading can reuse the container runtime's image-layer cache and complete within seconds once the image is cached on the node.

## Using S3 Object Storage for model storage
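This is the most commonly used mode. KServe reads the S3 credentials from a Secret that carries KServe-specific annotations (such as the S3 endpoint), and the InferenceService references that Secret through a ServiceAccount.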

### Authentication Configuration
We recommend creating a separate ServiceAccount and Secret for each project.

#### S3 Key Configuration Parameters
| Configuration Item | Example Value | Description |
|-------------------|-------------|-------------|
| Endpoint | your_s3_service_ip:your_s3_port | IP address and port of the private MinIO service |
| Region | (Not specified) | Usually defaults to us-east-1; KServe uses the default value if no region is set |
| HTTPS Enabled | 0 | HTTPS disabled; acceptable only for internal test or demo environments |
| Authentication Method | Static access key / secret key | Managed through the Secret named `minio-creds` |
| Namespace Isolation | demo-space | Permissions are limited to this namespace, following multi-tenant isolation principles |

```yaml
apiVersion: v1
data:
  AWS_ACCESS_KEY_ID: YOUR_BASE64_ENCODED_ACCESS_KEY # [!code callout]
  AWS_SECRET_ACCESS_KEY: YOUR_BASE64_ENCODED_SECRET_KEY # [!code callout]
kind: Secret
metadata:
  annotations:
    serving.kserve.io/s3-endpoint: your_s3_service_ip:your_s3_port # [!code callout]
    serving.kserve.io/s3-usehttps: "0" # [!code callout]
  name: minio-creds
  namespace: demo-space
type: Opaque
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-models
  namespace: demo-space
secrets:
  - name: minio-creds
```

<Callouts>
1. Replace `YOUR_BASE64_ENCODED_ACCESS_KEY` with your actual Base64-encoded AWS access key ID.
2. Replace `YOUR_BASE64_ENCODED_SECRET_KEY` with your actual Base64-encoded AWS secret access key.
3. Replace `your_s3_service_ip:your_s3_port` with the actual IP address and port of your S3 service.
4. Set `serving.kserve.io/s3-usehttps` to "1" if your S3 service uses HTTPS, or "0" if it uses HTTP.
</Callouts>
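
If you prefer not to Base64-encode the keys by hand, Kubernetes also accepts plaintext values in the `stringData` field of a Secret and encodes them when the Secret is stored. The following is a sketch of the same `minio-creds` Secret written that way. The `serving.kserve.io/s3-region` annotation is optional and included only to show where an explicit region would go; `us-east-1` is an assumed value, and the key values are placeholders.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: minio-creds
  namespace: demo-space
  annotations:
    serving.kserve.io/s3-endpoint: your_s3_service_ip:your_s3_port
    serving.kserve.io/s3-usehttps: "0"
    # Optional: only needed if your S3 service requires an explicit region (assumed value)
    serving.kserve.io/s3-region: us-east-1
type: Opaque
# stringData takes plaintext; the API server stores it Base64-encoded under data
stringData:
  AWS_ACCESS_KEY_ID: your_access_key_id
  AWS_SECRET_ACCESS_KEY: your_secret_access_key
```

Because the keys appear in plaintext, avoid committing a manifest like this to version control.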

### Deploy Inference Service
```yaml
kind: InferenceService
apiVersion: serving.kserve.io/v1beta1
metadata:
  annotations:
    aml-model-repo: Qwen2.5-0.5B-Instruct # [!code callout]
    aml-pipeline-tag: text-generation
    serving.kserve.io/deploymentMode: Standard
  labels:
    aml.cpaas.io/runtime-type: vllm # [!code callout]
  name: s3-demo
  namespace: demo-space
spec:
  predictor:
    maxReplicas: 1
    minReplicas: 1
    model:
      modelFormat:
        name: transformers
      name: ''
      protocolVersion: v2
      resources:
        limits:
          cpu: '2'
          ephemeral-storage: 10Gi
          memory: 8Gi
        requests:
          cpu: '2'
          memory: 4Gi
      runtime: aml-vllm-0.11.2-cpu # [!code callout]
      storageUri: s3://models/Qwen2.5-0.5B-Instruct # [!code callout]
    securityContext:
      seccompProfile:
        type: RuntimeDefault
    serviceAccountName: sa-models
```

<Callouts>
1. Replace `Qwen2.5-0.5B-Instruct` with your actual model name.
2. `aml.cpaas.io/runtime-type: vllm` specifies the inference runtime type. For more information about custom inference runtimes, see [Extend Inference Runtimes](../../../../inference_service/how_to/custom_inference_runtime.mdx).
3. Replace `aml-vllm-0.11.2-cpu` with the name of a runtime that is already installed on your platform (a ClusterServingRuntime resource).
4. `storageUri: s3://models/Qwen2.5-0.5B-Instruct` specifies the S3 URI of the model: the bucket (`models`) followed by the path to the model files.
</Callouts>

## Using OCI containers for model storage
Package the model as a container image (Model-as-Image) and distribute it through an internal registry such as Quay or Harbor. This approach is well suited to offline (air-gapped) environments.

### Model Image Packaging
Build the model image from a minimal Containerfile:

```dockerfile
# Use busybox as a lightweight base image; it also provides a shell
FROM busybox

# Create the directory that holds the model files and set its permissions
RUN mkdir -p /models && chmod 775 /models

# Copy the contents of the local model folder (not the folder itself)
# into the /models directory of the image
COPY Qwen2.5-0.5B-Instruct/ /models/

# KServe only needs the image layers: it supplies its own command when it
# runs this image as a modelcar, so no CMD is required. A CMD is only
# useful if you run the image yourself for debugging.
```

### Deploy Inference Service
#### Prerequisites
- The namespace that contains the inference service must have the PSA (Pod Security Admission) `enforce` level set to `privileged`.
- Modelcar must be enabled in the AmlCluster resource, with `uidModelcar` set to `0`.
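
To meet the first prerequisite, set the PSA `enforce` label on the namespace. The sketch below assumes the `demo-space` namespace used throughout this page; you can apply it as a manifest or add the same label to an existing namespace.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo-space
  labels:
    # Pod Security Admission: allow privileged pods in this namespace
    pod-security.kubernetes.io/enforce: privileged
```

The `privileged` level lifts PSA restrictions for every pod in the namespace, so consider keeping modelcar workloads in a dedicated namespace.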

##### Procedure to enable Modelcar in AmlCluster
1. Log in to the Alauda AI dashboard as an administrator.
2. Navigate to **Marketplace / OperatorHub** and select **Alauda AI**.
3. Click **All Instances** and select the **default** AmlCluster instance.
4. From the **Actions** dropdown, select **Update** to open the update form.
5. In the **Values** section, add the following `storage` settings under `spec.components.kserve.values.kserve`:
   ```yaml
   spec:
     components:
       kserve:
         values:
           kserve:
             storage:
               enableModelcar: true
               uidModelcar: 0
   ```
6. Click **Update** to save the changes.
7. Wait for the AmlCluster instance to reach the **Ready** status.

For more information about AmlCluster configuration, see [Installing Alauda AI Cluster](../../../../installation/ai-cluster.mdx).

KServe natively supports the `oci://` storage URI scheme. Point `storageUri` at the model image:

```yaml
kind: InferenceService
apiVersion: serving.kserve.io/v1beta1
metadata:
  annotations:
    aml-model-repo: Qwen2.5-0.5B-Instruct # [!code callout]
    aml-pipeline-tag: text-generation
    serving.kserve.io/deploymentMode: Standard
  labels:
    aml-pipeline-tag: text-generation
    aml.cpaas.io/runtime-type: vllm # [!code callout]
  name: oci-demo
  namespace: demo-space
spec:
  predictor:
    maxReplicas: 1
    minReplicas: 1
    model:
      modelFormat:
        name: transformers
      protocolVersion: v2
      resources:
        limits:
          cpu: '2'
          ephemeral-storage: 10Gi
          memory: 8Gi
        requests:
          cpu: '2'
          memory: 4Gi
      runtime: aml-vllm-0.11.2-cpu # [!code callout]
      storageUri: oci://build-harbor.alauda.cn/test/qwen-oci:v1.0.0 # [!code callout]
    securityContext:
      seccompProfile:
        type: RuntimeDefault
```

<Callouts>
1. Replace `Qwen2.5-0.5B-Instruct` with your actual model name.
2. `aml.cpaas.io/runtime-type: vllm` specifies the inference runtime type. For more information about custom inference runtimes, see [Extend Inference Runtimes](../../../../inference_service/how_to/custom_inference_runtime.mdx).
3. Replace `aml-vllm-0.11.2-cpu` with the name of a runtime that is already installed on your platform (a ClusterServingRuntime resource).
4. `storageUri: oci://build-harbor.alauda.cn/test/qwen-oci:v1.0.0` specifies the model image, including its registry, repository, and tag. Replace it with the image you built and pushed.
</Callouts>
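
The example above assumes the node can pull the model image without credentials. If your Harbor or Quay project is private, one option is to reference a registry credential from the predictor, which accepts standard pod fields such as `imagePullSecrets`. In this sketch, `harbor-pull-secret` is a hypothetical Secret of type `kubernetes.io/dockerconfigjson` that you would create in `demo-space` first.

```yaml
# Fragment of the oci-demo InferenceService; other fields are unchanged
spec:
  predictor:
    # Hypothetical docker-registry Secret in the same namespace
    imagePullSecrets:
      - name: harbor-pull-secret
```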

## Using PVC for model storage

### Uploading model files to a PVC
When deploying a model, you can serve it from an existing Persistent Volume Claim (PVC) that stores your model files. To put local model files into the PVC, upload them through the IDE of a running workbench that has the PVC attached.

#### Prerequisites
- You have access to the Alauda AI dashboard.
- You have access to a project that has a running workbench.
- You have created a Persistent Volume Claim (PVC).
- The PVC is attached to the workbench.

  For instructions on creating a workbench and attaching a PVC, see [Create Workbench](../../../../workbench/how_to/create_workbench.mdx).
- You have the model files saved on your local machine.
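
If you still need to create the PVC, a minimal manifest might look like the following. This is a sketch under stated assumptions: the name `model-pvc` matches the `storageUri` used later on this page, the `ReadWriteMany` access mode (which your StorageClass must support) lets the workbench and the inference pod mount the volume at the same time even on different nodes, and the 20Gi size is illustrative.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-pvc
  namespace: demo-space   # must match the InferenceService namespace
spec:
  accessModes:
    - ReadWriteMany       # assumed; requires StorageClass support
  resources:
    requests:
      storage: 20Gi       # illustrative; size it for your model files
```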

#### Procedure
Follow these steps to upload your model files to the PVC within your workbench:

1. From the Alauda AI dashboard, click **Workbench** to enter the workbench list page.

2. Find your running workbench instance and click the **Connect** button to enter the workbench.

3. From your workbench IDE, open the file browser:
   - In JupyterLab, this is the **Files** tab in the left sidebar.
   - In code-server, this is the **Explorer** view in the left sidebar.

4. In the file browser, navigate to the home directory. This directory is the root of your attached PVC.

   > **Note**
   > Any files or folders that you create or upload to this folder persist in the PVC.

5. Optional: Create a new folder to organize your models:
   - In the file browser, right-click within the home directory and select **New Folder**.
   - Name the folder (for example, `models`).
   - Double-click the new `models` folder to open it.

6. Upload your model files to the current folder:
   - Using JupyterLab:
     - Click the **Upload** button in the file browser toolbar.
     - In the file selection dialog, select the model files on your local computer and click **Open**.
     - Wait for the upload to complete.
   - Using code-server:
     - Drag the model files from your local file manager and drop them onto the target folder in the code-server **Explorer** view.
     - Wait for the upload to complete.

#### Verification
Confirm that your files appear in the file browser at the path where you uploaded them.


### Deploy Inference Service
```yaml
kind: InferenceService
apiVersion: serving.kserve.io/v1beta1
metadata:
  annotations:
    aml-model-repo: Qwen2.5-0.5B-Instruct # [!code callout]
    aml-pipeline-tag: text-generation
    serving.kserve.io/deploymentMode: Standard
  labels:
    aml.cpaas.io/runtime-type: vllm # [!code callout]
  name: pvc-demo-1
  namespace: demo-space
spec:
  predictor:
    maxReplicas: 1
    minReplicas: 1
    model:
      modelFormat:
        name: transformers
      protocolVersion: v2
      resources:
        limits:
          cpu: '2'
          ephemeral-storage: 10Gi
          memory: 8Gi
        requests:
          cpu: '2'
          memory: 4Gi
      runtime: aml-vllm-0.11.2-cpu # [!code callout]
      storageUri: pvc://model-pvc # [!code callout]
    securityContext:
      seccompProfile:
        type: RuntimeDefault
```

<Callouts>
1. Replace `Qwen2.5-0.5B-Instruct` with your actual model name.
2. `aml.cpaas.io/runtime-type: vllm` specifies the inference runtime type. For more information about custom inference runtimes, see [Extend Inference Runtimes](../../../../inference_service/how_to/custom_inference_runtime.mdx).
3. Replace `aml-vllm-0.11.2-cpu` with the name of a runtime that is already installed on your platform (a ClusterServingRuntime resource).
4. `storageUri: pvc://model-pvc` serves the model from the root of the PVC named `model-pvc`. The PVC must be in the same namespace as the InferenceService.
</Callouts>
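
The URI above serves the model from the root of the PVC. PVC storage URIs also accept a path inside the volume, in the form `pvc://<pvc-name>/<model-path-within-pvc>`. For example, if you uploaded the model into a `models/Qwen2.5-0.5B-Instruct` folder in the optional upload step (folder names assumed here), the `storageUri` would look like this:

```yaml
# Fragment of the pvc-demo-1 InferenceService; other fields are unchanged
spec:
  predictor:
    model:
      # pvc://<pvc-name>/<path inside the PVC>
      storageUri: pvc://model-pvc/models/Qwen2.5-0.5B-Instruct
```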