Merged

49 commits
79a7cb9
updates for 25.10.
schmidt-scaled Oct 21, 2025
0df7627
fix links
schmidt-scaled Oct 24, 2025
6b07da6
few updates
schmidt-scaled Oct 24, 2025
f35c484
few updates
schmidt-scaled Oct 24, 2025
6d5e949
few updates
schmidt-scaled Oct 24, 2025
245521a
few updates
schmidt-scaled Oct 24, 2025
8b609ca
few updates
schmidt-scaled Oct 24, 2025
4407301
few updates
schmidt-scaled Oct 24, 2025
ff980c2
few updates
schmidt-scaled Oct 24, 2025
c8bf547
Fixed logical-volume.md
noctarius Oct 24, 2025
18e8296
Fixed architecture/index.md
noctarius Oct 24, 2025
63cd257
Fixed simplyblock-architecture.md
noctarius Oct 24, 2025
f316ca5
Fixed storage-performance-and-qos.md
noctarius Oct 24, 2025
8ebdcfa
Fixed what-is-simplyblock.md
noctarius Oct 24, 2025
fd8e6f1
Fixed baremetal/index.md
noctarius Oct 24, 2025
95f7e4c
Fixed erasure-coding-scheme.md
noctarius Oct 24, 2025
b39bf24
Fixed system-requirements.md
noctarius Oct 24, 2025
b0e255b
Fixed install-on-linux/index.md
noctarius Oct 24, 2025
5287b8f
Fixed install-cp.md
noctarius Oct 24, 2025
e78882f
Fixed install-sp.md
noctarius Oct 24, 2025
2a8b313
Removed csi-features.md and merged the content into the existing stor…
noctarius Oct 24, 2025
9747d22
Fixed kubernetes/index.md
noctarius Oct 24, 2025
cbf3aba
Fixed kubernetes/install-csi.md
noctarius Oct 24, 2025
a194072
Fixed k8s-control-plane.md
noctarius Oct 24, 2025
b1ea7af
Fixed k8s-storage-plane.md
noctarius Oct 24, 2025
48008f9
Fixed openstack/index.md
noctarius Oct 24, 2025
dc43e39
Fixed deployments/index.md
noctarius Oct 24, 2025
c32d004
Fixed terminology.md
noctarius Oct 24, 2025
a0be93e
few updates
schmidt-scaled Oct 24, 2025
2d298a5
Merge remote-tracking branch 'origin/R25.10' into R25.10
schmidt-scaled Oct 24, 2025
0b7e2b9
few updates
schmidt-scaled Oct 24, 2025
331a4e3
Fixed qos folder
noctarius Oct 24, 2025
6850518
Fixed cluster-upgrade.md
noctarius Oct 24, 2025
d61ad4c
Fixed prepare-nvme-tcp.md
noctarius Oct 24, 2025
7372aa4
updated with new storageclass param
geoffrey1330 Oct 24, 2025
6616af1
updated with new storageclass param
geoffrey1330 Oct 24, 2025
e680adc
added the topology change to docs
geoffrey1330 Oct 24, 2025
043573c
updated the storage node label
geoffrey1330 Oct 25, 2025
5bddb68
updated install control plane on kubernetes
geoffrey1330 Oct 25, 2025
6b7f352
added docs for openshift deployment
geoffrey1330 Oct 27, 2025
beb4c11
added docs for openshift deployment
geoffrey1330 Oct 27, 2025
df253fa
Updated documentation
noctarius Oct 27, 2025
5db458a
Create control-plane-network-table-k8s.md
schmidt-scaled Oct 30, 2025
79680cf
Update k8s-control-plane.md
schmidt-scaled Oct 30, 2025
9172bd7
Update k8s-storage-plane.md
schmidt-scaled Oct 30, 2025
b115bb9
Create storage-plane-network-table-k8s.md
schmidt-scaled Oct 30, 2025
5eaf0d1
Rename storage-plane-network-table-k8s.md to storage-plane-network-po…
schmidt-scaled Oct 30, 2025
a9543e6
Rename control-plane-network-table-k8s.md to control-plane-network-po…
schmidt-scaled Oct 30, 2025
b92c5cf
Quick changes
noctarius Nov 5, 2025
18 changes: 16 additions & 2 deletions docs/architecture/concepts/logical-volumes.md
@@ -19,5 +19,19 @@ Key characteristics of Logical Volumes include:
- **High Performance:** Simplyblock’s architecture ensures low-latency access to LVs, making them suitable for demanding
workloads.
- **Fault Tolerance:** Data is distributed across multiple nodes to prevent data loss and improve reliability.
- **Integration with Kubernetes:** LVs can be used as persistent storage for Kubernetes workloads, enabling seamless
stateful application management.

Two basic types of logical volumes are supported by simplyblock:

- **NVMe-oF Subsystems**: Each logical volume is backed by a separate set of queue pairs. By default, each subsystem
provides three queue pairs and one network connection.

Volumes show up in Linux using `lsblk` as `/dev/nvme0n1`, `/dev/nvme1n1`, `/dev/nvmeXn1`, ...

- **NVMe-oF Namespaces**: Each logical volume is backed by an NVMe namespace. A namespace is a feature similar to a
logical partition of a drive, although it is defined on the NVMe level (device or target). Up to 32 namespaces share
a single NVMe subsystem and its queue pairs and connections.

This is a more resource-efficient, but performance-limited, version of an individual volume. It is useful if many
small volumes are required. Both methods can be combined in a single cluster.

Volumes show up in Linux using `lsblk` as `/dev/nvme0n1`, `/dev/nvme0n2`, `/dev/nvme0nX`, ...
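
A hedged illustration of how the two variants typically appear on a Linux host; device names and sizes are examples
only and will differ per system:

```plain title="Example lsblk output (illustrative)"
$ lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
nvme0n1   259:0    0  1000G  0 disk
nvme1n1   259:1    0   500G  0 disk
nvme2n1   259:2    0   100G  0 disk
nvme2n2   259:3    0   100G  0 disk
```

Here, `nvme0n1` and `nvme1n1` are two volumes exported as individual subsystems, while `nvme2n1` and `nvme2n2` are two
namespaces sharing the single subsystem behind controller `nvme2`.
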
12 changes: 6 additions & 6 deletions docs/architecture/index.md
@@ -4,12 +4,12 @@ weight: 10100
---

Simplyblock is a cloud-native, software-defined storage platform designed for high performance, scalability, and
resilience. It provides NVMe over TCP (NVMe/TCP) block storage, enabling efficient data access across distributed
environments. Understanding the architecture, key concepts, and common terminology is essential for effectively
deploying and managing simplyblock in various infrastructure setups, including Kubernetes clusters, virtualized
environments, and bare-metal deployments. This documentation provides a comprehensive overview of simplyblock’s
internal architecture, the components that power it, and the best practices for integrating it into your storage
infrastructure.
resilience. It provides NVMe over TCP (NVMe/TCP) and NVMe over RDMA (ROCEv2) block storage, enabling efficient data
access across distributed environments. Understanding the architecture, key concepts, and common terminology is
essential for effectively deploying and managing simplyblock in various infrastructure setups, including Kubernetes
clusters, virtualized environments, and bare-metal deployments. This documentation provides a comprehensive overview
of simplyblock’s internal architecture, the components that power it, and the best practices for integrating it into
your storage infrastructure.

This section covers several critical topics, including the architecture of simplyblock, core concepts such as Logical
Volumes (LVs), Storage Nodes, and Management Nodes, as well as Quality of Service (QoS) mechanisms and redundancy
7 changes: 5 additions & 2 deletions docs/architecture/simplyblock-architecture.md
@@ -7,7 +7,7 @@ Simplyblock is a cloud-native, distributed block storage platform designed to de
resilient storage through a software-defined architecture. Centered around NVMe-over-Fabrics (NVMe-oF), simplyblock
separates compute and storage to enable scale-out elasticity, high availability, and low-latency operations in modern,
containerized environments. The architecture is purpose-built to support Kubernetes-native deployments with seamless
integration, but supports virtual and even physical machines as clients as well.
integration but supports virtual and even physical machines as clients as well.

## Control Plane

@@ -54,9 +54,12 @@ environments, simplyblock requires at least three management nodes for high avai
a set of replicated, stateful services.

For internal state storage, the control plane uses ([FoundationDB](https://www.foundationdb.org/){:target="_blank" rel="noopener"}) as
its key-value store. FoundationDB, by itself, operates in a replicated high-available cluster across all management
its key-value store. FoundationDB, by itself, operates in a replicated highly-available cluster across all management
nodes.

Within Kubernetes deployments, the control plane can now also be deployed alongside the storage nodes on the same
Kubernetes workers. It will, however, run in separate pods.

## Storage Plane

The storage plane consists of distributed storage nodes that run on Linux-based systems and provide logical volumes (
111 changes: 111 additions & 0 deletions docs/architecture/storage-performance-and-qos.md
@@ -0,0 +1,111 @@
---
title: "Performance and QoS"
weight: 20100
---

## Storage Performance Indicators

Storage performance can be categorized by latency (the aggregate response time of an IO request from the host to the
storage system) and throughput. Throughput can be broken down into random IOPS throughput and sequential throughput.

IOPS and sequential throughput must be measured relative to capacity (i.e., IOPS per TB).

Latency and IOPS throughput depend heavily on the IO operation (read, write, unmap) and the IO size (4K, 8K, 16K,
32K, ...). For comparability, it is typically tested with a 4K IO size, but tests with 8K to 128K are standard too.

Latency is strongly influenced by the overall load on the storage system. If there is intense IO pressure,
queues build up and response times go up. This is no different from a traffic jam on the highway or a queue at the
airline counter. Therefore, to compare latency results, it must be measured under a fixed system load (amount of
parallel IO, its size, and IO type mix).

!!! Important
For latency, consistency matters. High latency variability, especially in the tail, can severely impact workloads.
Therefore, 99th percentile latency may be more important than the average or median.
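
A hedged illustration of how such a measurement is commonly taken with `fio` (not part of simplyblock itself); the
device path, queue depth, and job count are assumptions to adapt to the system under test. The `clat percentiles`
section of fio's output includes the 99th percentile completion latency:

```bash title="Example fio run for 4K random-read IOPS and latency percentiles (illustrative)"
# 4K random reads at a fixed load (queue depth 32, 4 jobs) for 60 seconds.
sudo fio --name=randread-4k \
         --filename=/dev/nvme1n1 \
         --ioengine=libaio --direct=1 \
         --rw=randread --bs=4k \
         --iodepth=32 --numjobs=4 \
         --runtime=60 --time_based \
         --group_reporting
```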

## Challenges with Hyper-Converged and Software-Defined Storage

Unequal load distribution across cluster nodes, and the dynamics of specific nodes under Linux or Windows (dynamic
multithreading, network bandwidth fluctuations, etc.), create significant challenges for consistent, high storage
performance in such an environment.

Mixed IO patterns from different workloads further increase these challenges.

This can cause substantial variability in latency and IOPS throughput, as well as high tail latency, with a negative
impact on workloads.

## Simplyblock: How We Ensure Ultra-Low Latency In The 99th Percentile

Simplyblock exhibits a range of architectural characteristics and features to guarantee consistently low latency and
predictable IOPS in both disaggregated and hyper-converged environments.

### Pseudo-Randomized, Distributed Data Placement With Fast Re-Balancing

Simplyblock is a fully distributed solution. Back-storage is balanced across all nodes in the cluster on a very granular
level. Relative to their capacity and performance, each device and node in the cluster receives a similar amount and
size of IO. This feature ensures an entirely equal distribution of load across the network, compute, and NVMe drives.

In case of drive or node failures, distributed rebalancing occurs to reach the fully balanced state as quickly as
possible. When adding drives and nodes, performance increases in a **linear manner**. This mechanism avoids local
overload and keeps latency and IOPS throughput consistent across the cluster, independent of which node is accessed.

### Built End-To-End With And For NVMe

Storage access is entirely based on NVMe (local back-storage) and NVMe over Fabric (hosts to storage nodes and storage
nodes to storage nodes). This protocol is inherently asynchronous and supports highly parallel processing, eliminating
bottlenecks specific to mixed IO patterns on other protocols (such as iSCSI) and ensuring consistently low latency.

### Support for ROCEv2

Simplyblock also supports NVMe over RDMA (ROCEv2). RDMA, as a transport layer, offers significant latency and tail
latency advantages over TCP. Today, RDMA can be used in most data center environments because it requires only specific
hardware features from NICs, which are available across a broad range of models. It runs over UDP/IP and, as such, does
not require any changes to the networking.

### Full Core-Isolation And NUMA Awareness

Simplyblock implements full CPU core isolation and NUMA socket affinity. Simplyblock’s storage nodes are auto-deployed
per NUMA socket and utilize only socket-specific resources, meaning compute, memory, network interfaces, and NVMe.

All CPU cores assigned to simplyblock are isolated from the operating system (user-space compute and IRQ handling), and
internal threads are pinned to cores. This avoids any scheduling-induced delays or variability in storage processing.

### User-Space, Zero-Copy Framework (Lockless and Asynchronous)

Simplyblock uses a user-space framework ([SPDK](https://spdk.io/){:target="_blank" rel="noopener"}). SPDK implements a
zero-copy model across the entire storage processing chain. This includes the data plane, the Linux vfio driver, and the
entirely non-locking, asynchronous DPDK threading model. It avoids Linux p-threads and any inter-thread
synchronization, providing much higher latency predictability and a lower baseline latency.

### Advanced QoS (Quality of Service)

Simplyblock implements two independent, critical QoS mechanisms.

#### Volume and Pool-Level Caps

A cap, such as an IOPS limit, a throughput limit, or a combination of both, can be set on an individual volume or an
entire pool within the cluster. Through this limit, general-purpose volumes can be pooled and limited in their total
IOPS or throughput to avoid noisy-neighbor effects and protect more critical workloads.
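
A hedged example of a per-volume cap at creation time, reusing the `--max-rw-iops`, `--max-r-mbytes`, and
`--max-w-mbytes` parameters from the volume creation command shown in the plain Linux deployment section; names and
values are illustrative:

```bash title="Example: capping a volume at creation time (illustrative)"
# Create a 500G thin-provisioned volume limited to 10,000 read/write IOPS
# and 200 MB/s of read and write throughput each.
{{ cliname }} volume add \
  --max-rw-iops 10000 \
  --max-r-mbytes 200 \
  --max-w-mbytes 200 \
  lvol-capped 500G general-pool
```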

#### QoS Service Classes

On each cluster, up to 7 service classes can be defined (class 0 is the default). For each class, cluster performance (a
combination of IOPS and throughput) can be allocated in relative terms (e.g., 20%) for performance guarantees.

General-purpose volumes can be allocated in the default class, while more critical workloads can be split across other
service classes. If other classes do not use up their quotas, the default class can still allocate all available
resources.
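
A hedged sketch of placing a volume into a non-default class, assuming the `--lvol-priority-class` parameter of the
volume creation command (shown in the plain Linux deployment section) selects the service class; the class number,
volume name, and pool name are illustrative:

```bash title="Example: assigning a volume to a service class (illustrative)"
# Volumes without the parameter land in the default class 0; this volume is
# placed into service class 2, assuming the cluster defines that class.
{{ cliname }} volume add --lvol-priority-class 2 lvol-critical 1000G prod-pool
```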

#### Why QoS Service Classes are Critical

Why is a limit not sufficient? Imagine a heavily mixed workload in the cluster. Some workloads are read-intensive, while
others are write-intensive. Some workloads require a lot of small random IO, while others read and write large
sequential IO. There is no absolute number of IOPS or throughput a cluster can provide, considering the dynamics of
workloads.

Therefore, using absolute limits on one pool of volumes is effective for protecting others from spillover effects and
undesired behavior. Still, it does not guarantee performance for a particular class of volumes.

Service classes provide a much higher degree of isolation under dynamic workloads. As long as you
do not overload a particular service class, the general IO pressure on the cluster will not matter for the performance
of volumes in that class.

19 changes: 18 additions & 1 deletion docs/architecture/what-is-simplyblock.md
@@ -11,7 +11,24 @@ Storage Interface (CSI) and ProxMox drivers.

- **Environment Agnostic:** Simplyblock operates seamlessly across major cloud providers, regional, and specialized
providers, bare-metal and virtual provisioners, and private clouds, including both virtualized and bare-metal
- Kubernetes environments.
Kubernetes environments.

- **NVMe-Optimized:** Simplyblock is built from scratch around NVMe. All internal and external storage access is
entirely based on NVMe and NVMe over Fabric (TCP, RDMA). This includes local back-storage on storage nodes,
host-to-cluster, and node-to-node traffic. Together with the user-space data plane, distributed data placement, and
advanced quality of service (QoS) and other characteristics, this makes simplyblock the storage platform with the most
advanced performance guarantees in hyperconverged solutions available today.

- **User-Space Data Plane:** Simplyblock's data plane is built entirely in user-space with an interrupt-free, lockless,
zero-copy architecture with thread-to-core pinning. The hot data path entirely avoids Linux kernel involvement, data
copies, dynamic thread scheduling, and inter-thread synchronization. Its deployment is fully NUMA-node-aware.

- **Advanced QoS:** Simplyblock provides not only IOPS or throughput-based caps, but also true QoS service classes,
effectively isolating IO traffic.

- **Distributed Data Placement:** Simplyblock's advanced data placement, which is based on small, fixed-size data
chunks, ensures a perfectly balanced utilization of storage, compute, and network bandwidth, avoiding any performance
bottlenecks local to specific nodes. This provides almost linear performance scalability for the cluster.

- **Containerized Architecture:** The solution comprises:
- *Storage Nodes:* Container stacks delivering distributed data services via NVMe over Fabrics (NVMe over TCP),
24 changes: 21 additions & 3 deletions docs/deployments/baremetal/index.md
@@ -3,7 +3,8 @@ title: "Plain Linux Initiators"
weight: 20200
---

Simplyblock storage can be attached over the network to Linux hosts which are not running Kubernetes or Proxmox.
Simplyblock storage can be attached over the network to Linux hosts which are not running Kubernetes, Proxmox or
OpenStack.

While no simplyblock components must be installed on these hosts, some OS-level configuration steps are required.
Those manual steps are typically taken care of by the CSI driver or Proxmox integration.
@@ -25,7 +26,9 @@ volumes.
sudo apt install -y nvme-cli
```

### Load the NVMe over Fabrics Kernel Modules
### Load the NVMe over Fabrics Kernel Modules

For NVMe over TCP and NVMe over RoCE:

{% include 'prepare-nvme-tcp.md' %}
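
The included snippet covers the required module setup; for orientation only, a minimal sketch of what loading the
initiator modules typically looks like on a plain Linux host (the module names are the standard in-kernel ones and are
assumptions here, not taken from the included file):

```bash title="Example: loading NVMe over Fabrics initiator modules (illustrative)"
# NVMe over TCP initiator
sudo modprobe nvme-tcp
# NVMe over RDMA (RoCE) initiator
sudo modprobe nvme-rdma

# Persist across reboots
echo "nvme-tcp" | sudo tee /etc/modules-load.d/nvme-tcp.conf
echo "nvme-rdma" | sudo tee /etc/modules-load.d/nvme-rdma.conf
```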

@@ -57,13 +60,28 @@ To create a new logical volume, the following command can be run on any control
--max-rw-iops <IOPS> \
--max-r-mbytes <THROUGHPUT> \
--max-w-mbytes <THROUGHPUT> \
--ndcs <DATA CHUNKS IN STRIPE> \
--npcs <PARITY CHUNKS IN STRIPE> \
--fabric {tcp, rdma} \
--lvol-priority-class <1-6> \
<VOLUME_NAME> \
<VOLUME_SIZE> \
<POOL_NAME>
```

!!! info
The parameters `ndcs` and `npcs` define the erasure-coding schema (e.g., `--ndcs=4 --npcs=2`). The settings are
optional. If not specified, the cluster default is chosen. Valid values for `ndcs` are 1, 2, and 4, and for `npcs` 0, 1,
and 2. However, the number of cluster nodes must be equal to or larger than (`ndcs` + `npcs`).

The parameter `--fabric` defines the fabric by which the volume is connected to the cluster. It is optional and the
default is `tcp`. The fabric type `rdma` can only be chosen for hosts with an RDMA-capable NIC and for clusters that
support RDMA. A priority class is optional as well and can be selected only if the cluster defines it. A cluster can
define priority classes 0-6. The default is 0.

```plain title="Example of creating a logical volume"
{{ cliname }} volume add lvol01 1000G test
{{ cliname }} volume add --ndcs 2 --npcs 1 --fabric tcp lvol01 1000G test
```

In this example, a logical volume with the name `lvol01` and 1TB of thinly provisioned capacity is created in the pool
59 changes: 59 additions & 0 deletions docs/deployments/cluster-deployment-options.md
@@ -0,0 +1,59 @@
The following options can be set when creating a cluster. This applies to both plain Linux and Kubernetes deployments.
Most cannot be changed later on, so careful planning is recommended.

### ```--enable-node-affinity```

As long as a node is not full (out of capacity), the first chunk
of data is always stored on the local node (the node to which the volume is attached).
This reduces network traffic and latency - particularly accelerating reads - but may lead to an
unequal distribution of capacity within the cluster. Generally, using node affinity accelerates
reads but leads to higher variability in performance across nodes in the cluster.
It is recommended on shared networks and networks below 100 Gbit/s.

### ```--data-chunks-per-stripe, --parity-chunks-per-stripe```

Those two parameters together make up the default erasure coding schema of the cluster (e.g., 1+1, 2+2, 4+2). Starting from R25.10, it is also
possible to set individual schemas per volume, but this feature is still in alpha stage.

### ```--cap-warn, --cap-crit```

Warning and critical limits for overall cluster utilization. If exceeded, the warning
limit only causes warnings to be issued in the event log, while the critical limit
places the cluster into read-only mode. For large clusters, a critical limit of 99% is fine; for small
clusters (less than 50TB), 97% is recommended.

### ```--prov-cap-warn, --prov-cap-crit```

Warning and critical limits for over-provisioning. Exceeding
these limits will cause entries in the cluster log. If the critical limit is exceeded,
new volumes cannot be provisioned and volumes cannot be enlarged. A limit of 500% is typical.

### ```--log-del-interval```

Number of days for which logs are retained. Log storage can grow significantly, so it is recommended to keep logs for no longer than one week.

### ```--metrics-retention-period```

Number of days for which the IO statistics and other metrics are retained. The amount of data per day is significant, so retention is typically limited to a few days or a week.

### ```--contact-point```

A webhook endpoint for alerting on critical events, such as storage nodes becoming unreachable.

### ```--fabric```

Choose `tcp`, `rdma`, or both. If both fabrics are chosen, volumes can connect to the cluster
using either option (defined per volume or storage class), while the cluster internally uses RDMA.

### ```--qpair-count```

The default number of queue pairs (sockets) per volume used by an initiator (host) to connect to the
target (server). More queue pairs per volume increase concurrency and volume performance but require more
server resources (RAM, CPU) and thus limit the total number of volumes per storage node. The default is 3.
If you need a few very performant volumes, increase the number; if you need a large number of less performant
volumes, decrease it. More than 12 parallel connections have limited impact on overall performance. Also, the
host requires at least one core per queue pair.

### ```--name```

A human-readable name for the cluster.
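
A hedged, combined sketch of how these options might be used together. The `cluster create` subcommand name is an
assumption (it does not appear in this excerpt), and all values are illustrative:

```bash title="Example: creating a cluster with common options (illustrative)"
# Assumed subcommand name; adjust to the actual CLI.
{{ cliname }} cluster create \
  --name prod-cluster-01 \
  --data-chunks-per-stripe 2 \
  --parity-chunks-per-stripe 1 \
  --fabric tcp \
  --qpair-count 3 \
  --cap-warn 90 --cap-crit 97 \
  --prov-cap-warn 400 --prov-cap-crit 500 \
  --log-del-interval 7 \
  --metrics-retention-period 7 \
  --contact-point https://alerts.example.com/webhook/simplyblock
```
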
@@ -10,6 +10,11 @@ trade-offs between redundancy and storage utilization will help determine the be
have been performance-optimized by specialized algorithms. There is, however, a remaining capacity-to-performance
trade-off.

!!! Info
Starting from 25.10.1, it is possible to select alternative erasure coding schemas per volume. However, this feature
is still experimental (technical preview) and not recommended for production. A cluster must provide sufficient
nodes for the largest schema used in any of the volumes (e.g., 4+2: min. 6 nodes, recommended 7 nodes).
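
A hedged example of selecting a per-volume schema, reusing the `--ndcs`/`--npcs` parameters from the volume creation
command shown in the plain Linux deployment section; it assumes a cluster with at least six storage nodes, and the
volume and pool names are illustrative:

```bash title="Example: creating a volume with a 4+2 schema (technical preview, illustrative)"
# 4 data chunks + 2 parity chunks per stripe; requires at least ndcs + npcs = 6 nodes.
{{ cliname }} volume add --ndcs 4 --npcs 2 lvol-ec42 2000G test-pool
```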

## Erasure Coding Schemes

Erasure coding (EC) is a **data protection mechanism** that distributes data and parity across multiple storage nodes,