Merged

49 commits
79a7cb9
updates for 25.10.
schmidt-scaled Oct 21, 2025
0df7627
fix links
schmidt-scaled Oct 24, 2025
6b07da6
few updates
schmidt-scaled Oct 24, 2025
f35c484
few updates
schmidt-scaled Oct 24, 2025
6d5e949
few updates
schmidt-scaled Oct 24, 2025
245521a
few updates
schmidt-scaled Oct 24, 2025
8b609ca
few updates
schmidt-scaled Oct 24, 2025
4407301
few updates
schmidt-scaled Oct 24, 2025
ff980c2
few updates
schmidt-scaled Oct 24, 2025
c8bf547
Fixed logical-volume.md
noctarius Oct 24, 2025
18e8296
Fixed architecture/index.md
noctarius Oct 24, 2025
63cd257
Fixed simplyblock-architecture.md
noctarius Oct 24, 2025
f316ca5
Fixed storage-performance-and-qos.md
noctarius Oct 24, 2025
8ebdcfa
Fixed what-is-simplyblock.md
noctarius Oct 24, 2025
fd8e6f1
Fixed baremetal/index.md
noctarius Oct 24, 2025
95f7e4c
Fixed erasure-coding-scheme.md
noctarius Oct 24, 2025
b39bf24
Fixed system-requirements.md
noctarius Oct 24, 2025
b0e255b
Fixed install-on-linux/index.md
noctarius Oct 24, 2025
5287b8f
Fixed install-cp.md
noctarius Oct 24, 2025
e78882f
Fixed install-sp.md
noctarius Oct 24, 2025
2a8b313
Removed csi-features.md and merged the content into the existing stor…
noctarius Oct 24, 2025
9747d22
Fixed kubernetes/index.md
noctarius Oct 24, 2025
cbf3aba
Fixed kubernetes/install-csi.md
noctarius Oct 24, 2025
a194072
Fixed k8s-control-plane.md
noctarius Oct 24, 2025
b1ea7af
Fixed k8s-storage-plane.md
noctarius Oct 24, 2025
48008f9
Fixed openstack/index.md
noctarius Oct 24, 2025
dc43e39
Fixed deployments/index.md
noctarius Oct 24, 2025
c32d004
Fixed terminology.md
noctarius Oct 24, 2025
a0be93e
few updates
schmidt-scaled Oct 24, 2025
2d298a5
Merge remote-tracking branch 'origin/R25.10' into R25.10
schmidt-scaled Oct 24, 2025
0b7e2b9
few updates
schmidt-scaled Oct 24, 2025
331a4e3
Fixed qos folder
noctarius Oct 24, 2025
6850518
Fixed cluster-upgrade.md
noctarius Oct 24, 2025
d61ad4c
Fixed prepare-nvme-tcp.md
noctarius Oct 24, 2025
7372aa4
updated with new storageclass param
geoffrey1330 Oct 24, 2025
6616af1
updated with new storageclass param
geoffrey1330 Oct 24, 2025
e680adc
added the topology change to docs
geoffrey1330 Oct 24, 2025
043573c
updated the storage node label
geoffrey1330 Oct 25, 2025
5bddb68
updated install control plane on kubernetes
geoffrey1330 Oct 25, 2025
6b7f352
added docs for openshift deployment
geoffrey1330 Oct 27, 2025
beb4c11
added docs for openshift deployment
geoffrey1330 Oct 27, 2025
df253fa
Updated documentation
noctarius Oct 27, 2025
5db458a
Create control-plane-network-table-k8s.md
schmidt-scaled Oct 30, 2025
79680cf
Update k8s-control-plane.md
schmidt-scaled Oct 30, 2025
9172bd7
Update k8s-storage-plane.md
schmidt-scaled Oct 30, 2025
b115bb9
Create storage-plane-network-table-k8s.md
schmidt-scaled Oct 30, 2025
5eaf0d1
Rename storage-plane-network-table-k8s.md to storage-plane-network-po…
schmidt-scaled Oct 30, 2025
a9543e6
Rename control-plane-network-table-k8s.md to control-plane-network-po…
schmidt-scaled Oct 30, 2025
b92c5cf
Quick changes
noctarius Nov 5, 2025
18 changes: 16 additions & 2 deletions docs/architecture/concepts/logical-volumes.md
@@ -19,5 +19,19 @@ Key characteristics of Logical Volumes include:
- **High Performance:** Simplyblock’s architecture ensures low-latency access to LVs, making them suitable for demanding
workloads.
- **Fault Tolerance:** Data is distributed across multiple nodes to prevent data loss and improve reliability.
- **Integration with Kubernetes:** LVs can be used as persistent storage for Kubernetes workloads, enabling seamless
stateful application management.

Two basic types of logical volumes are supported by simplyblock:

- **NVMe-oF Subsystems**: Each logical volume is backed by a separate set of queue pairs. By default, each subsystem
provides three queue pairs and one network connection.

Volumes show up in Linux using `lsblk` as `/dev/nvme0n1`, `/dev/nvme1n1`, `/dev/nvmeXn1`, ...

- **NVMe-oF Namespaces**: Each logical volume is backed by an NVMe namespace. A namespace is a feature similar to a
logical partition of a drive, although it is defined on the NVMe level (device or target). Up to 32 namespaces share
a single NVMe subsystem and its queue pairs and connections.

This is a more resource-efficient, but performance-limited, version of an individual volume. It is useful if many
small volumes are required. Both methods can be combined in a single cluster.

Volumes show up in Linux using `lsblk` as `/dev/nvme0n1`, `/dev/nvme0n2`, `/dev/nvme0nX`, ...
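
A hedged illustration of how the two variants typically appear on a Linux host; device names and sizes are examples
only and will differ per system:

```plain title="Example lsblk output (illustrative)"
$ lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
nvme0n1   259:0    0  1000G  0 disk
nvme1n1   259:1    0   500G  0 disk
nvme2n1   259:2    0   100G  0 disk
nvme2n2   259:3    0   100G  0 disk
```

Here, `nvme0n1` and `nvme1n1` are two volumes exported as individual subsystems, while `nvme2n1` and `nvme2n2` are two
namespaces sharing the single subsystem behind controller `nvme2`.
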
12 changes: 6 additions & 6 deletions docs/architecture/index.md
@@ -4,12 +4,12 @@ weight: 10100
---

Simplyblock is a cloud-native, software-defined storage platform designed for high performance, scalability, and
resilience. It provides NVMe over TCP (NVMe/TCP) block storage, enabling efficient data access across distributed
environments. Understanding the architecture, key concepts, and common terminology is essential for effectively
deploying and managing simplyblock in various infrastructure setups, including Kubernetes clusters, virtualized
environments, and bare-metal deployments. This documentation provides a comprehensive overview of simplyblock’s
internal architecture, the components that power it, and the best practices for integrating it into your storage
infrastructure.
resilience. It provides NVMe over TCP (NVMe/TCP) and NVMe over RDMA (ROCEv2) block storage, enabling efficient data
access across distributed environments. Understanding the architecture, key concepts, and common terminology is
essential for effectively deploying and managing simplyblock in various infrastructure setups, including Kubernetes
clusters, virtualized environments, and bare-metal deployments. This documentation provides a comprehensive overview
of simplyblock’s internal architecture, the components that power it, and the best practices for integrating it into
your storage infrastructure.

This section covers several critical topics, including the architecture of simplyblock, core concepts such as Logical
Volumes (LVs), Storage Nodes, and Management Nodes, as well as Quality of Service (QoS) mechanisms and redundancy
7 changes: 5 additions & 2 deletions docs/architecture/simplyblock-architecture.md
@@ -7,7 +7,7 @@ Simplyblock is a cloud-native, distributed block storage platform designed to de
resilient storage through a software-defined architecture. Centered around NVMe-over-Fabrics (NVMe-oF), simplyblock
separates compute and storage to enable scale-out elasticity, high availability, and low-latency operations in modern,
containerized environments. The architecture is purpose-built to support Kubernetes-native deployments with seamless
integration, but supports virtual and even physical machines as clients as well.
integration but supports virtual and even physical machines as clients as well.

## Control Plane

@@ -54,9 +54,12 @@ environments, simplyblock requires at least three management nodes for high avai
a set of replicated, stateful services.

For internal state storage, the control plane uses ([FoundationDB](https://www.foundationdb.org/){:target="_blank" rel="noopener"}) as
its key-value store. FoundationDB, by itself, operates in a replicated high-available cluster across all management
its key-value store. FoundationDB, by itself, operates in a replicated highly-available cluster across all management
nodes.

Within Kubernetes deployments, the control plane can now also be deployed alongside the storage nodes on the same
Kubernetes workers. It will, however, run in separate pods.

## Storage Plane

The storage plane consists of distributed storage nodes that run on Linux-based systems and provide logical volumes (
111 changes: 111 additions & 0 deletions docs/architecture/storage-performance-and-qos.md
@@ -0,0 +1,111 @@
---
title: "Performance and QoS"
weight: 20100
---

## Storage Performance Indicators

Storage performance can be categorized by latency (the aggregate response time of an IO request from the host to the
storage system) and throughput. Throughput can be broken down into random IOPS throughput and sequential throughput.

IOPS and sequential throughput must be measured relative to capacity (i.e., IOPS per TB).

Latency and IOPS throughput depend heavily on the IO operation (read, write, unmap) and the IO size (4K, 8K, 16K,
32K, ...). For comparability, it is typically tested with a 4K IO size, but tests with 8K to 128K are standard too.

Latency is strongly influenced by the overall load on the storage system. If there is intense IO pressure,
queues build up and response times go up. This is no different from a traffic jam on the highway or a queue at the
airline counter. Therefore, to compare latency results, it must be measured under a fixed system load (amount of
parallel IO, its size, and IO type mix).

!!! Important
For latency, consistency matters. High latency variability, especially in the tail, can severely impact workloads.
Therefore, 99th percentile latency may be more important than the average or median.
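
A hedged illustration of how such a measurement is commonly taken with `fio` (not part of simplyblock itself); the
device path, queue depth, and job count are assumptions to adapt to the system under test. The `clat percentiles`
section of fio's output includes the 99th percentile completion latency:

```bash title="Example fio run for 4K random-read IOPS and latency percentiles (illustrative)"
# 4K random reads at a fixed load (queue depth 32, 4 jobs) for 60 seconds.
sudo fio --name=randread-4k \
         --filename=/dev/nvme1n1 \
         --ioengine=libaio --direct=1 \
         --rw=randread --bs=4k \
         --iodepth=32 --numjobs=4 \
         --runtime=60 --time_based \
         --group_reporting
```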

## Challenges with Hyper-Converged and Software-Defined Storage

Unequal load distribution across cluster nodes, and the dynamics of specific nodes under Linux or Windows (dynamic
multithreading, network bandwidth fluctuations, etc.), create significant challenges for consistent, high storage
performance in such an environment.

Mixed IO patterns from different workloads further increase these challenges.

This can cause substantial variability in latency and IOPS throughput, as well as high tail latency, with a negative
impact on workloads.

## Simplyblock: How We Ensure Ultra-Low Latency In The 99th Percentile

Simplyblock exhibits a range of architectural characteristics and features to guarantee consistently low latency and
predictable IOPS in both disaggregated and hyper-converged environments.

### Pseudo-Randomized, Distributed Data Placement With Fast Re-Balancing

Simplyblock is a fully distributed solution. Back-storage is balanced across all nodes in the cluster on a very granular
level. Relative to their capacity and performance, each device and node in the cluster receives a similar amount and
size of IO. This feature ensures an entirely equal distribution of load across the network, compute, and NVMe drives.

In case of drive or node failures, distributed rebalancing occurs to reach the fully balanced state as quickly as
possible. When adding drives and nodes, performance increases in a **linear manner**. This mechanism avoids local
overload and keeps latency and IOPS throughput consistent across the cluster, independent of which node is accessed.

### Built End-To-End With And For NVMe

Storage access is entirely based on NVMe (local back-storage) and NVMe over Fabric (hosts to storage nodes and storage
nodes to storage nodes). This protocol is inherently asynchronous and supports highly parallel processing, eliminating
bottlenecks specific to mixed IO patterns on other protocols (such as iSCSI) and ensuring consistently low latency.

### Support for ROCEv2

Simplyblock also supports NVMe over RDMA (ROCEv2). RDMA, as a transport layer, offers significant latency and tail
latency advantages over TCP. Today, RDMA can be used in most data center environments because it requires only specific
hardware features from NICs, which are available across a broad range of models. It runs over UDP/IP and, as such, does
not require any changes to the networking.

### Full Core-Isolation And NUMA Awareness

Simplyblock implements full CPU core isolation and NUMA socket affinity. Simplyblock’s storage nodes are auto-deployed
per NUMA socket and utilize only socket-specific resources, meaning compute, memory, network interfaces, and NVMe.

All CPU cores assigned to simplyblock are isolated from the operating system (user-space compute and IRQ handling), and
internal threads are pinned to cores. This avoids any scheduling-induced delays or variability in storage processing.

### User-Space, Zero-Copy Framework (Lockless and Asynchronous)

Simplyblock uses a user-space framework ([SPDK](https://spdk.io/){:target="_blank" rel="noopener"}). SPDK implements a
zero-copy model across the entire storage processing chain. This includes the data plane, the Linux vfio driver, and the
entirely non-locking, asynchronous DPDK threading model. It avoids Linux p-threads and any inter-thread
synchronization, providing much higher latency predictability and a lower baseline latency.

### Advanced QoS (Quality of Service)

Simplyblock implements two independent, critical QoS mechanisms.

#### Volume and Pool-Level Caps

A cap, such as an IOPS limit, a throughput limit, or a combination of both, can be set on an individual volume or an
entire pool within the cluster. Through this limit, general-purpose volumes can be pooled and limited in their total
IOPS or throughput to avoid noisy-neighbor effects and protect more critical workloads.
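
A hedged example of a per-volume cap at creation time, reusing the `--max-rw-iops`, `--max-r-mbytes`, and
`--max-w-mbytes` parameters from the volume creation command shown in the plain Linux deployment section; names and
values are illustrative:

```bash title="Example: capping a volume at creation time (illustrative)"
# Create a 500G thin-provisioned volume limited to 10,000 read/write IOPS
# and 200 MB/s of read and write throughput each.
{{ cliname }} volume add \
  --max-rw-iops 10000 \
  --max-r-mbytes 200 \
  --max-w-mbytes 200 \
  lvol-capped 500G general-pool
```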

#### QoS Service Classes

On each cluster, up to 7 service classes can be defined (class 0 is the default). For each class, cluster performance (a
combination of IOPS and throughput) can be allocated in relative terms (e.g., 20%) for performance guarantees.

General-purpose volumes can be allocated in the default class, while more critical workloads can be split across other
service classes. If other classes do not use up their quotas, the default class can still allocate all available
resources.
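
A hedged sketch of placing a volume into a non-default class, assuming the `--lvol-priority-class` parameter of the
volume creation command (shown in the plain Linux deployment section) selects the service class; the class number,
volume name, and pool name are illustrative:

```bash title="Example: assigning a volume to a service class (illustrative)"
# Volumes without the parameter land in the default class 0; this volume is
# placed into service class 2, assuming the cluster defines that class.
{{ cliname }} volume add --lvol-priority-class 2 lvol-critical 1000G prod-pool
```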

#### Why QoS Service Classes are Critical

Why is a limit not sufficient? Imagine a heavily mixed workload in the cluster. Some workloads are read-intensive, while
others are write-intensive. Some workloads require a lot of small random IO, while others read and write large
sequential IO. There is no absolute number of IOPS or throughput a cluster can provide, considering the dynamics of
workloads.

Therefore, using absolute limits on one pool of volumes is effective for protecting others from spillover effects and
undesired behavior. Still, it does not guarantee performance for a particular class of volumes.

Service classes provide a much higher degree of isolation under dynamic workloads. As long as you
do not overload a particular service class, the general IO pressure on the cluster will not matter for the performance
of volumes in that class.

19 changes: 18 additions & 1 deletion docs/architecture/what-is-simplyblock.md
@@ -11,7 +11,24 @@ Storage Interface (CSI) and ProxMox drivers.

- **Environment Agnostic:** Simplyblock operates seamlessly across major cloud providers, regional, and specialized
providers, bare-metal and virtual provisioners, and private clouds, including both virtualized and bare-metal
- Kubernetes environments.
Kubernetes environments.

- **NVMe-Optimized:** Simplyblock is built from scratch around NVMe. All internal and external storage access is
entirely based on NVMe and NVMe over Fabric (TCP, RDMA). This includes local back-storage on storage nodes,
host-to-cluster, and node-to-node traffic. Together with the user-space data plane, distributed data placement, and
advanced quality of service (QoS) and other characteristics, this makes simplyblock the storage platform with the most
advanced performance guarantees in hyperconverged solutions available today.

- **User-Space Data Plane:** Simplyblock's data plane is built entirely in user-space with an interrupt-free, lockless,
zero-copy architecture with thread-to-core pinning. The hot data path entirely avoids Linux kernel involvement, data
copies, dynamic thread scheduling, and inter-thread synchronization. Its deployment is fully NUMA-node-aware.

- **Advanced QoS:** Simplyblock provides not only IOPS or throughput-based caps, but also true QoS service classes,
effectively isolating IO traffic.

- **Distributed Data Placement:** Simplyblock's advanced data placement, which is based on small, fixed-size data
chunks, ensures a perfectly balanced utilization of storage, compute, and network bandwidth, avoiding any performance
bottlenecks local to specific nodes. This provides almost linear performance scalability for the cluster.

- **Containerized Architecture:** The solution comprises:
- *Storage Nodes:* Container stacks delivering distributed data services via NVMe over Fabrics (NVMe over TCP),
24 changes: 21 additions & 3 deletions docs/deployments/baremetal/index.md
@@ -3,7 +3,8 @@ title: "Plain Linux Initiators"
weight: 20200
---

Simplyblock storage can be attached over the network to Linux hosts which are not running Kubernetes or Proxmox.
Simplyblock storage can be attached over the network to Linux hosts which are not running Kubernetes, Proxmox or
OpenStack.

While no simplyblock components must be installed on these hosts, some OS-level configuration steps are required.
Those manual steps are typically taken care of by the CSI driver or Proxmox integration.
@@ -25,7 +26,9 @@ volumes.
sudo apt install -y nvme-cli
```

### Load the NVMe over Fabrics Kernel Modules
### Load the NVMe over Fabrics Kernel Modules

For NVMe over TCP and NVMe over RoCE:

{% include 'prepare-nvme-tcp.md' %}
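
The included snippet covers the required module setup; for orientation only, a minimal sketch of what loading the
initiator modules typically looks like on a plain Linux host (the module names are the standard in-kernel ones and are
assumptions here, not taken from the included file):

```bash title="Example: loading NVMe over Fabrics initiator modules (illustrative)"
# NVMe over TCP initiator
sudo modprobe nvme-tcp
# NVMe over RDMA (RoCE) initiator
sudo modprobe nvme-rdma

# Persist across reboots
echo "nvme-tcp" | sudo tee /etc/modules-load.d/nvme-tcp.conf
echo "nvme-rdma" | sudo tee /etc/modules-load.d/nvme-rdma.conf
```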

@@ -57,13 +60,28 @@ To create a new logical volume, the following command can be run on any control
--max-rw-iops <IOPS> \
--max-r-mbytes <THROUGHPUT> \
--max-w-mbytes <THROUGHPUT> \
--ndcs <DATA CHUNKS IN STRIPE> \
--npcs <PARITY CHUNKS IN STRIPE> \
--fabric {tcp, rdma} \
--lvol-priority-class <1-6> \
<VOLUME_NAME> \
<VOLUME_SIZE> \
<POOL_NAME>
```

!!! info
The parameters `ndcs` and `npcs` define the erasure-coding schema (e.g., `--ndcs=4 --npcs=2`). The settings are
optional. If not specified, the cluster default is chosen. Valid values for `ndcs` are 1, 2, and 4, and for `npcs` 0, 1,
and 2. However, the number of cluster nodes must be equal to or larger than (`ndcs` + `npcs`).

The parameter `--fabric` defines the fabric by which the volume is connected to the cluster. It is optional and the
default is `tcp`. The fabric type `rdma` can only be chosen for hosts with an RDMA-capable NIC and for clusters that
support RDMA. A priority class is optional as well and can be selected only if the cluster defines it. A cluster can
define priority classes 0-6. The default is 0.

```plain title="Example of creating a logical volume"
{{ cliname }} volume add lvol01 1000G test
{{ cliname }} volume add --ndcs 2 --npcs 1 --fabric tcp lvol01 1000G test
```

In this example, a logical volume with the name `lvol01` and 1TB of thinly provisioned capacity is created in the pool
59 changes: 59 additions & 0 deletions docs/deployments/cluster-deployment-options.md
@@ -0,0 +1,59 @@
The following options can be set when creating a cluster. This applies to both plain Linux and Kubernetes deployments.
Most cannot be changed later on, so careful planning is recommended.

### ```--enable-node-affinity```

As long as a node is not full (out of capacity), the first chunk
of data is always stored on the local node (the node to which the volume is attached).
This reduces network traffic and latency - particularly accelerating reads - but may lead to an
unequal distribution of capacity within the cluster. Generally, using node affinity accelerates
reads but leads to higher variability in performance across nodes in the cluster.
It is recommended on shared networks and networks below 100 Gbit/s.

### ```--data-chunks-per-stripe, --parity-chunks-per-stripe```

Those two parameters together make up the default erasure coding schema of the cluster (e.g., 1+1, 2+2, 4+2). Starting from R25.10, it is also
possible to set individual schemas per volume, but this feature is still in alpha stage.

### ```--cap-warn, --cap-crit```

Warning and critical limits for overall cluster utilization. If exceeded, the warning
limit only causes warnings to be issued in the event log, while the critical limit
places the cluster into read-only mode. For large clusters, a critical limit of 99% is fine; for small
clusters (less than 50TB), 97% is recommended.

### ```--prov-cap-warn, --prov-cap-crit```

Warning and critical limits for over-provisioning. Exceeding
these limits will cause entries in the cluster log. If the critical limit is exceeded,
new volumes cannot be provisioned and volumes cannot be enlarged. A limit of 500% is typical.

### ```--log-del-interval```

Number of days for which logs are retained. Log storage can grow significantly, so it is recommended to keep logs for no longer than one week.

### ```--metrics-retention-period```

Number of days for which the IO statistics and other metrics are retained. The amount of data per day is significant, so retention is typically limited to a few days or a week.

### ```--contact-point```

A webhook endpoint for alerting on critical events, such as storage nodes becoming unreachable.

### ```--fabric```

Choose `tcp`, `rdma`, or both. If both fabrics are chosen, volumes can connect to the cluster
using either option (defined per volume or storage class), while the cluster internally uses RDMA.

### ```--qpair-count```

The default number of queue pairs (sockets) per volume used by an initiator (host) to connect to the
target (server). More queue pairs per volume increase concurrency and volume performance but require more
server resources (RAM, CPU) and thus limit the total number of volumes per storage node. The default is 3.
If you need a few very performant volumes, increase the number; if you need a large number of less performant
volumes, decrease it. More than 12 parallel connections have limited impact on overall performance. Also, the
host requires at least one core per queue pair.

### ```--name```

A human-readable name for the cluster.
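
A hedged, combined sketch of how these options might be used together. The `cluster create` subcommand name is an
assumption (it does not appear in this excerpt), and all values are illustrative:

```bash title="Example: creating a cluster with common options (illustrative)"
# Assumed subcommand name; adjust to the actual CLI.
{{ cliname }} cluster create \
  --name prod-cluster-01 \
  --data-chunks-per-stripe 2 \
  --parity-chunks-per-stripe 1 \
  --fabric tcp \
  --qpair-count 3 \
  --cap-warn 90 --cap-crit 97 \
  --prov-cap-warn 400 --prov-cap-crit 500 \
  --log-del-interval 7 \
  --metrics-retention-period 7 \
  --contact-point https://alerts.example.com/webhook/simplyblock
```
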
@@ -10,6 +10,11 @@ trade-offs between redundancy and storage utilization will help determine the be
have been performance-optimized by specialized algorithms. There is, however, a remaining capacity-to-performance
trade-off.

!!! Info
Starting from 25.10.1, it is possible to select alternative erasure coding schemas per volume. However, this feature
is still experimental (technical preview) and not recommended for production. A cluster must provide sufficient
nodes for the largest schema used in any of the volumes (e.g., 4+2: min. 6 nodes, recommended 7 nodes).
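
A hedged example of selecting a per-volume schema, reusing the `--ndcs`/`--npcs` parameters from the volume creation
command shown in the plain Linux deployment section; it assumes a cluster with at least six storage nodes, and the
volume and pool names are illustrative:

```bash title="Example: creating a volume with a 4+2 schema (technical preview, illustrative)"
# 4 data chunks + 2 parity chunks per stripe; requires at least ndcs + npcs = 6 nodes.
{{ cliname }} volume add --ndcs 4 --npcs 2 lvol-ec42 2000G test-pool
```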

## Erasure Coding Schemes

Erasure coding (EC) is a **data protection mechanism** that distributes data and parity across multiple storage nodes,