Merged
Commits
40 commits
c118d77
Prometheus hostpath (#761)
geoffrey1330 Nov 19, 2025
5f1700b
use max_size instead as hugepage memory when set (#754)
geoffrey1330 Nov 23, 2025
222d25e
Update storage_init_job.yaml.j2
geoffrey1330 Nov 23, 2025
e0f08f9
Update storage_init_job.yaml.j2
geoffrey1330 Nov 23, 2025
00cc359
Update storage_init_job.yaml.j2
geoffrey1330 Nov 23, 2025
3c76758
removed nsenter
geoffrey1330 Nov 23, 2025
a1a884a
fixed Syntax error: end of file unexpected
geoffrey1330 Nov 23, 2025
848c297
run systemctl ram mount disk on host
geoffrey1330 Nov 23, 2025
711f749
updated core Isolation
geoffrey1330 Nov 24, 2025
b2205d5
inherit from worker pool
geoffrey1330 Nov 24, 2025
bf92ab9
schedule admin control replica pods on different workers
geoffrey1330 Nov 24, 2025
87a103c
check and compare hugepage before and after apply_config
geoffrey1330 Nov 24, 2025
43fc6eb
increased node add task retries to 16
geoffrey1330 Nov 24, 2025
1252f4e
run up to 2 replicas of the ingress controller across workers
geoffrey1330 Nov 24, 2025
11f7011
support 2 fdb coordinator failure
geoffrey1330 Nov 24, 2025
acf3cf5
sleep core isolation job for 5mins
geoffrey1330 Nov 25, 2025
1169fbe
use emptyDir memory medium as socket directory (#783)
geoffrey1330 Nov 27, 2025
4e19e3d
added graylog env GRAYLOG_MESSAGE_JOURNAL_MAX_SIZE (#782)
geoffrey1330 Nov 27, 2025
169da21
added endpoint bind_device_to_nvme in kubernetes (#784)
geoffrey1330 Nov 28, 2025
b384f13
remove fdb customParameters
geoffrey1330 Dec 11, 2025
8885e42
removed hostpath capacity
geoffrey1330 Dec 11, 2025
34ceddc
Update mongodb.yaml
geoffrey1330 Jan 13, 2026
c06e418
Update app_k8s.yaml
geoffrey1330 Jan 13, 2026
4b3900c
added check for spdk_container is running
geoffrey1330 Jan 29, 2026
9fd6450
removed helm chart dependency
geoffrey1330 Jan 29, 2026
31c2aa1
Revert "removed helm chart dependency"
geoffrey1330 Jan 29, 2026
a078070
Revert "added check for spdk_container is running"
geoffrey1330 Jan 29, 2026
d3a261d
Increase total sys HP memory with a buffer .5G for each sn node and a…
wmousa Jan 14, 2026
d6a94c6
Change hugepages memory variable from MEM_GEGA to MEM_MEGA
wmousa Jan 16, 2026
9125dd9
Change hugepages memory variable from MEM_GEGA to MEM_MEGA2
wmousa Jan 16, 2026
47df5bc
Add fix to p2p to allow passing non-exist pci
wmousa Jan 23, 2026
f2e3091
Add option --nvme-names to select nvme devices by their namespace nam…
wmousa Jan 30, 2026
b032ba9
bind device to spdk before formatting it
wmousa Jan 30, 2026
baf5691
Fix type issue
wmousa Jan 30, 2026
6a1f775
Make LVOL_NVMF_PORT_START configurable via environment variable
Hamdy-khader Jan 29, 2026
2665358
Fix sn restart with new device
Hamdy-khader Jan 30, 2026
d6b988b
Fix storage node key error on port allow task
Hamdy-khader Jan 30, 2026
d785399
Handle sync delete errors in tasks_runner_sync_lvol_del.py and update…
Hamdy-khader Feb 2, 2026
21dc3b9
Update env_var
wmousa Feb 4, 2026
83ad759
fix
Hamdy-khader Feb 5, 2026
13 changes: 13 additions & 0 deletions simplyblock_cli/cli-reference.yaml
@@ -102,6 +102,14 @@ commands:
required: false
type: str
default: ""
- name: "--nvme-names"
help: "Comma separated list of nvme namespace names like nvme0n1,nvme1n1..."
description: >
Comma separated list of nvme namespace names like nvme0n1,nvme1n1...
dest: nvme_names
required: false
type: str
default: ""
- name: "--force"
help: "Force format detected or passed nvme pci address to 4K and clean partitions"
dest: force
@@ -150,6 +158,11 @@ commands:
dest: partitions
type: int
default: 1
- name: "--format-4k"
help: "Force format nvme devices with 4K"
dest: format_4k
type: bool
action: store_true
- name: "--jm-percent"
help: "Number in percent to use for JM from each device"
dest: jm_percent
2 changes: 2 additions & 0 deletions simplyblock_cli/cli.py
@@ -96,6 +96,7 @@ def init_storage_node__configure(self, subparser):
argument = subcommand.add_argument('--pci-blocked', help='Comma separated list of PCI addresses of Nvme devices to not use for storage devices', type=str, default='', dest='pci_blocked', required=False)
argument = subcommand.add_argument('--device-model', help='NVMe SSD model string, example: --model PM1628, --device-model and --size-range must be set together', type=str, default='', dest='device_model', required=False)
argument = subcommand.add_argument('--size-range', help='NVMe SSD device size range separated by -, can be X(m,g,t) or bytes as integer, example: --size-range 50G-1T or --size-range 1232345-67823987, --device-model and --size-range must be set together', type=str, default='', dest='size_range', required=False)
argument = subcommand.add_argument('--nvme-names', help='Comma separated list of nvme namespace names like nvme0n1,nvme1n1...', type=str, default='', dest='nvme_names', required=False)
argument = subcommand.add_argument('--force', help='Force format detected or passed nvme pci address to 4K and clean partitions', dest='force', action='store_true')

def init_storage_node__configure_upgrade(self, subparser):
@@ -114,6 +115,7 @@ def init_storage_node__add_node(self, subparser):
subcommand.add_argument('node_addr', help='Address of storage node api to add, like <node-ip>:5000', type=str)
subcommand.add_argument('ifname', help='Management interface name', type=str)
argument = subcommand.add_argument('--journal-partition', help='1: auto-create small partitions for journal on nvme devices. 0: use a separate (the smallest) nvme device of the node for journal. The journal needs a maximum of 3 percent of total available raw disk space.', type=int, default=1, dest='partitions')
argument = subcommand.add_argument('--format-4k', help='Force format nvme devices with 4K', dest='format_4k', action='store_true')
if self.developer_mode:
argument = subcommand.add_argument('--jm-percent', help='Number in percent to use for JM from each device', type=int, default=3, dest='jm_percent')
argument = subcommand.add_argument('--data-nics', help='Storage network interface names. currently one interface is supported.', type=str, dest='data_nics', nargs='+')
7 changes: 6 additions & 1 deletion simplyblock_cli/clibase.py
@@ -103,12 +103,15 @@ def storage_node__configure(self, sub_command, args):
max_prov = utils.parse_size(max_size, assume_unit='G')
pci_allowed = []
pci_blocked = []
nvme_names = []
if args.pci_allowed:
pci_allowed = [str(x) for x in args.pci_allowed.split(',')]
if args.pci_blocked:
pci_blocked = [str(x) for x in args.pci_blocked.split(',')]
if (args.device_model and not args.size_range) or (not args.device_model and args.size_range):
self.parser.error("device_model and size_range must be set together")
if args.nvme_names:
nvme_names = [str(x) for x in args.nvme_names.split(',')]
use_pci_allowed = bool(args.pci_allowed)
use_pci_blocked = bool(args.pci_blocked)
use_model_range = bool(args.device_model and args.size_range)
@@ -122,7 +125,7 @@
return storage_ops.generate_automated_deployment_config(
args.max_lvol, max_prov, sockets_to_use,args.nodes_per_socket,
pci_allowed, pci_blocked, force=args.force, device_model=args.device_model,
size_range=args.size_range, cores_percentage=cores_percentage)
size_range=args.size_range, cores_percentage=cores_percentage, nvme_names=nvme_names)

def storage_node__deploy_cleaner(self, sub_command, args):
storage_ops.deploy_cleaner()
@@ -150,6 +153,7 @@ def storage_node__add_node(self, sub_command, args):
enable_ha_jm = args.enable_ha_jm
namespace = args.namespace
ha_jm_count = args.ha_jm_count
format_4k = args.format_4k
try:
out = storage_ops.add_node(
cluster_id=cluster_id,
@@ -169,6 +173,7 @@
id_device_by_nqn=args.id_device_by_nqn,
partition_size=args.partition_size,
ha_jm_count=ha_jm_count,
format_4k=format_4k
)
except Exception as e:
print(e)
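The clibase.py hunk splits the new `--nvme-names` value the same way as `--pci-allowed` and `--pci-blocked`: a comma separated string becomes a list, and an empty value yields an empty list. A minimal standalone sketch of that parsing (the helper name is ours, not from the PR):

```python
def parse_csv_list(value: str) -> list[str]:
    """Split a comma separated CLI value like 'nvme0n1,nvme1n1' into a list.

    Mirrors the pattern used for pci_allowed, pci_blocked, and nvme_names
    in clibase.py: an empty or missing value yields an empty list.
    """
    if not value:
        return []
    return [str(x) for x in value.split(",")]
```

For example, `parse_csv_list("nvme0n1,nvme1n1")` gives `["nvme0n1", "nvme1n1"]`, while `parse_csv_list("")` gives `[]`.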
4 changes: 3 additions & 1 deletion simplyblock_core/constants.py
@@ -233,7 +233,9 @@ def get_config_var(name, default=None):

# ports ranges
RPC_PORT_RANGE_START = 8080
LVOL_NVMF_PORT_START = 9100
NODE_NVMF_PORT_START=9060
NODE_HUBLVOL_PORT_START=9030
FW_PORT_START = 50001
# todo(hamdy): make it configurable: sfam-2586
LVOL_NVMF_PORT_ENV = os.getenv("LVOL_NVMF_PORT_START", "")
LVOL_NVMF_PORT_START = int(LVOL_NVMF_PORT_ENV) if LVOL_NVMF_PORT_ENV else 9100
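The constants.py change makes `LVOL_NVMF_PORT_START` configurable via an environment variable while keeping 9100 as the fallback. A self-contained sketch of the same pattern, using the variable names from the diff:

```python
import os

# Same pattern as the constants.py change: an env var overrides the
# default port, and an unset or empty value falls back to 9100.
LVOL_NVMF_PORT_ENV = os.getenv("LVOL_NVMF_PORT_START", "")
LVOL_NVMF_PORT_START = int(LVOL_NVMF_PORT_ENV) if LVOL_NVMF_PORT_ENV else 9100

print(LVOL_NVMF_PORT_START)  # 9100 unless LVOL_NVMF_PORT_START is exported
```

Reading into a separate `_ENV` string first keeps the empty-string case (variable exported but blank) from crashing `int()`.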
4 changes: 2 additions & 2 deletions simplyblock_core/controllers/lvol_controller.py
@@ -714,7 +714,7 @@ def add_lvol_on_node(lvol, snode, is_primary=True):
return False, f"Failed to create listener for {lvol.get_id()}"

logger.info("Add BDev to subsystem")
ret = rpc_client.nvmf_subsystem_add_ns(lvol.nqn, lvol.top_bdev, lvol.uuid, lvol.guid,f"{lvol.vuid:016X}")
ret = rpc_client.nvmf_subsystem_add_ns(lvol.nqn, lvol.top_bdev, lvol.uuid, lvol.guid, lvol.ns_id)
if not ret:
return False, "Failed to add bdev to subsystem"
lvol.ns_id = int(ret)
@@ -758,7 +758,7 @@ def recreate_lvol_on_node(lvol, snode, ha_inode_self=0, ana_state=None):

# if namespace_found is False:
logger.info("Add BDev to subsystem")
ret = rpc_client.nvmf_subsystem_add_ns(lvol.nqn, lvol.top_bdev, lvol.uuid, lvol.guid, f"{lvol.vuid:016X}")
ret = rpc_client.nvmf_subsystem_add_ns(lvol.nqn, lvol.top_bdev, lvol.uuid, lvol.guid, lvol.ns_id)
# if not ret:
# return False, "Failed to add bdev to subsystem"

2 changes: 1 addition & 1 deletion simplyblock_core/controllers/tasks_controller.py
@@ -322,7 +322,7 @@ def add_new_device_mig_task(device_id):

def add_node_add_task(cluster_id, function_params):
return _add_task(JobSchedule.FN_NODE_ADD, cluster_id, "", "",
function_params=function_params, max_retry=11)
function_params=function_params, max_retry=16)


def get_active_node_tasks(cluster_id, node_id):
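The tasks_controller.py hunk raises the node-add task's retry cap from 11 to 16, matching the "increased node add task retries" commit. A toy illustration of such a retry cap; the function and task shapes here are ours, not from the PR:

```python
def run_with_retries(task, max_retry: int = 16) -> bool:
    """Call `task(attempt)` until it returns True or max_retry attempts
    are exhausted; return whether it eventually succeeded.

    Toy sketch of a retry cap like JobSchedule's max_retry setting.
    """
    for attempt in range(1, max_retry + 1):
        if task(attempt):
            return True
    return False
```

Raising the cap trades longer worst-case task duration for more tolerance of transient failures while a node joins the cluster.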
1 change: 1 addition & 0 deletions simplyblock_core/scripts/charts/Chart.yaml
@@ -25,6 +25,7 @@ dependencies:
- name: prometheus
version: "25.18.0"
repository: "https://prometheus-community.github.io/helm-charts"
condition: monitoring.enabled
- name: ingress-nginx
version: 4.10.1
repository: "https://kubernetes.github.io/ingress-nginx"
9 changes: 8 additions & 1 deletion simplyblock_core/scripts/charts/templates/app_k8s.yaml
@@ -5,7 +5,7 @@ metadata:
name: simplyblock-admin-control
namespace: {{ .Release.Namespace }}
spec:
replicas: 1
replicas: 2
selector:
matchLabels:
app: simplyblock-admin-control
@@ -21,6 +21,13 @@ spec:
serviceAccountName: simplyblock-sa
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app: simplyblock-admin-control
topologyKey: kubernetes.io/hostname
containers:
- name: simplyblock-control
image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
217 changes: 217 additions & 0 deletions simplyblock_core/scripts/charts/templates/csi-hostpath-controller.yaml
@@ -0,0 +1,217 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: csi-hostpathplugin-sa
namespace: {{ .Release.Namespace }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: csi-hostpathplugin
rules:
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: [""]
resources: ["persistentvolumeclaims/status"]
verbs: ["get", "update", "patch"]
- apiGroups: ["storage.k8s.io"]
resources: ["volumeattachments"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["get", "list", "watch"]
- apiGroups: ["storage.k8s.io"]
resources: ["csinodes"]
verbs: ["get", "list", "watch"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: ["storage.k8s.io"]
resources: ["csistoragecapacities"]
verbs: ["get", "list", "watch", "create", "update", "delete"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "patch", "update", "get", "list", "watch"]
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: csi-hostpathplugin
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: csi-hostpathplugin
subjects:
- kind: ServiceAccount
name: csi-hostpathplugin-sa
namespace: {{ .Release.Namespace }}
---
kind: StatefulSet
apiVersion: apps/v1
metadata:
name: csi-hostpathplugin
labels:
app.kubernetes.io/instance: hostpath.csi.k8s.io
app.kubernetes.io/part-of: csi-driver-host-path
app.kubernetes.io/name: csi-hostpathplugin
app.kubernetes.io/component: plugin
spec:
serviceName: "csi-hostpathplugin"
# One replica only:
# Host path driver only works when everything runs
# on a single node.
replicas: 1
selector:
matchLabels:
app.kubernetes.io/instance: hostpath.csi.k8s.io
app.kubernetes.io/part-of: csi-driver-host-path
app.kubernetes.io/name: csi-hostpathplugin
app.kubernetes.io/component: plugin
template:
metadata:
labels:
app.kubernetes.io/instance: hostpath.csi.k8s.io
app.kubernetes.io/part-of: csi-driver-host-path
app.kubernetes.io/name: csi-hostpathplugin
app.kubernetes.io/component: plugin
spec:
serviceAccountName: csi-hostpathplugin-sa
containers:
- name: hostpath
image: registry.k8s.io/sig-storage/hostpathplugin:v1.17.0
args:
- "--drivername=hostpath.csi.k8s.io"
- "--v=5"
- "--endpoint=$(CSI_ENDPOINT)"
- "--nodeid=$(KUBE_NODE_NAME)"
# end hostpath args
env:
- name: CSI_ENDPOINT
value: unix:///csi/csi.sock
- name: KUBE_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
securityContext:
privileged: true
ports:
- containerPort: 9898
name: healthz
protocol: TCP
livenessProbe:
failureThreshold: 5
httpGet:
path: /healthz
port: healthz
initialDelaySeconds: 10
timeoutSeconds: 3
periodSeconds: 2
volumeMounts:
- mountPath: /csi
name: socket-dir
- mountPath: /var/lib/kubelet/pods
mountPropagation: Bidirectional
name: mountpoint-dir
- mountPath: /var/lib/kubelet/plugins
mountPropagation: Bidirectional
name: plugins-dir
- mountPath: /csi-data-dir
name: csi-data-dir
- mountPath: /dev
name: dev-dir

- name: liveness-probe
volumeMounts:
- mountPath: /csi
name: socket-dir
image: registry.k8s.io/sig-storage/livenessprobe:v2.17.0
args:
- --csi-address=/csi/csi.sock
- --health-port=9898

- name: csi-provisioner
image: registry.k8s.io/sig-storage/csi-provisioner:v6.0.0
args:
- -v=5
- --csi-address=/csi/csi.sock
- --feature-gates=Topology=true
- --enable-capacity
- --capacity-ownerref-level=0 # pod is owner
- --node-deployment=true
- --strict-topology=true
- --immediate-topology=false
- --worker-threads=5
env:
- name: NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
# end csi-provisioner args
securityContext:
# This is necessary only for systems with SELinux, where
# non-privileged sidecar containers cannot access unix domain socket
# created by privileged CSI driver container.
privileged: true
volumeMounts:
- mountPath: /csi
name: socket-dir

- name: csi-resizer
image: registry.k8s.io/sig-storage/csi-resizer:v2.0.0
args:
- -v=5
- -csi-address=/csi/csi.sock
securityContext:
# This is necessary only for systems with SELinux, where
# non-privileged sidecar containers cannot access unix domain socket
# created by privileged CSI driver container.
privileged: true
volumeMounts:
- mountPath: /csi
name: socket-dir

volumes:
- hostPath:
path: /var/lib/kubelet/plugins/csi-hostpath
type: DirectoryOrCreate
name: socket-dir
- hostPath:
path: /var/lib/kubelet/pods
type: DirectoryOrCreate
name: mountpoint-dir
- hostPath:
path: /var/lib/kubelet/plugins_registry
type: Directory
name: registration-dir
- hostPath:
path: /var/lib/kubelet/plugins
type: Directory
name: plugins-dir
- hostPath:
# 'path' is where PV data is persisted on host.
# using /tmp is also possible while the PVs will not available after plugin container recreation or host reboot
path: /var/lib/csi-hostpath-data/
type: DirectoryOrCreate
name: csi-data-dir
- hostPath:
path: /dev
type: Directory
name: dev-dir
# end csi volumes
@@ -17,8 +17,8 @@ spec:
podInfoOnMount: true
# No attacher needed.
attachRequired: false
storageCapacity: false
# Kubernetes may use fsGroup to change permissions and ownership
storageCapacity: true
# Kubernetes may use fsGroup to change permissions and ownership
# of the volume to match user requested fsGroup in the pod's SecurityPolicy
fsGroupPolicy: File
