64 changes: 64 additions & 0 deletions llms.txt
@@ -0,0 +1,64 @@
# Alauda Container Platform - Redis Operator Docs

> Documentation for the Alauda Container Platform Redis operator: architecture, installation, lifecycle, upgrades, and operational how-tos.

This index covers user-facing documentation under docs/en/ for managing
Redis instances on Alauda Container Platform: installation, architecture,
Kubernetes API references, function guides (creation, access, scaling,
monitoring, disaster recovery), how-to guides, and troubleshooting.


## Root

- [docs/en/architecture.mdx](docs/en/architecture.mdx): Compares Alauda Cache Service for Redis OSS's two deployment architectures: Sentinel Mode (master-replica replication with automatic failover monitored by Redis Sentinel) and Cluster Mode (horizontally scaled sharding via hash slots). Includes a selection matrix covering horizontal scalability, dataset size beyond 8GB, multi-key transactions, and read-replica workloads to guide choosing between the two architectures.
- [docs/en/index.mdx](docs/en/index.mdx): Top-level navigation entry point for the English documentation set, rendering an auto-generated overview of child pages via the `<Overview>` component.
- [docs/en/installation.mdx](docs/en/installation.mdx): Step-by-step installation guide for deploying Alauda Cache Service for Redis OSS through the Web Console, including the optional Alauda Container Platform Data Services Essentials plugin (installed on the `global` cluster via Marketplace > Cluster Plugins) and the RDS Framework Operator (installed via OperatorHub with stable channel, Cluster install mode, and Manual upgrade strategy). Lists prerequisites such as obtaining the platform-compatible installation package and publishing it to target clusters.
- [docs/en/intro.mdx](docs/en/intro.mdx): Introduces Redis OSS as an in-memory data structure store and presents Alauda Cache Service for Redis OSS as a Kubernetes-native Operator that manages Redis lifecycles declaratively. Highlights key features: Standalone/Sentinel/Cluster modes, support for Redis 5.0/6.0/7.2, Redis ACL (6.0+), TLS encryption, NodePort and LoadBalancer access, IPv4/IPv6, online horizontal/vertical scaling, graceful upgrades, NodeSelector/Toleration/Affinity scheduling, rolling shard adjustments, zero-downtime upgrades, and hot ConfigMap-driven configuration updates.
- [docs/en/lifecycle_policy.mdx](docs/en/lifecycle_policy.mdx): Publishes the support timeline for Alauda Cache Service for Redis OSS releases (v4.1.x released 2025-08-22, end of support 2027-08-30) and the maintenance policy covering patch/minor release cadence, upstream Redis CVE and bug patch tracking, operator security updates, and upgrade assistance during the maintenance window.
- [docs/en/release_notes.mdx](docs/en/release_notes.mdx): Release notes for Alauda Cache Service for Redis OSS v4.1.0 and v4.0.0, with a compatibility matrix mapping product versions to Redis server versions (5.0.14, 6.0.20, 7.2.10/7.2.x) and ACP versions. v4.1.0 introduces alpha Redis disaster recovery (data sync, monitoring/alerts, failover, source management) and intra-cluster/inter-cluster access method switching; v4.0.0 decouples the component lifecycle from the platform, adds real-time logs on the details page, and supports LoadBalancer external access (with MetalLB recommended for address pools).
- [docs/en/upgrade.mdx](docs/en/upgrade.mdx): Upgrade guidance for Alauda Cache Service for Redis OSS covering semantic versioning compatibility (patch/minor backward-compatible, major may break), prerequisites (Ready instance status, resource headroom, backups), and a tested upgrade matrix mapping v4.0.x and v4.1.0 to Redis server versions, ACP versions, and supported Kubernetes versions (1.28-1.32). Documents minor/patch/major upgrade strategies, automatic vs. manual execution modes, and operational considerations such as downtime planning, rollback, and post-upgrade validation.

## apis

- [docs/en/apis/index.mdx](docs/en/apis/index.mdx): Top-level entry page for the API Reference section, rendering an auto-generated overview of API documentation subsections via the `<Overview>` component.
- [docs/en/apis/kubernetes_apis/index.mdx](docs/en/apis/kubernetes_apis/index.mdx): Index page for the Kubernetes APIs reference, providing a navigation overview of the Custom Resource Definitions (Redis, RedisUser, ActiveRedisConnection) exposed by the Alauda Cache Service for Redis OSS Operator.
- [docs/en/apis/kubernetes_apis/redis/activeredisconnection.mdx](docs/en/apis/kubernetes_apis/redis/activeredisconnection.mdx): API reference page that renders the schema for the `activeredisconnections.redis.middleware.alauda.io` CRD via the `<K8sCrd>` component; this CRD tracks live client connections to Redis instances managed by the Operator.
- [docs/en/apis/kubernetes_apis/redis/index.mdx](docs/en/apis/kubernetes_apis/redis/index.mdx): Index page for the Redis CRD group under `redis.middleware.alauda.io`, providing navigation to the Redis, RedisUser, and ActiveRedisConnection custom resource references.
- [docs/en/apis/kubernetes_apis/redis/redis.mdx](docs/en/apis/kubernetes_apis/redis/redis.mdx): API reference page that renders the schema for the primary `redis.middleware.alauda.io` Redis CRD via the `<K8sCrd>` component; this CRD declares Redis instances (Standalone/Sentinel/Cluster architectures, replica/shard topology, exporter, expose method, custom config, resources, version).
- [docs/en/apis/kubernetes_apis/redis/redisuser.mdx](docs/en/apis/kubernetes_apis/redis/redisuser.mdx): API reference page that renders the schema for the `redisusers.redis.middleware.alauda.io` CRD via the `<K8sCrd>` component; this CRD provisions Redis ACL users with declarative permission rules against managed Redis instances (Redis 6.0+).

## functions

- [docs/en/functions/10-create-instance.mdx](docs/en/functions/10-create-instance.mdx): Walks through creating Redis instances in Sentinel, Cluster, and Standalone architectures via kubectl manifests (`apiVersion: middleware.alauda.io/v1`, `kind: Redis`) and the Web Console. Includes sample YAML showing `arch`, `customConfig` (databases/hz/save/timeout), `exporter`, `replicas.sentinel`/`replicas.cluster.shard`, `sentinel.monitorConfig` (down-after-milliseconds, failover-timeout, parallel-syncs), `expose.type: NodePort`, and `affinityPolicy: AntiAffinityInSharding`. Documents Web Console fields for parameter templates, default user passwords, within-cluster vs. external access (LoadBalancer/NodePort), host-port specification, anti-affinity policies, node labels, and Pod tolerations (Equal/Exists with NoExecute eviction semantics), plus the `kubectl get redis` output columns (NAME, ARCH, VERSION, ACCESS, STATUS, MESSAGE, BUNDLE VERSION, AUTOUPGRADE, AGE) and their meanings.
- [docs/en/functions/15-delete-instance.mdx](docs/en/functions/15-delete-instance.mdx): Explains how deleting a Redis CR cascades to its Operator-managed StatefulSets, ConfigMaps, Secrets, and PersistentVolumeClaims. CLI workflow uses `kubectl delete redis` to keep persistent volumes, or patches the built-in `delete-pvc` finalizer before deletion to also remove PVCs. Web Console workflow uses the Actions menu on the instance details page with an optional `Delete PVC` checkbox and confirmation by typing the instance name.
- [docs/en/functions/20-user.mdx](docs/en/functions/20-user.mdx): Manages Redis authentication and access control via the `RedisUser` CRD, including listing users, rotating passwords stored in Kubernetes Secrets (base64-encoded `password` key), and applying ACL rules. Highlights the version split between Redis 5.0 (single-credential, restart required) and Redis 6.0+ ACLs with multi-user support, and documents predefined permission profiles (`NotDangerous`, `ReadWrite`, `ReadOnly`, `Administrator`) plus the reserved `operator` system account that must not be modified.
- [docs/en/functions/30-parameter.mdx](docs/en/functions/30-parameter.mdx): Covers Redis instance parameter configuration through built-in templates (RDB persistence, AOF persistence, Diskless cache) tailored to durability vs. performance trade-offs. Classifies parameters as hot-update, restart-update (e.g. `databases`, `io-threads`, `rename-command`), or non-modifiable (`bind`, `protected-mode`, `port`, `dir`), and shows patching `spec.customConfig` on the Redis CR and `spec.sentinel.monitorConfig` for sentinel tunables `down-after-milliseconds`, `failover-timeout`, and `parallel-syncs`.
- [docs/en/functions/40-accessmethod.mdx](docs/en/functions/40-accessmethod.mdx): Documents connection strategies for Alauda Cache Service for Redis OSS, distinguishing internal access (DNS-based or IP-based intra-cluster routing) from external access via NodePort or LoadBalancer. Explains why NodePort cannot be fronted by an additional load balancer and is unsuitable for multi-NIC nodes, and describes how Sentinel clients use the fixed master group name `mymaster` while Cluster clients bootstrap from any node to learn slot topology.
- [docs/en/functions/50-update.mdx](docs/en/functions/50-update.mdx): Guides vertical and horizontal scaling of Redis instances by adjusting `spec.resources` (CPU, memory) and replica counts via `kubectl patch` or the Web Console. Enforces architectural constraints such as identical storage across all Sentinel nodes, per-shard storage parity in Cluster mode, scheduling-constraint compatibility when changing replica counts, and the ~30% memory overhead Redis needs above the dataset size.
- [docs/en/functions/60-scheduling.mdx](docs/en/functions/60-scheduling.mdx): Explains hot-updating Pod scheduling configuration (node labels, tolerations, affinity) on running Redis instances to react to changing node taints without disrupting service. Notes that Pod migration is not performed during the update, so the post-update eligible-node set must continue to include the originally selected nodes and provide at least as many matching nodes as replicas.
- [docs/en/functions/70-backup-restore.mdx](docs/en/functions/70-backup-restore.mdx): Describes the two backup backends offered by Alauda Cache Service for Redis OSS: S3-compatible object storage and PVC-based storage requiring `ReadWriteMany` (NFS, Ceph). Walks through the Web Console flows for taking immediate snapshots, scheduling automatic backups with retention policies, and restoring a backup into a new Redis instance until it reaches the `Running` state.
- [docs/en/functions/75-monitor.mdx](docs/en/functions/75-monitor.mdx): Outlines built-in Redis monitoring dashboards covering cluster status (keyspace counts, command stats, replication lag), resource usage (memory, network, storage), and performance (connections, I/O, latency). Lists the platform's preconfigured alert indicators with recommended thresholds, such as cache hit rate <80%, response time >0.1s, CPU/memory/storage >80%, and shows a Prometheus PromQL example using `redis_keyspace_misses_total` and `redis_keyspace_hits_total` for custom alerts.
- [docs/en/functions/80-log.mdx](docs/en/functions/80-log.mdx): Walks through the Web Console **Realtime Log** tab for viewing container-level Redis logs, including filtering by Pod, full-text search, time-range selection, and exporting logs for offline analysis to support fault diagnosis.
- [docs/en/functions/85-start-stop.mdx](docs/en/functions/85-start-stop.mdx): Documents the one-click stop and start lifecycle actions in the Web Console for temporarily releasing instance resources without deletion. Lists the prerequisite states (`Processing`, `Running`, `Abnormal`, `Unknown` for stopping; `Stopping`, `Stopped` for starting) and enumerates which capabilities are disabled while stopped, including Terminal, Topology, Parameter Configuration, User Management, Backup & Restore, Inspection, and Access Method.
- [docs/en/functions/90-restart.mdx](docs/en/functions/90-restart.mdx): Procedure for restarting a Redis instance through the Web Console **Actions** menu to recover from connection floods, hangs, or unknown faults. Notes that Pods are rolled one by one (causing brief service blips) and that the instance must be in the `Running` state, while `Processing`/`Error` instances will auto-restart after the health-check timeout.
- [docs/en/functions/95-disaster-recovery/10-intro.mdx](docs/en/functions/95-disaster-recovery/10-intro.mdx): Introduces the alpha disaster recovery (DR) feature built on a Redis Module that intercepts writes, parses commands into an idempotent on-disk Oplog, and supports both full (RDB snapshot + Oplog start offset) and incremental synchronization with a disk-bounded window that survives multi-hour or multi-day network partitions. Explains the unified Proxy layer that abstracts Sentinel and Cluster routing, the `service_id` mechanism (range `[0-15]`) limiting one source to at most 15 targets in a star topology, and the Oplog slicing and delay-metric observability that feed Prometheus/Grafana.
- [docs/en/functions/95-disaster-recovery/20-setup.mdx](docs/en/functions/95-disaster-recovery/20-setup.mdx): Walks through enabling DR on source and target Redis instances by patching `spec.activeRedis.serviceID` on the Redis CR, optionally exposing the source Proxy via LoadBalancer (`activeredis-proxy-<instance-name>`), and wiring the target with an `ActiveRedisConnection` CR pointing at the source address and a Secret holding the source `default` user password. Describes Web Console inspection checks (network reachability, architecture match, cluster shard/slot consistency, target memory >= source, unique `service_id`) and the `Connected` / `PartialSync` / `FullSync` status fields surfaced on `ActiveRedisConnection`.
- [docs/en/functions/95-disaster-recovery/30-failover.mdx](docs/en/functions/95-disaster-recovery/30-failover.mdx): Describes how to detect a source outage via the `ActiveRedisConnection` resource transitioning to `Failed` (per-shard `Disconnected` messages), then perform manual failover by deleting the connection CR (or clicking **Failover** in the Web Console) so the target becomes an independent source before pointing clients at it. Compares client-side switching strategies (DNS, Proxy, Service Mesh, client library) with their RTO/RPO trade-offs, prescribes a multi-signal switching predicate combining instance health, HA loss, and data-center checks, and notes the platform does not ship a client-side switcher.
- [docs/en/functions/95-disaster-recovery/90-limitations.mdx](docs/en/functions/95-disaster-recovery/90-limitations.mdx): Enumerates DR constraints: only `<ip:port>` source addresses (no DNS); Proxy NodePort is not HA and should sit behind a load balancer or MetalLB VIP; only Redis 6.0 is supported with no in-place upgrade to 7.2; DR cannot be disabled once enabled (RDB AUX records and special replication instructions persist); local SSD storage is recommended due to 1s fsync, 3 GB Oplog slicing, and concurrent RDB activity; source and target must share the same architecture, shard count, and slot distribution; cluster online reshard is unsupported; max 15 targets per source with star-only topology; `migrate`, `pubsub`, and `stream` commands are not replicated; and the target must be disconnected from a failed source before switching to prevent dirty-data pollution.
- [docs/en/functions/95-disaster-recovery/index.mdx](docs/en/functions/95-disaster-recovery/index.mdx): Section landing page for the Disaster Recovery topic, rendering an automatic `<Overview />` index of the child pages (introduction, setup, failover, limitations).
- [docs/en/functions/index.mdx](docs/en/functions/index.mdx): Section landing page for the Redis operator's feature Guides, rendered via the `<Overview />` component with weight 50 and zh title 功能指南.
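
The monitoring entry above mentions a cache hit rate alert below 80% built from `redis_keyspace_hits_total` and `redis_keyspace_misses_total`. As a rough illustration of the arithmetic behind such an alert, here is a minimal Python sketch. The function names and the choice to skip alerting when no lookups have happened yet are assumptions made for this example, not part of the platform or of the linked page.

```python
from typing import Optional


def cache_hit_rate(hits: int, misses: int) -> Optional[float]:
    """Return hits / (hits + misses), or None when there were no lookups."""
    total = hits + misses
    if total == 0:
        return None  # no traffic yet, so a rate is undefined
    return hits / total


def should_alert(hits: int, misses: int, threshold: float = 0.80) -> bool:
    """True when the hit rate is strictly below the threshold (80% by default).

    Assumption for this sketch: an undefined rate (no lookups) does not alert.
    """
    rate = cache_hit_rate(hits, misses)
    return rate is not None and rate < threshold


if __name__ == "__main__":
    print(cache_hit_rate(950, 50))
    print(should_alert(700, 300))
```

In a real deployment the two inputs would be counter deltas over a time window rather than lifetime totals, which is what a PromQL `rate()` expression provides.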

## how_to

- [docs/en/how_to/20-cluster-scaling.mdx](docs/en/how_to/20-cluster-scaling.mdx): Procedure for horizontal shard scaling of Redis Cluster instances by editing `spec.replicas.cluster.shard` on the Redis custom resource (`kubectl patch` example taking `c6` to 4 shards) or via the Web Console's Shard Changes action. Covers prerequisites (`Running` state, memory headroom, maintenance window), memory-planning formulas for scale-up vs. scale-down using Maximum Running Memory = Maximum Data Memory / 0.8, the Data Balancing state transition, and client-side `MOVED` redirection behavior during slot migration that can double request volume.
- [docs/en/how_to/30-upgrade.mdx](docs/en/how_to/30-upgrade.mdx): Walkthrough for Redis patch-version upgrades using a rolling pod restart strategy, including configuring the Patch Version Upgrade Policy (Manual vs Automatic during maintenance windows), upgrading a single instance from its details page via the Upgrade Now action after reviewing the changelog, and platform-admin bulk upgrades through the Upgrade Management panel that coordinate multiple Running instances across a chosen operator.
- [docs/en/how_to/40-cluster-slots-distribution.mdx](docs/en/how_to/40-cluster-slots-distribution.mdx): Explains how to override the default even distribution of all 16384 hash slots across Redis Cluster shards by supplying a custom layout at instance creation time. Includes a full kubectl-applied Redis CR example (`apiVersion: middleware.alauda.io/v1`, `arch: cluster`, 3 shards split as 0-5460/5461-10922/10923-16383) and documents the `spec.replicas.cluster.shards[].slots` syntax, which accepts ranges, single slots, and mixed lists for disaster-recovery and other advanced layouts.
- [docs/en/how_to/access/10-sentinel.mdx](docs/en/how_to/access/10-sentinel.mdx): Client-connection cookbook for Redis Sentinel instances, including authentication options (password vs. Set Password disabled), internal Kubernetes service endpoints and external Sentinel Node Access Addresses, `redis-cli` debugging via the Terminal Console, and ready-to-use code samples for go-redis (`NewFailoverClient`), Jedis (`JedisSentinelPool`), Lettuce (`RedisURI.withSentinel`), and Redisson (`useSentinelServers`) with tuned pool sizes, timeouts, keepalive, and retry settings. Notes the fixed master name `mymaster`.
- [docs/en/how_to/access/20-cluster.mdx](docs/en/how_to/access/20-cluster.mdx): Client-connection cookbook for Redis Cluster instances, covering password vs. password-less auth, internal per-shard Kubernetes service endpoints, external Shard Addresses, and `redis-cli -c` usage for following `MOVED`/`ASK` redirections. Provides production-grade configuration examples for go-redis `NewClusterClient`, Jedis `JedisCluster` with `topologyRefreshPeriod`, Lettuce `RedisClusterClient` with `ClusterTopologyRefreshOptions` and adaptive refresh triggers, and Redisson `useClusterServers` with `setCheckSlotsCoverage`, including connection-pool sizing, keepalive, retries, and timeout tuning.
- [docs/en/how_to/access/index.mdx](docs/en/how_to/access/index.mdx): Section landing page titled Client Connection that introduces the Redis access guides via the `<Overview />` component (weight 10).
- [docs/en/how_to/index.mdx](docs/en/how_to/index.mdx): Top-level HowTo section landing page for the Redis operator's practical guides, rendered through `<Overview />` with weight 60 and zh title 实用指南.
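
The slot-distribution entry above quotes a 3-shard split of 0-5460/5461-10922/10923-16383 over 16384 hash slots. The following Python sketch shows one way to arrive at such an even split. It uses a running fractional cursor with rounding, which happens to reproduce the quoted ranges. Whether the Operator computes its default layout this way is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

TOTAL_SLOTS = 16384  # number of hash slots in a Redis Cluster


def even_slot_ranges(shards: int, total: int = TOTAL_SLOTS) -> List[Tuple[int, int]]:
    """Split slots 0..total-1 into `shards` contiguous, near-equal ranges."""
    if not 1 <= shards <= total:
        raise ValueError("shards must be between 1 and the number of slots")
    per_shard = total / shards  # fractional share of slots per shard
    ranges = []
    first = 0
    cursor = 0.0
    for i in range(shards):
        # Round the fractional end position half-up to the nearest slot.
        last = math.floor(cursor + per_shard - 1 + 0.5)
        if i == shards - 1 or last > total - 1:
            last = total - 1  # the final shard always ends on the last slot
        ranges.append((first, last))
        first = last + 1
        cursor += per_shard
    return ranges


def format_ranges(ranges: List[Tuple[int, int]]) -> List[str]:
    """Render each (first, last) pair as a 'first-last' string."""
    return [f"{first}-{last}" for first, last in ranges]


if __name__ == "__main__":
    print(format_ranges(even_slot_ranges(3)))
```

A custom layout can be sanity-checked the same way: the ranges should be contiguous, non-overlapping, and together cover every slot from 0 to 16383.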

## trouble_shooting

- [docs/en/trouble_shooting/20-bigkeycheck.mdx](docs/en/trouble_shooting/20-bigkeycheck.mdx): Guidance on detecting and remediating Redis BigKeys, with threshold guidelines (String >5MB; List >20,000; Set/Sorted Set/Hash >10,000 elements/fields) and impact analysis covering single-thread blocking, client timeouts, network saturation, cluster memory imbalance, and slot-migration bottlenecks. Documents enabling the Alauda Application Services Inspection by setting `ENABLE_REDIS_KEYS_INDICATOR=1` on the `rds-system` RdsInstaller, viewing the BigKey Top5 report, and using `redis-cli --bigkeys`, plus mitigation via splitting collections, Protobuf/MessagePack compression, and preferring `UNLINK` over the blocking `DEL`.
- [docs/en/trouble_shooting/index.mdx](docs/en/trouble_shooting/index.mdx): Section landing page for the Trouble Shooting (常见问题) area, rendered via the `<Overview />` component with weight 70.
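
The BigKey thresholds listed above (String >5MB; List >20,000; Set/Sorted Set/Hash >10,000 elements or fields) can be expressed as a small classifier. This is only an illustrative Python sketch: the function name is hypothetical, the type names follow the strings returned by the Redis `TYPE` command, and reading "5MB" as 5 MiB is an assumption.

```python
# Thresholds taken from the BigKey guidance summarized above.
STRING_BYTES_LIMIT = 5 * 1024 * 1024  # assumption: "5MB" interpreted as 5 MiB
ELEMENT_LIMITS = {
    "list": 20_000,
    "set": 10_000,
    "zset": 10_000,  # sorted set
    "hash": 10_000,  # counted in fields
}


def is_big_key(key_type: str, size: int) -> bool:
    """Return True when a key exceeds the guideline for its type.

    `size` is the value length in bytes for strings, and the number of
    elements (or fields) for list, set, zset, and hash.
    """
    key_type = key_type.lower()
    if key_type == "string":
        return size > STRING_BYTES_LIMIT
    limit = ELEMENT_LIMITS.get(key_type)
    if limit is None:
        raise ValueError(f"no BigKey guideline for type {key_type!r}")
    return size > limit


if __name__ == "__main__":
    print(is_big_key("list", 25_000))
    print(is_big_key("hash", 500))
```

The sizes themselves would come from commands such as `STRLEN`, `LLEN`, `SCARD`, `ZCARD`, and `HLEN`, or from a `redis-cli --bigkeys` scan.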