Scaling Aonnis Valkey Panther
Aonnis Valkey Panther allows users to scale their Valkey cluster by modifying the number of primaries (shards) and replication factor (replicas). This provides flexibility in adjusting the cluster size based on workload requirements.
When to consider scaling
There are many reasons to scale the number of Valkey nodes in your cluster. The most common are:
Memory pressure
The nodes in your cluster are close to full capacity (or are at full capacity, and evictions are causing the backend to take more traffic than desired)
- Horizontally scale the number of primaries to better serve requests
- Vertically scale your current Valkey nodes by allocating more memory
CPU bottleneck
Throughput is low, impacting system performance
- Horizontally scale the number of primaries to better serve requests
- Vertically scale your current Valkey nodes by allocating more CPUs
Over-provisioning
You have allocated too many resources for your cluster
- Scale down if it does not hurt the performance of your system
- Scale down the number of primaries to save on costs
- If you are running a Valkey cluster with a high replication factor (RF), consider reducing it
- In multi-zone clusters, scaling down may reduce availability in the case of a zone outage
Changing Cluster Size
To adjust the cluster size, update the following values in your Helm chart:
- numberOfPrimaries – Defines the number of shards (primary nodes).
- replicationFactor – Defines the number of replica nodes per primary.
After modifying these values, deploy the updated Helm chart; Aonnis Valkey Panther will automatically resize the cluster to match the new configuration.
Example: Updating Helm Chart
Modify your Helm values file (values.yaml):
numberOfPrimaries: 6
replicationFactor: 2
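Then apply the updated values file with helm upgrade. The release name valkey-cluster and chart path charts/node-for-valkey below match the examples later in this document; substitute your own:

```shell
# Apply the updated values file; Panther resizes the cluster to match.
helm upgrade valkey-cluster charts/node-for-valkey -f values.yaml
```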
Resizing Valkey Nodes (Resource Allocation)
In addition to scaling the number of nodes, you can also adjust the resources allocated to each Valkey node. This is done by modifying the valkeyNodeResources definition in your Helm chart. Panther will then perform a rolling update of your Valkey nodes to apply the new resource configuration.
valkeyNodeResources:
  limits:
    cpu: 200m
    memory: 512Mi
  requests:
    cpu: 200m
    memory: 512Mi
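For example, to double the memory available to each node, you could override the values above at upgrade time (the release name and chart path are assumptions matching the examples below):

```shell
# Trigger a rolling update of the Valkey pods with larger memory requests/limits.
helm upgrade valkey-cluster charts/node-for-valkey \
  --set valkeyNodeResources.limits.memory=1Gi \
  --set valkeyNodeResources.requests.memory=1Gi
```

Keeping requests equal to limits, as in the defaults above, gives the pods the Guaranteed QoS class in Kubernetes, which reduces the chance of eviction under node memory pressure.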
Important Considerations
Scaling Operations Cannot Be Scheduled
Currently, there is no scheduling feature for resizing operations. If you require the ability to schedule scaling operations in advance, please request this feature.
Perform Scaling During Low Traffic Periods
Scaling operations are compute-intensive and may impact cluster performance. To minimize the risk of downtime or latency spikes, perform the scaling process during off-peak hours when traffic is expected to be lower.
Resource Requirements
Like the rolling update procedure, scaling up requires additional resources to create new Valkey pods. If there are insufficient resources to schedule the new pods, they will be stuck in the Pending state. If you find your newly created pods in the Pending state, increase the memory and CPU allocated to your Kubernetes nodes, or add more nodes to your worker pool.
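A quick way to confirm this is the cause, sketched with standard kubectl commands (the pod name is a placeholder; use one from your deployment):

```shell
# List pods stuck in Pending, then show why the scheduler cannot place one of them.
kubectl get pods --field-selector=status.phase=Pending
kubectl describe pod <pending-pod-name> | grep -A5 Events
```

An "Insufficient cpu" or "Insufficient memory" event confirms the worker pool needs more capacity.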
Scaling primaries
The first option for scaling your cluster is scaling the number of primaries. You can trigger a scaling operation by modifying the numberOfPrimaries field in charts/node-for-valkey/values.yaml and running helm upgrade on your cluster.
Scaling up
Scale-up operations take place when the desired number of primaries is greater than the current number of primaries. We take the following actions for scale-up operations:
For the number of primaries added,
1. Create a new Valkey pod.
2. Wait for the pod to become ready.
Once the number of desired Valkey pods matches the current number of running pods,
3. Check if we have sufficient primaries. If not, promote replicas to primaries. This only happens when you scale up the number of primaries AND scale down RF.
4. Add the new Valkey nodes to the selection of current primaries.
5. Place and attach replicas to their respective primaries.
6. Dispatch slots and migrate keys to the new primaries.
After this last step, your cluster will be in normal operating state. The primary nodes will have an equal number of slots, and replica nodes will be properly attached.
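A Valkey cluster always has 16384 hash slots, so "an equal number of slots" means roughly 16384 / numberOfPrimaries each. A quick sanity check in shell:

```shell
# 16384 hash slots spread across 5 primaries (the scale-up example below):
echo $(( 16384 / 5 ))   # 3276 slots per primary; the remaining 4 slots land on the first primaries
```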
Example
Given a Valkey cluster with the following config:
numberOfPrimaries: 3
replicationFactor: 1
Assuming your helm release name is valkey-cluster, scale up numberOfPrimaries by running the following:
helm upgrade valkey-cluster charts/node-for-valkey --set numberOfPrimaries=5
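Once the upgrade completes, you can verify the new topology with valkey-cli from inside one of the pods. The pod name below is hypothetical; use a pod from your deployment:

```shell
# Each of the 5 primaries should report an owned slot range of roughly equal size.
kubectl exec -it valkey-cluster-0 -- valkey-cli cluster nodes
kubectl exec -it valkey-cluster-0 -- valkey-cli cluster info
```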
Scaling down
Scale-down operations take place when the desired number of primaries is less than the current number of primaries. We take the following actions for scale-down operations:
For the number of primaries deleted,
1. Select one primary to remove.
2. Migrate keys from the primary to be removed to the other primaries. Slots are equally distributed across the remaining primaries.
3. Detach, forget, and delete the primary to be removed.
Once the number of desired Valkey pods matches the current number of running pods,
4. Dispatch slots and migrate keys to the remaining primaries.
5. Place and attach replicas to their respective primaries.
After this last step, your cluster will be in normal operating state.
Example
Given a Valkey cluster with the following config:
numberOfPrimaries: 5
replicationFactor: 1
Scale down numberOfPrimaries by running the following:
helm upgrade valkey-cluster charts/node-for-valkey --set numberOfPrimaries=3
Scaling replication factor
The second option for scaling your cluster is scaling RF. You can trigger a scaling operation by modifying the replicationFactor field in charts/node-for-valkey/values.yaml and running helm upgrade on your cluster.
Scaling up
Scale-up operations for RF take place when the desired RF is greater than the current RF. We take the following actions for scale-up operations:
For the number of replicas added,
1. Create a new Valkey pod.
2. Wait for the pod to become ready.
Once the number of desired Valkey pods matches the current number of running pods,
3. Add the new Valkey nodes to the selection of replicas.
4. Place and attach replicas to their respective primaries such that each primary has the same number of replicas.
5. Dispatch slots and migrate keys across the primaries if any rebalancing is needed.
After this step, your cluster will be in normal operating state.
Example
Given a Valkey cluster with the following config:
numberOfPrimaries: 3
replicationFactor: 1
Scale up replicationFactor by running the following:
helm upgrade valkey-cluster charts/node-for-valkey --set replicationFactor=2
Scaling down
Scale-down operations for RF take place when the desired RF is less than the current RF. We take the following actions for scale-down operations:
For each primary in the cluster,
1. Calculate the difference between the current RF and desired RF.
2. If we do not have sufficient replicas for this primary, select new replicas and attach them to the primary.
3. If we have too many replicas, select replicas to delete, then detach, forget, and delete them.
Once the number of desired Valkey pods matches the current number of running pods,
4. Place and attach replicas to their respective primaries.
After this step, your cluster will be in normal operating state.
Example
Given a Valkey cluster with the following config:
numberOfPrimaries: 3
replicationFactor: 2
Scale down replicationFactor by running the following:
helm upgrade valkey-cluster charts/node-for-valkey --set replicationFactor=1
Scaling primaries and replication factor
You may scale both the number of primaries and replication factor in a single helm upgrade command. The number of pods created or deleted will be calculated and actions will be taken according to the algorithms described in the previous sections. The following is an example of scaling up numberOfPrimaries and replicationFactor.
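The pod count Panther converges to is numberOfPrimaries × (1 + replicationFactor), since every primary gets replicationFactor replicas. For Example 1 below:

```shell
# Pods before: 3 primaries, RF 1 -> 3 * (1 + 1)
echo $(( 3 * (1 + 1) ))   # 6
# Pods after:  4 primaries, RF 2 -> 4 * (1 + 2)
echo $(( 4 * (1 + 2) ))   # 12
```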
Example 1
Given a Valkey cluster with the following config:
numberOfPrimaries: 3
replicationFactor: 1
Increase numberOfPrimaries and replicationFactor by running the following:
helm upgrade valkey-cluster charts/node-for-valkey --set numberOfPrimaries=4 --set replicationFactor=2
Example 2
You may also scale up one field while scaling down the other:
numberOfPrimaries: 4
replicationFactor: 2
Increase numberOfPrimaries and decrease replicationFactor by running the following:
helm upgrade valkey-cluster charts/node-for-valkey --set numberOfPrimaries=5 --set replicationFactor=1
Next Steps
- Modify numberOfPrimaries and replicationFactor in your Helm chart.
- Deploy the updated configuration using Helm.
- Monitor cluster performance to ensure a smooth transition.
- Request scheduling support if you need automated resizing.
By following these steps, you can dynamically scale your Valkey cluster to meet changing demands. 🚀