Scaling Aonnis Valkey Panther
Aonnis Valkey Panther allows users to scale their Valkey cluster by modifying the number of primaries (shards) and replication factor (replicas). This provides flexibility in adjusting the cluster size based on workload requirements.
When to consider scaling
There are many reasons to scale the number of Valkey nodes in your cluster. The most common are:
Memory pressure
The nodes in your cluster are close to full capacity (or are at full capacity, and evictions are causing the backend to take more traffic than desired)
- Horizontally scale the number of primaries to better serve requests
- Vertically scale your current Valkey nodes by allocating more memory
CPU bottleneck
Throughput is low, impacting system performance
- Horizontally scale the number of primaries to better serve requests
- Vertically scale your current Valkey nodes by allocating more CPUs
Over-provisioning
You have allocated too many resources for your cluster
- Scale down if it does not hurt the performance of your system
- Scale down the number of primaries to save on costs
- If you are running a Valkey cluster with a high replication factor (RF), consider reducing it
- In multi-zone clusters, scaling down may reduce availability in the case of a zone outage
Changing Cluster Size
To adjust the cluster size, update the following values in your Helm chart:
- numberOfPrimaries – Defines the number of shards (primary nodes).
- replicationFactor – Defines the number of replica nodes per primary.
After modifying these values, deploy the updated Helm chart; Aonnis Valkey Panther will automatically resize the cluster to match the new configuration.
Example: Updating Helm Chart
Modify your Helm values file (values.yaml):
numberOfPrimaries: 6
replicationFactor: 2
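Then apply the updated values file with helm upgrade. The release name valkey-cluster and chart path charts/node-for-valkey below match the examples later in this document; substitute your own:

```shell
# Apply the updated values file; Panther resizes the cluster to match.
helm upgrade valkey-cluster charts/node-for-valkey -f values.yaml
```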
Resizing Valkey Nodes (Resource Allocation)
In addition to scaling the number of nodes, you can also adjust the resources allocated to each Valkey node. This is done by modifying the valkeyNodeResources definition in your Helm chart. Panther will then perform a rolling update of your Valkey nodes to apply the new resource configuration.
valkeyNodeResources:
  limits:
    cpu: 200m
    memory: 512Mi
  requests:
    cpu: 200m
    memory: 512Mi
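For example, to double the memory available to each node, you could override the values above at upgrade time (the release name and chart path are assumptions matching the examples below):

```shell
# Trigger a rolling update of the Valkey pods with larger memory requests/limits.
helm upgrade valkey-cluster charts/node-for-valkey \
  --set valkeyNodeResources.limits.memory=1Gi \
  --set valkeyNodeResources.requests.memory=1Gi
```

Keeping requests equal to limits, as in the defaults above, gives the pods the Guaranteed QoS class in Kubernetes, which reduces the chance of eviction under node memory pressure.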
Important Considerations
Scaling Operations Cannot Be Scheduled
Currently, there is no scheduling feature for resizing operations. If you require the ability to schedule scaling operations in advance, please request this feature.
Perform Scaling During Low Traffic Periods
Scaling operations are compute-intensive and may impact cluster performance. To minimize the risk of downtime or latency spikes, perform the scaling process during off-peak hours when traffic is expected to be lower.
Resource Requirements
Like the rolling update procedure, scaling up requires additional resources to create new Valkey pods. If there are insufficient resources to schedule the new pods, they will be stuck in the Pending state. If you find your newly created pods in the Pending state, increase the memory and CPU allocated to your Kubernetes nodes, or add more nodes to your worker pool.
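A quick way to confirm this is the cause, sketched with standard kubectl commands (the pod name is a placeholder; use one from your deployment):

```shell
# List pods stuck in Pending, then show why the scheduler cannot place one of them.
kubectl get pods --field-selector=status.phase=Pending
kubectl describe pod <pending-pod-name> | grep -A5 Events
```

An "Insufficient cpu" or "Insufficient memory" event confirms the worker pool needs more capacity.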
Scaling primaries
The first option for scaling your cluster is scaling the number of primaries. You can trigger a scaling operation by modifying the numberOfPrimaries field in charts/node-for-valkey/values.yaml and running helm upgrade on your cluster.
Scaling up
Scale-up operations take place when the desired number of primaries is greater than the current number of primaries. We take the following actions for scale-up operations:
For the number of primaries added,
1. Create a new Valkey pod.
2. Wait for the pod to become ready.
Once the number of desired Valkey pods matches the current number of running pods,
3. Check if we have sufficient primaries. If not, promote replicas to primaries. This only happens when you scale up the number of primaries AND scale down RF.
4. Add the new Valkey nodes to the selection of current primaries.
5. Place and attach replicas to their respective primaries.
6. Dispatch slots and migrate keys to the new primaries.
After this last step, your cluster will be in normal operating state. The primary nodes will have an equal number of slots, and replica nodes will be properly attached.
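A Valkey cluster always has 16384 hash slots, so "an equal number of slots" means roughly 16384 / numberOfPrimaries each. A quick sanity check in shell:

```shell
# 16384 hash slots spread across 5 primaries (the scale-up example below):
echo $(( 16384 / 5 ))   # 3276 slots per primary; the remaining 4 slots land on the first primaries
```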
Example
Given a Valkey cluster with the following config:
numberOfPrimaries: 3
replicationFactor: 1
Assuming your helm release name is valkey-cluster, scale up numberOfPrimaries by running the following:
helm upgrade valkey-cluster charts/node-for-valkey --set numberOfPrimaries=5
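Once the upgrade completes, you can verify the new topology with valkey-cli from inside one of the pods. The pod name below is hypothetical; use a pod from your deployment:

```shell
# Each of the 5 primaries should report an owned slot range of roughly equal size.
kubectl exec -it valkey-cluster-0 -- valkey-cli cluster nodes
kubectl exec -it valkey-cluster-0 -- valkey-cli cluster info
```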
Scaling down
Scale-down operations take place when the desired number of primaries is less than the current number of primaries. We take the following actions for scale-down operations:
For the number of primaries deleted,
1. Select one primary to remove.
2. Migrate keys from the primary to be removed to the other primaries. Slots are equally distributed across the remaining primaries.
3. Detach, forget, and delete the primary to be removed.
Once the number of desired Valkey pods matches the current number of running pods,
4. Dispatch slots and migrate keys to the remaining primaries.
5. Place and attach replicas to their respective primaries.
After this last step, your cluster will be in normal operating state.
Example
Given a Valkey cluster with the following config:
numberOfPrimaries: 5
replicationFactor: 1
Scale down numberOfPrimaries by running the following:
helm upgrade valkey-cluster charts/node-for-valkey --set numberOfPrimaries=3
Scaling replication factor
The second option for scaling your cluster is scaling RF. You can trigger a scaling operation by modifying the replicationFactor field in charts/node-for-valkey/values.yaml and running helm upgrade on your cluster.
Scaling up
Scale-up operations for RF take place when the desired RF is greater than the current RF. We take the following actions for scale-up operations:
For the number of replicas added,
1. Create a new Valkey pod.
2. Wait for the pod to become ready.
Once the number of desired Valkey pods matches the current number of running pods,
3. Add the new Valkey nodes to the selection of replicas.
4. Place and attach replicas to their respective primaries such that each primary has the same number of replicas.
5. Dispatch slots and migrate keys across the primaries if any rebalancing is needed.
After this step, your cluster will be in normal operating state.
Example
Given a Valkey cluster with the following config:
numberOfPrimaries: 3
replicationFactor: 1
Scale up replicationFactor by running the following:
helm upgrade valkey-cluster charts/node-for-valkey --set replicationFactor=2
Scaling down
Scale-down operations for RF take place when the desired RF is less than the current RF. We take the following actions for scale-down operations:
For each primary in the cluster,
1. Calculate the difference between the current RF and desired RF.
2. If we do not have sufficient replicas for this primary, select new replicas and attach them to the primary.
3. If we have too many replicas, select replicas to delete, then detach, forget, and delete them.
Once the number of desired Valkey pods matches the current number of running pods,
4. Place and attach replicas to their respective primaries.
After this step, your cluster will be in normal operating state.
Example
Given a Valkey cluster with the following config:
numberOfPrimaries: 3
replicationFactor: 2
Scale down replicationFactor by running the following:
helm upgrade valkey-cluster charts/node-for-valkey --set replicationFactor=1
Scaling primaries and replication factor
You may scale both the number of primaries and replication factor in a single helm upgrade command. The number of pods created or deleted will be calculated and actions will be taken according to the algorithms described in the previous sections. The following is an example of scaling up numberOfPrimaries and replicationFactor.
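The pod count Panther converges to is numberOfPrimaries × (1 + replicationFactor), since every primary gets replicationFactor replicas. For Example 1 below:

```shell
# Pods before: 3 primaries, RF 1 -> 3 * (1 + 1)
echo $(( 3 * (1 + 1) ))   # 6
# Pods after:  4 primaries, RF 2 -> 4 * (1 + 2)
echo $(( 4 * (1 + 2) ))   # 12
```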
Example 1
Given a Valkey cluster with the following config:
numberOfPrimaries: 3
replicationFactor: 1
Increase numberOfPrimaries and replicationFactor by running the following:
helm upgrade valkey-cluster charts/node-for-valkey --set numberOfPrimaries=4 --set replicationFactor=2
Example 2
You may also scale up one field while scaling down the other:
numberOfPrimaries: 4
replicationFactor: 2
Increase numberOfPrimaries and decrease replicationFactor by running the following:
helm upgrade valkey-cluster charts/node-for-valkey --set numberOfPrimaries=5 --set replicationFactor=1
Next Steps
- Modify numberOfPrimaries and replicationFactor in your Helm chart.
- Deploy the updated configuration using Helm.
- Monitor cluster performance to ensure a smooth transition.
- Request scheduling support if you need automated resizing.
By following these steps, you can dynamically scale your Valkey cluster to meet changing demands. 🚀