Data Loss through Partial loss of keys in Azure Cache for Redis

Azure Cache for Redis is a fully managed, in-memory cache that enables high-performance and scalable architectures.

Here at Ibmi Media, as part of our Server Management Services, we regularly help our customers fix Redis-related errors.

In this article, we will look into the main causes of data loss in Azure Cache for Redis.

Types of data loss in Azure Cache for Redis

Azure Cache for Redis can be used as a distributed data or content cache, a session store, a message broker, and more.

Some forms of data loss in Azure Cache for Redis include:

i. Partial loss of keys

ii. Major or complete loss of keys

Data Loss through Partial loss of keys in Azure Cache for Redis

Azure Cache for Redis removes keys in response to expiration or eviction policies and to explicit key-deletion commands.

Also, in a Standard or Premium Azure Cache for Redis instance, keys that have been written to the primary node might not be available on a replica right away.

Data is replicated from the primary to the replica asynchronously, so a client's write does not wait for replication to finish.

Thus, if you find that keys have disappeared from your cache, check the following possible causes:

i. Key expiration: Keys removed because of time-outs set on them.

ii. Key eviction: Keys removed under memory pressure.

iii. Async replication: Keys not available on a replica because of data-replication delays.

iv. Key deletion: Keys removed by explicit delete commands.

Let us now look at each of these scenarios in detail.

Key expiration

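Azure Cache for Redis removes a key automatically once a time-out (time to live) assigned to it has elapsed. Time-outs can be set with commands such as EXPIRE, PEXPIRE, and SETEX, or with the EX and PX options of SET. Note that overwriting a key with a plain SET, GETSET, or one of the *STORE commands clears any time-out previously set on it.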

To get stats on the expired keys, use the INFO command. The Stats section shows the total number of expired keys.

The Keyspace section shows, for each database, the number of keys with time-outs (expires) and the average time-out value (avg_ttl, in milliseconds).

# Stats

expired_keys:46583

# Keyspace

db0:keys=3450,expires=2,avg_ttl=91861015336
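If we want to work with these values in a script, the raw INFO text can be parsed into a dictionary. Below is a minimal Python sketch. It assumes the INFO output has already been captured as a string, for example copied from the Redis console in the Azure portal or from redis-cli. The helper names parse_info and to_number are hypothetical. Client libraries such as redis-py already return INFO as a dictionary, so this is mainly handy for analyzing output you have saved.

```python
def to_number(value):
    """Convert '46583' -> 46583 and '45.00' -> 45.0; leave other strings as-is."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_info(text):
    """Parse raw INFO output into a dict of field -> value.

    Section headers such as '# Stats' and blank lines are skipped. Composite
    values such as 'keys=3450,expires=2,avg_ttl=91861015336' become nested dicts.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        field, _, value = line.partition(":")
        if "=" in value:
            nested = {}
            for pair in value.split(","):
                key, _, val = pair.partition("=")
                nested[key] = to_number(val)
            info[field] = nested
        else:
            info[field] = to_number(value)
    return info


sample = """\
# Stats
expired_keys:46583

# Keyspace
db0:keys=3450,expires=2,avg_ttl=91861015336
"""

info = parse_info(sample)
print(info["expired_keys"])    # 46583
print(info["db0"]["expires"])  # 2
# avg_ttl is reported in milliseconds; convert it to days for readability.
print(info["db0"]["avg_ttl"] / (1000 * 60 * 60 * 24))
```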

We can also look at the diagnostic metrics for the cache to see whether there is a correlation between when the key went missing and a spike in expired keys.
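We can also check for such a correlation ourselves. The idea is to read the cumulative expired_keys counter from INFO at regular intervals and look for an interval with an unusually large increase around the time the key went missing. The sketch below assumes we already collect (timestamp, expired_keys) samples. The function name find_spikes, the sample values, and the 3x-median threshold are illustrative assumptions, not Azure defaults. The same approach applies to the evicted_keys counter.

```python
from datetime import datetime
from statistics import median


def find_spikes(samples, factor=3.0):
    """Flag intervals in which a cumulative counter grew unusually fast.

    samples: list of (timestamp, count) tuples sorted by time, for example
             periodic readings of expired_keys from INFO.
    factor:  an interval is flagged when its increase exceeds factor times
             the median increase (an arbitrary threshold for illustration).
    Returns a list of (start, end, increase) tuples.
    """
    intervals = []
    for (start, before), (end, after) in zip(samples, samples[1:]):
        # A counter can go down (e.g. after a restart or a failover to another
        # node), so a drop is treated as a reset and the new value as the increase.
        increase = after - before if after >= before else after
        intervals.append((start, end, increase))
    if not intervals:
        return []
    baseline = max(median(i[2] for i in intervals), 1)
    return [i for i in intervals if i[2] > factor * baseline]


# Made-up readings taken every five minutes.
samples = [
    (datetime(2024, 1, 1, 10, 0), 46000),
    (datetime(2024, 1, 1, 10, 5), 46050),
    (datetime(2024, 1, 1, 10, 10), 46100),
    (datetime(2024, 1, 1, 10, 15), 46583),
    (datetime(2024, 1, 1, 10, 20), 46630),
]

for start, end, increase in find_spikes(samples):
    print(f"{start:%H:%M}-{end:%H:%M}: +{increase} expired keys")
# prints: 10:10-10:15: +483 expired keys
```

Using the median as the baseline keeps a single large spike from inflating the threshold it is compared against.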

Key eviction

When Azure Cache for Redis needs memory to store new data, it evicts keys to free up memory.

This happens when the used_memory or used_memory_rss values in the INFO command approach the configured max memory setting.

At that point, Azure Cache for Redis starts evicting keys from memory according to the cache's eviction policy (the maxmemory-policy setting).
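To see how close a cache is to this point, we can compare memory usage against maxmemory, both of which are reported in the Memory section of INFO. A minimal sketch follows. It assumes the INFO fields are already available as a dictionary, which is how redis-py's info() returns them. The helper name memory_pressure and the 90% warning threshold are assumptions for illustration. A maxmemory of 0 means no limit is configured, so the helper reports the ratio as unknown in that case.

```python
def memory_pressure(info, warn_ratio=0.9):
    """Compare memory usage with maxmemory.

    info: dict of INFO Memory fields, e.g. used_memory, used_memory_rss, maxmemory.
    Uses the larger of used_memory and used_memory_rss, since either one
    approaching the limit is described above as the trigger for eviction.
    Returns (ratio, warning); ratio is None when maxmemory is missing or 0,
    which in Redis means no memory limit is configured.
    """
    maxmemory = info.get("maxmemory", 0)
    if not maxmemory:
        return None, False
    used = max(info.get("used_memory", 0), info.get("used_memory_rss", 0))
    ratio = used / maxmemory
    return ratio, ratio >= warn_ratio


# Made-up values for a cache with a 1 GB limit.
print(memory_pressure({
    "used_memory": 950_000_000,
    "used_memory_rss": 980_000_000,
    "maxmemory": 1_000_000_000,
}))  # (0.98, True)
```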

We can monitor the number of evicted keys by using the INFO command:

# Stats

evicted_keys:13224

We can also look at the diagnostic metrics for the cache to see whether there is a correlation between when the key went missing and a spike in evicted keys.

Key deletion

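Redis clients can issue the DEL command to explicitly remove keys from Azure Cache for Redis, or HDEL to remove fields from a hash (when the last field of a hash is removed, the key itself is deleted).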

We can track the number of delete operations by using the INFO command. The Commandstats section shows how many times the DEL and HDEL commands have been called:

# Commandstats

cmdstat_del:calls=2,usec=90,usec_per_call=45.00
cmdstat_hdel:calls=1,usec=47,usec_per_call=47.00
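These counters are cumulative since the Redis server last started or its statistics were reset. A single reading therefore says little on its own, and comparing snapshots taken before and after keys went missing is more informative. Below is a minimal Python sketch, assuming both Commandstats snapshots are captured as text. The helper names command_calls and deletes_between are hypothetical, and the "after" snapshot is made up for illustration.

```python
import re

# Matches lines such as: cmdstat_del:calls=2,usec=90,usec_per_call=45.00
CMDSTAT_RE = re.compile(r"^cmdstat_(\w+):calls=(\d+)")


def command_calls(commandstats_text):
    """Map command name -> cumulative call count from INFO Commandstats text."""
    calls = {}
    for line in commandstats_text.splitlines():
        match = CMDSTAT_RE.match(line.strip())
        if match:
            calls[match.group(1)] = int(match.group(2))
    return calls


def deletes_between(before_text, after_text, commands=("del", "hdel")):
    """Return how many times each delete command ran between two snapshots."""
    before = command_calls(before_text)
    after = command_calls(after_text)
    return {cmd: after.get(cmd, 0) - before.get(cmd, 0) for cmd in commands}


before = """\
cmdstat_del:calls=2,usec=90,usec_per_call=45.00
cmdstat_hdel:calls=1,usec=47,usec_per_call=47.00
"""
# Hypothetical snapshot taken after keys went missing.
after = """\
cmdstat_del:calls=502,usec=22590,usec_per_call=45.00
cmdstat_hdel:calls=1,usec=47,usec_per_call=47.00
"""

print(deletes_between(before, after))  # {'del': 500, 'hdel': 0}
```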

Async replication

Any Azure Cache for Redis instance in the Standard or Premium tier contains a primary node and at least one replica. Background processes asynchronously copy data from the primary to a replica.

Because this replication is not instantaneous, partial data loss can occur in scenarios where clients write to Redis frequently.

For example, suppose the primary goes down right after a client writes a key to it. If the background process has not yet sent that key to the replica, the key is lost when the replica takes over as the new primary.
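This failure sequence can be illustrated with a toy in-memory model. It is a deliberately simplified Python sketch and not how Azure Cache for Redis actually replicates data. The class ToyReplicatedCache and its methods are hypothetical, and a queue of pending writes stands in for the background replication process.

```python
from collections import deque


class ToyReplicatedCache:
    """Toy model of asynchronous primary -> replica replication.

    Writes go to the primary immediately and are queued for the replica;
    replicate() drains the queue, standing in for the background copy
    process. This illustrates the failure mode only; it is not how
    Azure Cache for Redis is implemented.
    """

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes not yet copied to the replica

    def set(self, key, value):
        self.primary[key] = value          # the client's write succeeds here
        self.pending.append((key, value))  # replication happens later

    def replicate(self, max_items=None):
        """Copy up to max_items pending writes to the replica (all if None)."""
        copied = 0
        while self.pending and (max_items is None or copied < max_items):
            key, value = self.pending.popleft()
            self.replica[key] = value
            copied += 1

    def fail_over(self):
        """Primary fails: promote the replica; return keys whose latest write was lost."""
        lost = [key for key, _ in self.pending]
        self.primary = self.replica
        self.replica = {}
        self.pending.clear()
        return lost


cache = ToyReplicatedCache()
cache.set("a", 1)
cache.set("b", 2)
cache.replicate()         # background copy catches up
cache.set("c", 3)         # primary fails before this write is copied
print(cache.fail_over())  # ['c']
print(cache.primary)      # {'a': 1, 'b': 2}
```

The write to c succeeded from the client's point of view, yet it does not survive the failover, because the only copy lived on the node that failed.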

Data Loss through Major or complete loss of keys in Azure Cache for Redis

At times, most or all keys disappear from the cache for reasons such as:

i. Key flushing: Keys purged manually.

ii. Incorrect database selection: The client is reading from a database other than the one that holds the keys.

iii. Redis instance failure: The Redis server is unavailable.

Let us now look at each of these scenarios in detail.

Key flushing

Clients can call the FLUSHDB command to remove all keys in a single database or FLUSHALL to remove all keys from all databases in a Redis cache.

To find out whether keys have been flushed, use the INFO command. The Commandstats section shows whether either FLUSH command has been called:

# Commandstats

cmdstat_flushall:calls=2,usec=112,usec_per_call=56.00
cmdstat_flushdb:calls=1,usec=110,usec_per_call=110.00
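A command that has not been called since the statistics were last reset does not appear in Commandstats at all, so a missing line means zero calls. The Python sketch below extracts the FLUSHALL and FLUSHDB call counts from captured Commandstats text. The helper name flush_calls is hypothetical, and the sample text is made up for illustration.

```python
import re

# Matches, e.g.: cmdstat_flushall:calls=2,usec=112,usec_per_call=56.00
FLUSH_RE = re.compile(r"^cmdstat_(flushall|flushdb):calls=(\d+)")


def flush_calls(commandstats_text):
    """Return FLUSHALL and FLUSHDB call counts from INFO Commandstats text.

    Commands that have never been called are absent from Commandstats,
    so they default to 0.
    """
    calls = {"flushall": 0, "flushdb": 0}
    for line in commandstats_text.splitlines():
        match = FLUSH_RE.match(line.strip())
        if match:
            calls[match.group(1)] = int(match.group(2))
    return calls


stats = """\
# Commandstats
cmdstat_get:calls=1200,usec=2400,usec_per_call=2.00
cmdstat_flushall:calls=2,usec=112,usec_per_call=56.00
"""

calls = flush_calls(stats)
print(calls)  # {'flushall': 2, 'flushdb': 0}
if any(calls.values()):
    print("A FLUSH command has been called; keys may have been purged.")
```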

Incorrect database selection

Azure Cache for Redis uses the db0 database by default. If a client switches to another database (for example, db1) and then tries to read keys that were written to db0, it will not find them there.

Every database is a logically separate unit that holds its own dataset. Use the SELECT command to switch to each of the other available databases and look for the keys there.
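Before stepping through databases one at a time, we can check which databases hold any keys at all. The Keyspace section of INFO lists only non-empty databases, one line each. Below is a minimal sketch with a hypothetical helper name. The db1 line in the sample is made up for illustration.

```python
def databases_with_keys(keyspace_text):
    """Return {database index: key count} from INFO Keyspace text.

    Only databases that currently hold keys appear in this section, one line
    each, e.g. 'db0:keys=3450,expires=2,avg_ttl=91861015336'.
    """
    result = {}
    for line in keyspace_text.splitlines():
        line = line.strip()
        if not line.startswith("db"):
            continue  # skip the '# Keyspace' header and blank lines
        name, _, stats = line.partition(":")
        fields = dict(pair.split("=", 1) for pair in stats.split(","))
        result[int(name[2:])] = int(fields["keys"])
    return result


keyspace = """\
# Keyspace
db0:keys=3450,expires=2,avg_ttl=91861015336
db1:keys=12,expires=0,avg_ttl=0
"""

print(databases_with_keys(keyspace))  # {0: 3450, 1: 12}
```

Once we know which database holds the keys, we can SELECT that index or configure the client to connect to it.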

Redis instance failure

Redis keeps its data in the memory of the physical or virtual machines that host the cache. An Azure Cache for Redis instance in the Basic tier runs on only a single virtual machine (VM). If that VM goes down, all data stored in the cache is lost.

Caches in the Standard and Premium tiers offer much higher resiliency against data loss by using two VMs in a replicated configuration. When the primary node in such a cache fails, the replica node takes over to serve data automatically.