Are you trying to troubleshoot Azure Cache for Redis client-side issues?
This guide is for you.
Redis client issues are caused mainly by memory pressure, traffic bursts, high client CPU usage, client-side bandwidth limitations, and large request or response sizes.
Redis client-side caching support is called Tracking. It costs memory on the server side, but sends invalidation messages only for the set of keys that the client could have in memory.
In this context, we shall look into the steps to troubleshoot these Redis issues.
How to troubleshoot Azure Cache for Redis client-side issues?
To troubleshoot Azure Cache for Redis issues, apply the tips given below:
1. Memory pressure on Redis client
It is one of the most common causes of Azure Cache for Redis client-side issues.
Memory pressure on the client machine leads to performance issues, which in turn delay the processing of responses from the cache.
Under high memory pressure, the system may page data to disk, and this page faulting slows the system down significantly.
Steps to troubleshoot:
i. Monitor memory usage on the machine to ensure that it does not exceed available memory.
ii. Monitor the client’s Page Faults/Sec performance counter. Most systems have some page faults during normal operation.
iii. Check whether spikes in page faults coincide with request timeouts. If they do, memory pressure is the likely cause.
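As a rough illustration of the third step, the sketch below flags request timeouts that occur close to a page-fault spike. The sample data, the `Sample` type, the spike threshold (three times the median rate) and the five-second window are all assumptions made for this example; real values would come from the Page Faults/Sec counter and the application's own timeout log.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float            # seconds since the start of the capture
    page_faults_per_sec: float  # value read from the performance counter


def spikes_near_timeouts(samples, timeout_times, spike_factor=3.0, window=5.0):
    """Return the timeout timestamps that fall within `window` seconds of a spike.

    A spike is any sample whose rate exceeds `spike_factor` times the median
    rate. Both thresholds are illustrative, not recommended values.
    Expects at least one sample.
    """
    rates = sorted(s.page_faults_per_sec for s in samples)
    median = rates[len(rates) // 2]
    spike_times = [
        s.timestamp for s in samples
        if s.page_faults_per_sec > spike_factor * median
    ]
    return [
        t for t in timeout_times
        if any(abs(t - spike) <= window for spike in spike_times)
    ]


# Hypothetical capture: one clear spike at t=20s, two timeouts in the log.
samples = [
    Sample(0, 120), Sample(10, 110), Sample(20, 2400),
    Sample(30, 130), Sample(40, 125),
]
timeouts = [21.5, 38.0]
correlated = spikes_near_timeouts(samples, timeouts)
print(correlated)  # only the timeout at 21.5s is near the spike
```

If most timeouts end up in the correlated list, memory pressure on the client is worth investigating first.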
2. High memory pressure on the client can be mitigated in the following ways:
i. Dig into the memory usage patterns to reduce memory consumption on the client.
ii. Upgrade the client VM to a larger size with more memory.
3. Traffic burst
Bursts of traffic, combined with poor ThreadPool settings, can delay the processing of data that the Redis server has already sent but the client has not yet consumed.
Steps to troubleshoot:
i. Monitor the ThreadPool statistics. For this, we can use ThreadPoolLogger.
ii. Alternatively, inspect the TimeoutException messages from StackExchange.Redis.
A typical message looks like the one given below:
System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0,
IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)
From the above output, we can check the following:
i. In both the IOCP section and the WORKER section, the Busy value is greater than the Min value. This difference means that the ThreadPool settings need adjusting.
ii. The value in: 64221 shows that 64,221 bytes have been received at the client’s kernel socket layer but have not yet been read by the application. This typically means the application is not reading data from the network as quickly as the server is sending it.
We can configure the ThreadPool settings so that the thread pool scales up quickly under burst scenarios.
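The two checks on the exception message can also be automated when many such messages have to be reviewed. The following is a minimal sketch that parses the sample message shown earlier; the regular expressions and the `analyze` helper are assumptions written for this article, not part of StackExchange.Redis, and they only cover the fields that appear in that sample.

```python
import re

# The sample message from this article, joined into a single string.
MESSAGE = (
    "System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, "
    "queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0, "
    "IOCP: (Busy=6,Free=999,Min=2,Max=1000), "
    "WORKER: (Busy=7,Free=8184,Min=2,Max=8191)"
)

POOL_RE = re.compile(r"(IOCP|WORKER): \(Busy=(\d+),Free=(\d+),Min=(\d+),Max=(\d+)\)")
IN_RE = re.compile(r"\bin: (\d+)")  # bytes waiting in the kernel socket buffer


def analyze(message):
    """Extract the thread pool counters and the unread byte count."""
    findings = {}
    for name, busy, _free, minimum, _maximum in POOL_RE.findall(message):
        findings[name] = {
            "busy": int(busy),
            "min": int(minimum),
            # Busy > Min suggests the pool had to grow under load.
            "busy_exceeds_min": int(busy) > int(minimum),
        }
    match = IN_RE.search(message)
    findings["unread_bytes"] = int(match.group(1)) if match else None
    return findings


result = analyze(MESSAGE)
for key, value in result.items():
    print(key, value)
```

For the sample message, both pools report Busy above Min and 64221 unread bytes, which matches the manual reading above.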
4. High client CPU usage
When client CPU usage is high, the client may fail to process a response in a timely fashion even if the cache sends it quickly.
We can monitor the client’s system-wide CPU usage using metrics available in the Azure portal or through performance counters on the machine.
To mitigate a client's high CPU usage:
i. We must investigate what is causing the CPU spikes.
ii. We can upgrade the client to a larger VM size with more CPU capacity.
5. Client-side bandwidth limitation
Available network bandwidth varies from client to client. If the client exceeds the available bandwidth by overloading network capacity, the data will not get processed on the client-side as quickly as the server is sending it. This situation results in timeouts.
We can monitor bandwidth usage over time with a tool such as BandwidthLogger.
To mitigate a client-side bandwidth limitation:
i. Reduce network bandwidth consumption.
ii. Increase the client VM size to one with more network capacity.
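To judge whether a client is close to its network limit, two readings of a received-bytes counter can be turned into a throughput figure and compared with the VM's bandwidth. The sketch below shows only that arithmetic; the counter readings and the 500 Mbps limit are made-up numbers, and the real limit depends on the VM size in use.

```python
def throughput_mbps(bytes_start, bytes_end, seconds):
    """Average throughput in megabits per second between two counter readings."""
    return (bytes_end - bytes_start) * 8 / seconds / 1_000_000


def utilization(bytes_start, bytes_end, seconds, limit_mbps):
    """Fraction of the assumed bandwidth limit that was used in the interval."""
    return throughput_mbps(bytes_start, bytes_end, seconds) / limit_mbps


# Hypothetical readings: 562,500,000 bytes received over 10 seconds
# on a client assumed to be limited to 500 Mbps.
used = throughput_mbps(0, 562_500_000, 10)
share = utilization(0, 562_500_000, 10, limit_mbps=500)
print(f"{used:.0f} Mbps, {share:.0%} of the assumed limit")
```

A client that stays near its limit during the periods when timeouts occur is a good candidate for the two mitigations listed above.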
6. Large request or response Size
A large request or response can also cause timeouts.
Suppose requests ‘A’ and ‘B’ are sent quickly to the server over the same connection. The server starts sending responses ‘A’ and ‘B’ quickly. Because of data transfer times, response ‘B’ must wait behind response ‘A’ and times out even though the server responded quickly.
|-------- 1 Second Timeout (A)----------|
     |-------- 1 Second Timeout (B) ----------|
            |- Read Response A --------|
                                       |- Read Response B-| (**TIMEOUT**)
This kind of request/response delay is difficult to measure.
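The effect can at least be estimated with simple arithmetic. The sketch below models responses that are read in order over one connection and reports which requests exceed the timeout. The response sizes, the 10 MB/s transfer rate and the send times are invented for illustration, and the model assumes the server starts responding at time zero with no processing delay.

```python
def read_completion_times(response_sizes, bytes_per_sec):
    """Time at which each response is fully read, when read back to back."""
    times = []
    elapsed = 0.0
    for size in response_sizes:
        elapsed += size / bytes_per_sec  # a response waits for all earlier ones
        times.append(elapsed)
    return times


def timed_out(response_sizes, send_times, bytes_per_sec, timeout=1.0):
    """True for each request whose response finishes after its timeout."""
    finished = read_completion_times(response_sizes, bytes_per_sec)
    return [done - sent > timeout for done, sent in zip(finished, send_times)]


# Hypothetical case: A is 9.5 MB, B is 1 MB, the link moves 10 MB per second.
sizes = [9_500_000, 1_000_000]
sent_at = [0.0, 0.01]
finish = read_completion_times(sizes, 10_000_000)
flags = timed_out(sizes, sent_at, 10_000_000)
print(finish, flags)  # B alone needs 0.1 s, yet it is the one that times out
```

In this model the small response B times out purely because it queues behind the large response A.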
Resolutions for large request or response sizes:
i. Optimize the application for a large number of small values, rather than a few large values.
ii. Increase the size of the VM to get higher bandwidth capabilities.
iii. Increase the number of connection objects the application uses.
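The first and third resolutions can be sketched as follows. `split_value` breaks one large value into smaller chunks that could be stored under separate keys, and `RoundRobinPool` hands out connection objects in turn so that one large response does not block every other request. Both names are hypothetical helpers for this article, round-robin is only one possible distribution strategy, and the strings stand in for real connection objects from whichever Redis client the application uses.

```python
import itertools


def split_value(value: bytes, chunk_size: int):
    """Split one large value into chunks of at most `chunk_size` bytes."""
    return [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]


class RoundRobinPool:
    """Hands out connection objects one after another, wrapping around."""

    def __init__(self, connections):
        if not connections:
            raise ValueError("at least one connection is required")
        self._cycle = itertools.cycle(connections)

    def next_connection(self):
        return next(self._cycle)


# Placeholder strings stand in for real connection objects.
pool = RoundRobinPool(["conn-0", "conn-1", "conn-2"])
picks = [pool.next_connection() for _ in range(5)]
chunks = split_value(b"x" * 10, 4)
print(picks)
print([len(c) for c in chunks])
```

With several connections in rotation, a request issued right after a large one is likely to travel on a different connection and is not held up behind it.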