An overloaded server is an insidious problem for many websites. Regardless of the type of hosting, some situations recur frequently and can be tackled with a four-step process that identifies and fixes the bottlenecks slowing the system down, improves overall server performance and avoids regressions.

The four steps to avoid server overload

Our guide for this work is a post by Katie Hempenius, a software engineer at Google, which begins by listing the four steps involved:

  1. Evaluate: determine the bottleneck that is impacting the server.
  2. Stabilize: implement quick fixes to mitigate the impact.
  3. Improve: increase and optimize server capacity.
  4. Monitor: use automated tools to help prevent future problems.

The evaluation step

In engineering (and therefore also in computer science), a bottleneck occurs when a single component heavily constrains the performance or capacity of a system.

What are the bottlenecks of a server

When traffic overloads a site's server, the CPU, network, memory or disk I/O can become a bottleneck; identifying which one it is lets you focus your efforts on the interventions that mitigate the damage and resolve it. CPU and network bottlenecks are the most relevant during a traffic peak for most sites, so we focus mainly on them.

  • CPU: CPU usage that stays above 80% should be investigated and fixed. Server performance often degrades when CPU usage reaches 80-90%, and the effect becomes more pronounced as usage approaches 100%.

The CPU needed to serve a single request is negligible, but at the scale reached during traffic peaks it can overwhelm a server. Offloading serving to other infrastructure, reducing expensive operations and limiting the number of requests all reduce CPU usage.

  • Network: during periods of heavy traffic, the network throughput needed to fulfill user requests may exceed capacity. Some sites, depending on the hosting provider, may also hit caps on cumulative data transfer. To remove this bottleneck you need to reduce the size and amount of data transferred to and from the server.
  • Memory: when a system does not have enough memory, data has to be offloaded to disk. Disk access is much slower than memory access, and this can slow down an entire application. If memory is completely exhausted, the result can be Out of Memory (OOM) errors. Adjusting memory allocation, fixing memory leaks and upgrading memory can remove this bottleneck.
  • Disk I/O: the speed at which data can be read from or written to disk is limited by the disk itself. If disk I/O is the bottleneck, increasing the amount of data cached in memory can mitigate the problem (at the expense of increased memory usage); if that does not work, you may need to upgrade the disks.


How to detect bottlenecks

Running the Linux top command on the affected server is a good starting point for investigating bottlenecks; if available, we can supplement it with historical data from the hosting provider or with other monitoring tools.
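
As an illustration of where the CPU figure comes from, the sketch below derives a utilization percentage from two readings of the aggregate "cpu" line of /proc/stat on Linux and compares it with the 80% figure cited above. The function names and the default threshold are assumptions made for this example; they do not come from Hempenius's post, and a real monitoring tool will do considerably more.

```python
def parse_cpu_times(stat_line: str) -> tuple[int, int]:
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    ticks = [int(value) for value in fields[1:]]
    if len(ticks) < 4:
        raise ValueError("too few fields in 'cpu' line")
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # Guest time is already counted in user/nice, so only the first eight are summed.
    idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
    total = sum(ticks[:8])
    return idle, total


def cpu_utilization(before: str, after: str) -> float:
    """Percentage of non-idle CPU time between two 'cpu' line samples."""
    idle_before, total_before = parse_cpu_times(before)
    idle_after, total_after = parse_cpu_times(after)
    total_delta = total_after - total_before
    if total_delta <= 0:
        raise ValueError("samples must be taken at different times, oldest first")
    busy_delta = total_delta - (idle_after - idle_before)
    return 100.0 * busy_delta / total_delta


def is_cpu_bottleneck(utilization: float, threshold: float = 80.0) -> bool:
    """True when utilization is above the (assumed) 80% warning threshold."""
    return utilization > threshold


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line (Linux only)."""
    with open(path, encoding="ascii") as handle:
        return handle.readline()
```

On a Linux host, calling read_cpu_line() twice, about a second apart, and passing both lines to cpu_utilization gives a rough counterpart of the CPU summary shown by top. A single reading above the threshold is not conclusive; the concern described above is usage that stays high.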

The stabilization step

An overloaded server can quickly lead to cascading failures in other parts of the system, so it is important to stabilize the server before attempting to make more significant changes.

Rate limiting protects the infrastructure by capping the number of incoming requests. It becomes increasingly important as server performance degrades: as response times lengthen, users tend to refresh the page aggressively, further increasing the server load.

Rejecting a request is relatively inexpensive, but the best way to protect the server is to apply rate limiting upstream of it, for example through a load balancer, a reverse proxy or a CDN.
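
To show what a rate limiter does, here is a minimal sketch of one widely used algorithm, the token bucket. It is an illustration only: the class name, parameters and injected clock are assumptions for this example, and in practice the limiting would normally be configured in the upstream load balancer, reverse proxy or CDN rather than written by hand.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if rate <= 0 or capacity < 1:
            raise ValueError("rate must be positive and capacity at least 1")
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the behaviour can be tested
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller would check allow() before doing any expensive work and, when it returns False, reject the request cheaply, typically with an HTTP 429 (Too Many Requests) response.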

HTTP Caching

According to Hempenius, you should look for ways to cache content more aggressively: if a resource can be served from an HTTP cache (whether the browser cache or a CDN), it does not need to be requested from the origin server, which reduces the server load.

HTTP headers such as Cache-Control, Expires and ETag indicate how a resource should be cached by an HTTP cache: auditing and fixing these headers will improve caching.

Service workers can also be used for caching, but they use a separate cache and are a supplement to, rather than a replacement for, proper HTTP caching. When dealing with an overloaded server, efforts should therefore focus on optimizing HTTP caching.

How to diagnose and solve problems

To address this, we can run Google Lighthouse and look at the "Uses inefficient cache policy on static assets" audit, which lists resources with a short or medium time to live (TTL), and then consider whether the TTL of each resource can be increased. As a rule of thumb, Hempenius suggests that:

  • Static resources must be cached with a long TTL (1 year).
  • Dynamic resources must be cached with a short TTL (3 hours).

The fix is to set the max-age directive of the Cache-Control header to the appropriate number of seconds. Note that max-age is just one of many directives and headers that influence the caching behavior of the application.
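
The arithmetic behind the two TTLs can be sketched as follows. The helper name and the list of file extensions treated as static are hypothetical, chosen only for this example; a real site would decide what counts as static in its own server or framework configuration.

```python
from pathlib import PurePosixPath

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60   # 31,536,000 seconds
THREE_HOURS_SECONDS = 3 * 60 * 60       # 10,800 seconds

# Hypothetical set of extensions treated as static assets in this sketch.
STATIC_SUFFIXES = {".css", ".js", ".png", ".jpg", ".svg", ".woff2"}


def cache_control_for(url_path: str) -> str:
    """Return a Cache-Control value: long TTL for static assets, short TTL otherwise."""
    suffix = PurePosixPath(url_path).suffix.lower()
    if suffix in STATIC_SUFFIXES:
        return f"max-age={ONE_YEAR_SECONDS}"
    return f"max-age={THREE_HOURS_SECONDS}"
```

For instance, cache_control_for("/assets/app.js") yields "max-age=31536000", while cache_control_for("/news/today") yields "max-age=10800".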

The graceful degradation strategy

Graceful degradation is a strategy that temporarily reduces functionality in order to shed excess load from a system. The concept can be applied in many different ways: for example, serving a static text page instead of the full application, disabling search or returning fewer search results, or disabling expensive or non-essential features. The important thing is to focus on features that can be removed easily and safely with minimal impact on the business.
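
One way to picture this is a small function that maps the current load to a set of enabled features. Everything in the sketch below is an assumption made for illustration: the feature names, the result counts and the 80% and 90% thresholds are hypothetical, and the post does not prescribe any particular mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    search_enabled: bool
    max_search_results: int
    recommendations_enabled: bool   # stands in for any non-essential feature


def features_for_load(cpu_percent: float) -> Features:
    """Pick a feature set for the given CPU usage (thresholds are illustrative)."""
    if cpu_percent >= 90:
        # Severe overload: switch off search and non-essential features.
        return Features(search_enabled=False, max_search_results=0,
                        recommendations_enabled=False)
    if cpu_percent >= 80:
        # Moderate overload: keep search but return fewer results.
        return Features(search_enabled=True, max_search_results=5,
                        recommendations_enabled=False)
    # Normal operation: everything on.
    return Features(search_enabled=True, max_search_results=20,
                    recommendations_enabled=True)
```

Keeping the decision in one place makes it easy to restore full functionality once the load drops.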

The improvement step

There are many ways to increase and optimize server capacity; Katie Hempenius identifies at least five areas on which to focus.

1. Use a content delivery network (CDN)

Serving static assets can be offloaded from the server to a content delivery network (CDN), thus reducing its load. The main function of a CDN is to deliver content to users quickly through an extensive network of servers located close to them, but most CDNs also offer additional performance features such as compression, load balancing and media optimization.

2. Scale compute resources

The decision to scale compute resources should be taken with care: although it is often necessary, doing so prematurely can generate “unnecessary architectural complexity and financial costs”.

A high Time To First Byte (TTFB) may indicate that a server is approaching its maximum capacity, and a monitoring tool gives a more accurate picture of CPU usage: if the current or expected level exceeds 80%, consider adding servers.

Adding a load balancer lets you distribute traffic across multiple servers, routing each request to the most appropriate one. Cloud service providers offer their own load balancers, or you can set up your own using HAProxy or NGINX; once the load balancer is in place, additional servers can be added.
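
To make the routing idea concrete, here is a minimal sketch of one common strategy, least connections, in which each new request goes to the server with the fewest requests in flight. The class and server names are hypothetical, and this is not how HAProxy or NGINX is configured; it only illustrates the selection logic that such tools implement for you.

```python
from typing import Iterable


class LeastConnectionsBalancer:
    """Track in-flight requests per server and pick the least busy one."""

    def __init__(self, servers: Iterable[str]) -> None:
        self.active = {server: 0 for server in servers}
        if not self.active:
            raise ValueError("at least one server is required")

    def acquire(self) -> str:
        """Choose a server for a new request and count the request against it."""
        # min() returns the first server among ties, in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Mark one request on `server` as finished."""
        if self.active[server] == 0:
            raise ValueError(f"no active requests on {server}")
        self.active[server] -= 1
```

A simpler alternative is round robin, which cycles through the servers in order; least connections copes better when requests take very different amounts of time.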

Most cloud providers offer autoscaling, which works in tandem with load balancing, automatically scaling compute resources up and down based on demand at a given time.

However, Hempenius points out that it is not a magic tool: new instances take time to come online, and autoscaling requires significant configuration. Because of this additional complexity, a simpler load-balanced setup should be considered first.

3. Enable compression

Text-based resources should be compressed using gzip or Brotli; gzip can reduce the transfer size by about 70%.

Compression can be enabled by updating the server configuration.
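
The effect is easy to observe with Python's standard gzip module. The sketch below only measures how much a piece of text shrinks; it is not a server configuration, the function name is made up for this example, and the saving on real pages will vary with their content.

```python
import gzip


def gzip_savings(text: str) -> float:
    """Return the fraction of bytes saved by gzip-compressing `text` (0.0 to 1.0)."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must not be empty")
    compressed = gzip.compress(raw)
    return 1 - len(compressed) / len(raw)


# Repetitive markup, as often found in HTML, compresses particularly well.
sample_html = '<li class="item">Hello, world</li>\n' * 200
savings = gzip_savings(sample_html)
```

Very small or already compressed payloads (such as JPEG images) gain little or nothing, which is why the advice above targets text-based resources.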

4. Optimize images and media

For many sites, images account for the largest share of file size, so image optimization can quickly and significantly reduce the size of a site, as we discussed in a previous insight.

Lighthouse has a variety of audits that flag potential optimizations for these resources; alternatively, you can use DevTools to identify the largest files, such as hero images (which probably need to be downsized).

In principle, Hempenius suggests a quick checklist:

  • Size: images must not be larger than necessary.
  • Compression: in general, a quality level of 80-85 has a minimal effect on image quality while reducing the file size by 30-40%.
  • Format: use JPEG for photos instead of PNG; use MP4 for animated content instead of GIF.

More generally, you might consider setting up an image CDN, which is designed to serve and optimize images and offloads image serving from the origin server. Setting one up is simple, but requires updating existing image URLs to point to the new address.

5. Minify JS and CSS

Minification removes unnecessary characters from JavaScript and CSS. A quick win is to minify only the JavaScript (most sites have more of it than CSS), since that has the greater immediate impact.

The monitoring step

Server monitoring tools provide data collection, dashboards, and server performance alerts, and their use can help prevent and mitigate future performance issues.

There are some metrics that help to systematically and accurately detect problems; for instance, server response time (latency) works especially well for this, as it detects a wide variety of problems and correlates directly with the user experience.
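
As a rough sketch of how such a latency alert might work, the example below computes a nearest-rank percentile over recent response times and compares it with a threshold. The function names, the 95th percentile and the 500 ms threshold are assumptions chosen for illustration; real monitoring tools provide this kind of alerting out of the box.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_alert(
    samples_ms: Sequence[float],
    threshold_ms: float = 500.0,   # hypothetical alert threshold
    pct: float = 95.0,             # hypothetical percentile to watch
) -> bool:
    """True when the chosen percentile of response times exceeds the threshold."""
    return percentile(samples_ms, pct) > threshold_ms
```

Watching a high percentile rather than the average helps surface slowdowns that affect only a portion of requests.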
