The C10k problem is so last century. How about serving 1 million load-balanced requests per second in the cloud? Using publicly available Google Cloud Platform services, including Compute Engine Load Balancing, we have done exactly that. Within 5 seconds of setup and without any pre-warming, our load balancer was serving 1 million requests per second and sustaining that level.

This number is 20 times greater than the throughput in last year’s Eurovision Song Contest, which served 125 million users in Europe and was hosted on Google Compute Engine. The Eurovision setup used DNS load balancing, which increases cost and complicates setup and maintenance. We were able to simplify this process using Compute Engine Load Balancing, which avoids these issues by providing simple APIs and allowing a single IP address to serve all traffic at a lower cost than do-it-yourself options. In addition, Compute Engine Load Balancing can detect unhealthy instances and dynamically add and remove instances that serve traffic. This addresses the hard-to-solve problem of cached DNS entries in end-user browsers. No more “404’s”.

Starting with an empty Compute Engine project and ending with 456 cores provisioned (for the load generator and web server VMs) and one load-balanced IP address actively processing 1.016M requests per second (±0.007M) took a total of 7 minutes 30 seconds. The 1M figure counts only complete requests with successful responses. You can read more on how to set up Compute Engine Load Balancing and how to provision Compute Engine VMs on the Google Cloud Platform website.

The following depicts the setup used:

This setup demonstrated several features, including scaling of Compute Engine Load Balancing, use of different machine types and rapid provisioning. To generate the load we used 64 n1-standard-4 instances running curl_loader with 16 threads and 1000 connections. Each curl_loader ran the same config to generate roughly the same number of requests to the load balancer. The load was directed at a single IP address, which then fanned out to the web servers.

To demonstrate scaling of the Compute Engine Load Balancing fanout we used 200 n1-standard-1 web servers running Apache v2.2.22 on Debian 7.1 Wheezy images. Users are encouraged to use larger VM types for better single-machine backend web serving. Here, however, we were demonstrating how the load balancer scales across backends and were not concerned with the backends themselves using every cycle to serve responses. Each backend web server received ~5K requests per second, which is an even distribution of the 1M total across 200 backends.
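As a quick sanity check, the figures quoted in this post can be reproduced with a few lines of arithmetic. The sketch below uses only numbers from the post, plus the standard sizes of the two machine types (4 vCPUs for n1-standard-4, 1 vCPU for n1-standard-1). The variable names are ours, chosen for illustration.

```python
# Figures from the post; vCPU counts are the standard n1-standard-N sizes.
LOAD_GENERATORS = 64         # n1-standard-4 instances running curl_loader
LOAD_GENERATOR_VCPUS = 4
WEB_SERVERS = 200            # n1-standard-1 instances running Apache
WEB_SERVER_VCPUS = 1
TOTAL_RPS = 1_016_000        # measured 1.016M requests per second

# Total cores provisioned across load generators and web servers.
total_cores = LOAD_GENERATORS * LOAD_GENERATOR_VCPUS + WEB_SERVERS * WEB_SERVER_VCPUS

# Average request rate seen by each backend and sent by each load generator.
rps_per_backend = TOTAL_RPS / WEB_SERVERS
rps_per_generator = TOTAL_RPS / LOAD_GENERATORS

print(f"total cores: {total_cores}")
print(f"requests/s per backend: {rps_per_backend:.0f}")
print(f"requests/s per load generator: {rps_per_generator:.0f}")
```

The core count matches the 456 cores mentioned above, and the per-backend average lands at roughly 5K requests per second.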

Compute Engine Load Balancing distributed the load by using a tuple of source address+port, destination address+port and protocol. Each web response was 1 byte in size, not including the HTTP headers. All of this was configured from an empty Compute Engine project. To reproduce the data yourself you can use the following Gist.
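To give a feel for how tuple-based distribution spreads connections, here is a minimal sketch. It is not the algorithm Compute Engine Load Balancing uses internally, which this post does not describe. The SHA-256 hash, the `pick_backend` helper, the IP addresses and the traffic pattern are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def pick_backend(src_ip, src_port, dst_ip, dst_port, protocol, num_backends):
    """Map a connection 5-tuple to a backend index.

    Illustrative only: the hash function here is an assumption,
    not the one the real load balancer uses.
    """
    key = f"{src_ip}:{src_port}|{dst_ip}:{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_backends


NUM_BACKENDS = 200
counts = Counter()

# Hypothetical traffic: 64 clients, each opening 1000 connections
# from distinct source ports to one load-balanced address.
for client in range(64):
    src_ip = f"10.0.0.{client + 1}"
    for port in range(1000):
        backend = pick_backend(src_ip, 20000 + port, "203.0.113.10", 80, "tcp", NUM_BACKENDS)
        counts[backend] += 1

print("connections per backend (min, max):", min(counts.values()), max(counts.values()))
```

Because the same tuple always hashes to the same index, every packet of a given connection reaches the same backend, while many distinct source ports spread connections roughly evenly across the pool.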

This entire setup and test cost just $10 USD!

-Posted by Anthony F. Voellm, Performance Engineering Manager