The Low Latency Web

March 20, 2012

500,000 requests/sec – Modern HTTP servers are fast

Filed under: HTTP Servers — lowlatencyweb @ 7:00 am

A modern HTTP server running on somewhat recent hardware is capable of servicing a huge number of requests with very low latency. Here’s a plot showing requests per second vs. number of concurrent connections for the default index.html page included with nginx 1.0.14.

[Plot: average requests/sec and per-request latency vs. number of concurrent connections]
With this particular hardware & software combination the server quickly reaches over 500,000 requests/sec and sustains that with gradually increasing latency. Even at 1,000 concurrent connections, each requesting the page as quickly as possible, latency is only around 1.5ms.

The plot shows requests/sec and per-request latency averaged over 3 runs of wrk -t 10 -c N -r 10m http://localhost:8080/index.html, where N is the number of connections. The load generator is wrk, a scalable HTTP benchmarking tool.
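
A sweep like the one behind the plot could be scripted as in this minimal sketch. The exact set of N values is an assumption on my part; the -t/-c/-r flags simply mirror the wrk invocation quoted above:

  # Run wrk 3 times at each connection count; requests/sec and latency
  # are averaged across the 3 runs.
  for N in 100 200 400 600 800 1000; do
    for run in 1 2 3; do
      wrk -t 10 -c $N -r 10m http://localhost:8080/index.html
    done
  done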

Software

The OS is Ubuntu 11.10 running Linux 3.0.0-16-generic #29-Ubuntu SMP Tue Feb 14 12:48:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux. The following kernel parameters were changed to increase the number of ephemeral ports, reduce TIME_WAIT, increase the allowed listen backlog, and the number of connections Netfilter can track:

  # Widen the ephemeral port range and reduce time spent in TIME_WAIT
  echo "2048 64512" > /proc/sys/net/ipv4/ip_local_port_range
  echo "1" > /proc/sys/net/ipv4/tcp_tw_recycle
  echo "1" > /proc/sys/net/ipv4/tcp_tw_reuse
  echo "10" > /proc/sys/net/ipv4/tcp_fin_timeout

  # Raise the accept queue and SYN backlog limits
  echo "65536" > /proc/sys/net/core/somaxconn
  echo "65536" > /proc/sys/net/ipv4/tcp_max_syn_backlog

  # Let Netfilter track more concurrent connections
  echo "262144" > /proc/sys/net/netfilter/nf_conntrack_max
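
These writes to /proc/sys do not survive a reboot. The equivalent sysctl keys could instead be placed in /etc/sysctl.conf and applied with sysctl -p, the standard persistent alternative:

  # /etc/sysctl.conf
  net.ipv4.ip_local_port_range = 2048 64512
  net.ipv4.tcp_tw_recycle = 1
  net.ipv4.tcp_tw_reuse = 1
  net.ipv4.tcp_fin_timeout = 10
  net.core.somaxconn = 65536
  net.ipv4.tcp_max_syn_backlog = 65536
  net.netfilter.nf_conntrack_max = 262144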

The HTTP server is nginx 1.0.14 built with ./configure && make, and run in-place with objs/nginx -p . -c nginx.conf.
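
Spelled out as commands, the build-and-run sequence looks like this, assuming nginx’s standard download location for the 1.0.14 tarball:

  wget http://nginx.org/download/nginx-1.0.14.tar.gz
  tar xzf nginx-1.0.14.tar.gz
  cd nginx-1.0.14
  ./configure && make
  # Run in-place from the source tree, with the nginx.conf shown below
  # placed in the source root
  objs/nginx -p . -c nginx.conf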

nginx.conf

worker_processes     16;
worker_rlimit_nofile 262144;

daemon off;

events {
  use epoll;
  worker_connections 16384;
}

error_log error.log;
pid /dev/null;

http {
  sendfile   on;
  tcp_nopush on;

  keepalive_requests 100;

  open_file_cache max=100;

  gzip            off;
  gzip_min_length 1024;

  access_log off;

  server {
    listen *:8080 backlog=16384;

    location / {
      root   html;
      index  index.html;
    }
  }
}
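
Note the arithmetic behind these numbers: 16 worker processes × 16,384 worker_connections = 262,144, which matches worker_rlimit_nofile, and the listen backlog of 16,384 fits within the raised somaxconn limit of 65,536.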

Hardware

A dual Intel Xeon X5670 with 24GB of RAM from SoftLayer. The X5670 has 6 cores @ 2.93 GHz, 2 threads per core, /proc/cpuinfo shows 24 CPUs.

22 Comments

  1. How did you capture and plot the results?

    Comment by Ricardo Tomasi (@ricardobeat) — March 20, 2012 @ 11:10 pm

  2. Try the Haskell WAI http server in one of your next articles, it has similar performance characteristics.

    Comment by Byron — March 20, 2012 @ 11:20 pm

  3. Ricardo, that plot was done with R and ggplot2 from a CSV file.

    Comment by lowlatencyweb — March 20, 2012 @ 11:48 pm

  4. Can you compare this with the new Apache HTTPD?

    Comment by skanga — March 21, 2012 @ 12:33 am

  5. It’s seldom static HTML pages that affect site load time.

    Comment by Philip Tellis (@bluesmoon) — March 21, 2012 @ 10:38 am

  6. How close to flooding the network is a setup like this?

    Comment by Anthony — March 21, 2012 @ 10:44 am

  7. Philip, perhaps, but most HTML includes a significant amount of static data: JS, CSS, images, etc. In any case this isn’t a benchmark of page load time but of theoretical maximum throughput for a particular hardware & software configuration.

    Anthony, I don’t have bandwidth numbers to share but it’s definitely less than gigabit speeds.

    Comment by lowlatencyweb — March 21, 2012 @ 11:42 am

  8. Don’t use tcp_tw_recycle, it breaks connections from clients behind NAT.

    Comment by Pavel Stano — March 21, 2012 @ 11:57 am

  9. Reblogged this on Carpet Bomberz Inc. and commented:
    Nginx is some serious voodoo for serving up websites. I’m bowled over by this level of performance for a consumer level Intel box running Ubuntu. And from 200 connections to 1000 connections performance stays high without any big increases in latency. Amazing.

    Comment by carpetbomberz — March 21, 2012 @ 1:07 pm

  10. Is the result achieved by using HTTP Keepalive? What kind of throughput would it be if no keepalive is used?

    Comment by Andy — March 21, 2012 @ 5:00 pm

    • Andy, yes HTTP keep-alive was used and the nginx default of 100 requests per connection was explicitly configured. I don’t see much value in disabling persistent connections given that every browser uses them.

      Comment by lowlatencyweb — March 22, 2012 @ 5:49 am

  11. […] previous article showed nginx 1.0.14 performance on a dedicated server from SoftLayer. That server was chosen simply […]

    Pingback by Modern HTTP Servers Are Fast, EC2 Is Not « The Low Latency Web — March 21, 2012 @ 9:55 pm

  12. Hi there, I notice you’ve set worker_processes to 16. Is there any reason you chose this number? The nginx wiki implies that the number of processes should match the number of CPU cores* and I have been working to this principle. I’m confused as to whether your test box has 6, 12 or 24 cores, but whichever it is, it doesn’t match up with 16. Just wondering if you’ve done any experimentation with the number of workers that led you to using 16 explicitly?

    * http://wiki.nginx.org/CoreModule#worker_processes

    Comment by Rathers — March 22, 2012 @ 10:11 am

    • Rathers, a good question! After a bunch of experimentation 16 workers seemed to give the highest throughput. The machine itself has 12 cores, 24 if you count hyperthreads, and the OS advertises 24 processors. Theoretically hyperthreads should give a bit more throughput if the CPU stalls and there are runnable tasks available.

      Comment by lowlatencyweb — March 22, 2012 @ 10:25 am

      • OK, cheers for the feedback. If 16 gave you the highest throughput, was it a statistically significant improvement? I imagine the difference was only slight? From the experimentation I’ve been doing it doesn’t seem to make much difference how many workers you have or indeed how many CPU cores, I can easily saturate the network bandwidth and only consume a few percent of the CPU.

        Comment by Rathers — March 22, 2012 @ 10:55 am

  13. I see in the hardware description: “The X5670 has 6 cores @ 2.93 GHz, 2 threads per core, /proc/cpuinfo shows 24 CPUs.”
    It seemed strange for the OS to show 24 CPUs with 2 threads × 6 cores, but in the comments you answered that the machine has 12 cores.
    That is the right number, isn’t it?

    Comment by lauro — March 22, 2012 @ 6:29 pm

  14. […] applications contain copious amounts of static content in the form of JavaScript, CSS, images, etc. Serving that efficiently means a better user experience, and also more CPU cycles free for dynamic content. Useful […]

    Pingback by 150,000 Requests/Sec – Dynamic HTML & JSON Can Be Fast « The Low Latency Web — March 22, 2012 @ 10:45 pm

  15. […] series of articles has drawn out the cargo cultists who insist that HTTP benchmarks must be run over the […]

    Pingback by A Note On Benchmarking « The Low Latency Web — March 23, 2012 @ 2:42 pm

  16. […] HTTP servers are capable of handling 500k requests/sec on commodity hardware. However that article ignored HTTP pipelining which can have a significant […]

    Pingback by 500,000 Requests/Sec? Piffle! 1,000,000 Is Better « The Low Latency Web — March 26, 2012 @ 1:58 pm

  17. Reblogged this on Egill and commented:

    n3rd pr0n! love it!

    Comment by Egill Erlendsson — March 26, 2012 @ 10:22 pm

