A Fair Proxy Balancer for Nginx and Mongrel

Posted by ezmobius Fri, 09 Nov 2007 22:50:00 GMT

Nginx is my favorite webserver, as you astute readers all probably know. It’s very fast, light and rock solid in production. My only complaint was that the proxy module was round robin only. This means that if you had 3 mongrels and one of them performed an action that took, say, 20 seconds, then for those 20 seconds every 3rd request would get stuck in line behind the mongrel doing the long blocking request. This is because nginx was merrily doing round robin and didn’t care whether a mongrel was busy or not.

Now we all know that we should not have actions that take this long in our apps, right? Of course we do, but that doesn’t stop it from happening quite a bit in a lot of the apps I’ve seen.

So EngineYard.com put out a bounty on getting a fair proxy balancer that would keep track of when a mongrel is busy and not send requests to it until it has finished and is ready to take more. This means we can now queue inside nginx very efficiently and only send requests to non-busy mongrels, thereby avoiding the waiting-in-the-wrong-line problem described above.

Grzegorz Nosek has stepped up to claim the bounty and has done a wonderful job of making this work very nicely. This is not a production release, but I wanted to get it out there to get some testing from interested parties.

You can grab a tarball of the git repo here:

http://git.localdomain.pl/?p=nginx.git;a=tree;hb=upstream_fair-0.6

Click the snapshot link to download a tarball. Then you can compile nginx with the new module like this:

  $ cd nginx.git
  $ ./configure --with-http_ssl_module
  $ make
  $ sudo make install

The only difference in configuration will be to add the “fair;” directive to your upstream block. For example:

  upstream mongrel {
    fair;
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
    server 127.0.0.1:5002;
  }
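
For reference, requests get routed to that upstream with a normal proxy_pass directive. A minimal sketch of a matching server block might look like the following (the listen port, server_name and header lines here are just placeholder assumptions, not something the patch requires):

  server {
    listen 80;
    server_name example.com;

    location / {
      # forward everything to the fair-balanced mongrel cluster defined above
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $http_host;
      proxy_pass http://mongrel;
    }
  }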

Here is an easy way to try this out and verify it’s working for you. Create a Rails or Merb app with one action that does a “sleep 15” and another action that returns some text immediately. Fire up a cluster of 3 mongrels behind your newly compiled nginx with the fair directive commented out.
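
If it helps, here is roughly what that test app could look like as a Rails controller (the controller and action names are just placeholders):

  class TestController < ApplicationController
    # ties up one mongrel for 15 seconds
    def sleepy
      sleep 15
      render :text => "done sleeping"
    end

    # returns immediately; with fair balancing it should never
    # have to queue behind the sleepy action
    def fast
      render :text => "hello"
    end
  end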

Now open up 2 browser windows, hit the sleepy action in one window and repeatedly hit the other action in the second window. With the fair directive commented out, every 3rd request for the fast action will hang until the sleeping action is done. This is because Mongrel is queuing requests and nginx is just using round robin.

Now uncomment the “fair;” line, restart nginx and run the experiment again. This time you will notice that nginx avoids sending any requests to the sleeping mongrel and balances only between the free ones until the sleep finishes; then it is added back to the available pool.

Please give this a shot and beat it up some, folks. Report any problems you have here in the comments. This is going to be huge for any Rails apps running under nginx, and I’d like to ferret out any bugs before we have an official release.


Comments

  1. gnufied said about 1 hour later:
    Going to play with it, right away. Will post updates. Cool bit.
  2. gnufied said about 1 hour later:
    The extension reported by the link is wrong. It's just a tar, but the download link gives .tar.gz and hence confuses the extractors.
  3. Ben Schwarz said about 3 hours later:
    Great stuff Ezra, I'll be installing this against some test systems today / next week.
  4. Joe Van Dyk said about 3 hours later:
    ooh, nice!!! I was planning on using nginx -> haproxy -> mongrels.
  5. Koz said about 5 hours later:
    Bah, just spent a few hours setting up nginx -> haproxy. It doesn't have the cool status page though... Any plans to roll out to EY customers?
  6. Ezra said about 5 hours later:
    @koz- yeah we will be rolling it out to all customers once it's been tested some more and beat on a bit. It will be open source as well and we'll try to get it in nginx proper once it's proven.
  7. Walter McGinnis said about 6 hours later:
    Is it just me or are the links in your RSS hosed? I get this: urn:uuid:11bd2d40-d279-46a9-885e-2a42d571ea02 for this article. Looks like feedburner id or something. Back to the topic at hand... This sounds pretty cool, just wondering if it's the sort of feature that might be rolled into swiftiply's upcoming cluster management. I'm still looking for a way to manage spinning up mongrel instances when needed and spin down ones that are unneeded. Pointers appreciated. Cheers, Walter
  8. bryanl said about 13 hours later:
    awesome! can't wait to play with it.
  9. Justin said about 21 hours later:
    Wow, exciting! We've definitely experienced this round robin issue so we can't wait to see this in production!
  10. Casey said 1 day later:
    I popped this into my production site this morning (it's an invite only beta, so why not? :) Nearly 1,000,000 Rails requests since then and no problems yet. This is great - sometimes we do not-so-speedy 3rd party REST API calls during a request and this might save me from having to go nuts with backgroundrb.
  11. gnufied said 1 day later:
    So far, so good. Humming along nicely.
  12. Garry said 1 day later:
    Niiiiice. I could *definitely* try this out on a staging server at work, b/c we have some dumb requests that sometimes take a good while to complete. Thanks!
  13. Casey said 2 days later:
    So I do have one odd thing that happens... My hot deploy stops half of the cluster (by updating an included nginx proxy config and reloading nginx), updates/restarts mongrels, starts the cluster, and does the same for the other half. Something goes weird when I use fair; - after everything is done I get a blank page and I have to restart (not just reload) nginx. I haven't had a chance to look into it further...
  14. Grzegorz Nosek said 3 days later:
    @Casey: I'm the author of this module. A blank page may be the result of an nginx worker segfaulting, so I'd appreciate if you did the following:
    • enable core dumps (and make sure nginx can write to the core directory), e.g. under Linux:
      ulimit -c unlimited # in a startup script
      mkdir -p /var/lib/core
      chmod 1733 /var/lib/core
      echo '/var/lib/core/core.%e.%p.%t' > /proc/sys/kernel/core_pattern
      
    • restart nginx
    • if any core dumps appear, mail them to me (root at localdomain.pl) with your nginx binary (this is very important); alternatively, mail me the result of bt full (gdb /path/to/nginx /path/to/core)
  15. Stephen Touset said 6 days later:
    Grzegorz, is there any way to apply this fair balancing module to the stable 0.5.32 release?
  16. Casey said 6 days later:
    Hi Grzegorz - I'll do that.
  17. Grzegorz Nosek said 6 days later:
    @Stephen:

    upstream_fair works out of the box on nginx 0.5 (in fact, I'm using 0.5 myself).

    To use upstream_fair with nginx 0.5, download the latest snapshot from my repo. This version is however further patched for other features, so if you want pure nginx + upstream_fair, take the following files from my repo (the rest may come from vanilla nginx):

    • auto/modules
    • auto/options
    • auto/sources
    • src/http/modules/ngx_http_upstream_fair_module.c

    (the first three files just wire the module into the nginx build process). Then just ./configure && make build install upgrade (or your favourite nginx build method).

    @Casey:

    Please do, but please also try the latest snapshot. I managed to reproduce and fix a very similar bug so it may work just fine now.

  18. Alexander Staubo said 19 days later:
    We have been running this patch on a live Rails site for a couple of weeks. We switched from Lighttpd + FastCGI to Nginx + Mongrel for a couple of technical reasons I won't go into here. Generally performance has been worse, but I have been unable to pin down what's wrong. From what I can see, the fair patch is not working consistently. A large portion of the requests will go to a mongrel which is already processing a request. Here is an output from "ps" on one of our boxes:
    1003     11941 33.7  2.7 131484 112716 ?       Rl   15:52   2:24 mongrel_rails [10000/1/89]: handling 127.0.0.1: GET /kalender/liste/2007/10/28/1
    1003     11944  1.2  0.8  54336 35580 ?        Sl   15:52   0:05 mongrel_rails [10002/0/1]: idle
    1003     11947  4.3  2.8 135804 116924 ?       Sl   15:52   0:18 mongrel_rails [10008/0/9]: idle
    1003     11950  3.1  0.9  58508 39684 ?        Sl   15:52   0:13 mongrel_rails [10011/0/374]: idle
    1003     11953  3.3  0.9  58196 39428 ?        Sl   15:52   0:14 mongrel_rails [10013/0/370]: idle
    1003     11957  3.5  1.3  74784 55944 ?        Sl   15:52   0:15 mongrel_rails [10001/0/10]: idle
    1003     11961  3.3  0.9  58472 39700 ?        Sl   15:52   0:14 mongrel_rails [10012/0/390]: idle
    1004      5891  2.3  6.8 302544 283032 ?       Rl   15:52   2:24 mongrel_rails [10010/3/26]: handling 127.0.0.1: GET /bulletin/show/26916
    1003     11970 40.7  2.8 138408 119528 ?       Sl   15:52   2:52 mongrel_rails [10004/1/75]: handling 127.0.0.1: GET /
    1003     11974 40.7  5.1 233756 214824 ?       Sl   15:52   2:52 mongrel_rails [10007/2/68]: handling 127.0.0.1: GET /feed/messages/rss/963/2722
    1003     11978 32.1  2.7 133924 115088 ?       Sl   15:52   2:15 mongrel_rails [10009/1/79]: handling 127.0.0.1: GET /kategori/liste/Revival
    1003     11990 28.6  2.9 141688 122916 ?       Sl   15:52   2:00 mongrel_rails [10005/1/85]: handling 127.0.0.1: GET /generelt/search
    1003     11998 27.1  2.8 136816 118020 ?       Sl   15:52   1:53 mongrel_rails [10006/1/78]: handling 127.0.0.1: GET /kalender/liste/2007/9/26
    1003     12002 31.8  2.7 131552 112732 ?       Sl   15:52   2:13 mongrel_rails [10010/0/89]: idle
    
    Mongrel is running with a custom extension I have written that extends the process title with status information. The three numbers are the port, the number of concurrent requests, and the total number of requests processed during the mongrel's lifetime. What is apparent from this output is that a bunch of the mongrels are generally not used. This would not be a problem if several other mongrels were not being forced to process multiple concurrent requests. Because of the giant Rails lock, this means certain requests will be queued after other requests, which impairs response time. (We have a lot of fairly slow requests, in the 5-10-second range.)
  19. Gavin Stark said 20 days later:
    @ Alexander: That extension for modifying the mongrel process title looks cool. Would you please share?
  20. Focar said 21 days later:
    Does this work with Visual Basic 6 ?? It'll be awesome!
  21. Alexander Staubo said 23 days later:
    Gavin: Give me a moment and I'll put it up.
  22. Frank K said 25 days later:
    @Alexander: Any resolution to your issues? I would like to run this on my production machine, but not until it's working properly. Also, where are you putting up your modifications to the mongrel process?
  23. Alexander Staubo said 26 days later:

    Here's the extension for Mongrel.

    Frank, the latest snapshot helped. I also modified, on Grzegorz Nosek's advice, FS_TIME_SCALE_OFFSET to a much larger (60000) value. It's running smoother now. I am still seeing some mongrels getting a lot more traffic than they should, while others lie idle, but these mongrels are processing some very, very slow requests, so it looks like a rare edge case that should not affect most sites.
