Surviving Digg?

Wed Apr 30 13:25:11 MSD 2008

If you are using Linux, put the following line (WITHOUT the quotes)

"*               hard    nofile          8024"

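in /etc/security/limits.conf

and reboot the server. A reboot is not strictly required: the limit is
applied to new login sessions. Note that a "hard" entry only raises the
ceiling; add a matching "soft" line if the default limit should change too.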

Or put the following in the nginx init script (for example /etc/init.d/nginx),
in the start function, before the line that starts the daemon:
ulimit -n 8024
and then just restart nginx.

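That will solve the problem. But beware: each process is now limited to 8024
open files. This is a per-process limit, not a system-wide one; the
system-wide ceiling is a separate kernel setting (fs.file-max).
Look up the details and tune the value if needed.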
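
To check that the change took effect, something like the following may help.
It is only a sketch: the function names show_nofile_limits and
proc_nofile_limit are made up for illustration, and /proc/<pid>/limits is
assumed to exist, which is not the case on older kernels.

```shell
#!/bin/sh
# Print the soft and hard open-file limits of the current shell.
# Child processes (such as a daemon started from here) inherit them.
show_nofile_limits() {
    printf 'soft=%s hard=%s\n' "$(ulimit -S -n)" "$(ulimit -H -n)"
}

# Print the open-file limit of an already running process.
# $1 is a process id. Assumption: the kernel provides /proc/<pid>/limits.
proc_nofile_limit() {
    grep 'Max open files' "/proc/$1/limits"
}

show_nofile_limits
```

Run it from the same place that starts nginx (for example the init script),
since the limits are inherited from the parent shell. To inspect the running
server, pass the nginx master process id to proc_nofile_limit.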

Kind Regards,
Sasa Ugrenovic

On Wed, 30 Apr 2008 11:08:52 +0400
Igor Sysoev <is at rambler-co.ru> wrote:

> On Tue, Apr 29, 2008 at 01:38:13PM -0700, Neil Sheth wrote:
> 
> > We hit the front page of digg the other night, and our servers didn't
> > handle it well at all.  Here's a little of what happened, and perhaps
> > someone has some suggestions on what to tweak!
> > 
> > Basic setup, nginx 0.5.35, serving up static image content, and then
> > passing php requests to 2 backend servers running apache, all running
> > red hat el4.
> > 
> > Looking at the nginx error log -
> > 
> > First, we saw a lot of entries like the following:
> >  socket() failed (24: Too many open files) while connecting to upstream
> >  accept() failed (24: Too many open files) while accepting new connection
> >  open() "/var/www/html/images/imagefile.jpg" failed (24: Too many open files)
> > 
> > Running ulimit -n showed 1024, so set that to 32768 on all 3 servers.
> > Also raised limit in /etc/security/limits.conf.
> 
> You need to tune your OS to increase the number of files, sockets, etc.
> I cannot speak for Linux, but here is my tuning for FreeBSD/amd64, 4G,
> for a large number of sockets, etc.:
> http://lists.freebsd.org/pipermail/freebsd-net/2008-April/017737.html
> 
> > Now, we started seeing the following:
> >  upstream timed out (110: Connection timed out) while connecting to upstream
> > 
> > So, perhaps the 2 backend servers couldn't handle the load?  We were
> > serving the page mostly out of memcache at this point.  In any case,
> > couldn't figure out why that wasn't sufficient, so we replaced the
> > page with a static html one.
> 
> Yes, it seems that your backend can not handle load.
> 
> > This seemed to help, but we were now seeing a lot of these:
> >   connect() failed (113: No route to host) while connecting to upstream
> >   no live upstreams while connecting to upstream
> > 
> > This wasn't on every request, but a significant percentage.  This, we
> > couldn't figure out.  Why couldn't it connect to the backend servers?
> > We ended up rebooting both of the backend servers, and these errors
> > stopped.
> > 
> > Any thoughts / comments anyone has?  Thanks!
> 
> The "113: No route to host" is a network error; it may have appeared while
> a backend was rebooting.
> 
> 
> -- 
> Igor Sysoev
> http://sysoev.ru/en/
> 

-- 
Sasa Ugrenovic <sasa at infomedia.ba>