Surviving Digg?

Wed Apr 30 01:07:46 MSD 2008

Hi Neil,

On Tue 29.04.2008 13:38, Neil Sheth wrote:
>
>We hit the front page of digg the other night, and our servers didn't
>handle it well at all.  Here's a little of what happened, and perhaps
>someone has some suggestions on what to tweak!
>
>Basic setup, nginx 0.5.35, serving up static image content, and then
>passing php requests to 2 backend servers running apache, all running
>red hat el4.

What were/are the network settings on the machines?

>Now, we started seeing the following:
> upstream timed out (110: Connection timed out) while connecting to
>upstream

What was the load on the backends?
What are the settings of apache?
Have you had a look at the output of

netstat -nt

How many connections in FIN* states (FIN_WAIT1, FIN_WAIT2) do you have?
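
If counting by eye is tedious, something like the following might help. It is only a sketch: count_tcp_states is a made-up helper name, and it assumes the Linux netstat layout where the connection state is the sixth column.

```shell
# Count TCP connections per state from `netstat -nt` output read on stdin.
# count_tcp_states is a hypothetical helper name, not part of any tool.
# Assumes the Linux net-tools layout: the state is the 6th field.
count_tcp_states() {
    # Only lines starting with "tcp" (tcp or tcp6) are connections;
    # the two header lines are skipped by this filter.
    awk '$1 ~ /^tcp/ { states[$6]++ }
         END { for (s in states) print states[s], s }' | sort -rn
}

# Usage on a live machine:
#   netstat -nt | count_tcp_states
```

A large number of FIN_WAIT or TIME_WAIT entries compared to ESTABLISHED would be worth mentioning when you report back.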

>So, perhaps the 2 backend servers couldn't handle the load?  We were
>serving the page mostly out of memcache at this point.  In any case,
>couldn't figure out why that wasn't sufficient, so we replaced the page
>with a static html one.
>
>This seemed to help, but we were now seeing a lot of these:
>  connect() failed (113: No route to host) while connecting to upstream
>  no live upstreams while connecting to upstream

Did you put hostnames or IP addresses into the nginx config?

>This wasn't on every request, but a significant percentage.  This, we
>couldn't figure out.  Why couldn't it connect to the backend servers?
>We ended up rebooting both of the backend servers, and these errors
>stopped.

Again: what did the load and the netstat output look like?

Cheers

Aleks