Upstream failures against Mongrel instances

Mon Aug 13 22:16:46 MSD 2007

Hi~

On Aug 13, 2007, at 3:52 AM, Alexander Staubo wrote:

> We are running Nginx against a bunch of Mongrel instances. We kill and
> respawn these fairly frequently (in order to avoid some nasty
> third-party library memory leaks that we have not been able to hunt
> down), and these restarts seem to coincide with upstream errors in
> Nginx:
>
> 2007/08/13 12:45:32 [error] 16165#0: *347414 upstream timed out (110:
> Connection timed out) while reading response header from upstream ...
>
> I have set the proxy to pick another upstream on errors -- which
> should work as the restarting is staggered to make sure some instances
> are always available -- but this seems to have no effect:
>
>   proxy_next_upstream error timeout http_500 invalid_header
>
> Looks to me like Nginx may be trying to use the TCP connections to the
> Mongrel instances that have terminated, and doesn't immediately create
> a new connection when the old one fails.
>
> This is Nginx 0.4.13-3, Ubuntu's current stable version.
>
> Alexander.
>

Alexander-

	When you say you kill and respawn the mongrels, do you mean you run
kill -9 on them and then restart? Or do you use mongrel's graceful stop
feature? I would recommend using monit to monitor your mongrels and
letting it restart them individually when they use too much memory.

	Here is a monit snippet for one mongrel. You would make an entry
like this for each one you need. Make sure you have the latest
version of the mongrel_cluster gem installed, and make sure you use
absolute paths to the log and pid files in your mongrel_cluster.yml:

check process mongrel_username_5000
   with pidfile /var/log/engineyard/mongrel/username/mongrel.5000.pid
   start program = "/usr/bin/mongrel_rails cluster::start -C /data/username/current/config/mongrel_cluster.yml --clean --only 5000" as uid username and gid username
   stop program = "/usr/bin/mongrel_rails cluster::stop -C /data/username/current/config/mongrel_cluster.yml --clean --only 5000" as uid username and gid username
   if totalmem is greater than 110.0 MB for 2 cycles then restart      # eating up memory?
   if loadavg(5min) greater than 10 for 8 cycles then restart          # bad, bad, bad
   if 20 restarts within 20 cycles then timeout                        # something is wrong, call the sys-admin
   group mongrel
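
	For the absolute paths mentioned above, a mongrel_cluster.yml along
these lines would match the monit entry. This is only a sketch: the
environment, address and server count are assumptions for illustration,
the log file location is assumed to sit next to the pid file, and the
pid_file and log_file names assume mongrel_cluster inserts the port
number before the extension (mongrel.pid becoming mongrel.5000.pid).

```yaml
# Hypothetical /data/username/current/config/mongrel_cluster.yml
cwd: /data/username/current
environment: production   # assumed
address: 127.0.0.1        # assumed
port: "5000"              # first port in the cluster; matches --only 5000
servers: 3                # assumed; one monit entry per port (5000, 5001, 5002)
user: username
group: username
# Absolute paths, so monit's pidfile check looks where mongrel actually writes
pid_file: /var/log/engineyard/mongrel/username/mongrel.pid
log_file: /var/log/engineyard/mongrel/username/mongrel.log   # assumed location
```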

	Using a setup like this, your mongrels can be restarted gracefully
one at a time when they exceed memory limits. This also means you
don't get upstream errors from nginx.

Cheers-
-- Ezra Zygmuntowicz 
-- Founder & Ruby Hacker
-- ez at engineyard.com
-- Engine Yard, Serious Rails Hosting
-- (866) 518-YARD (9273)