Tomcat cluster behind nginx: avoiding delays while restarting tomcat
jman at ablesky.com
Wed May 27 11:59:36 MSD 2009
We run a tomcat cluster behind nginx with the upstream module. Nginx
fits our requirements well for load balancing and failover, except in
one case.
When starting or restarting tomcat, our web application takes a couple
of minutes to initialize, during which time the tomcat connector is
listening on TCP 8080, but the application isn't ready to process requests.
The nginx documentation recommends marking a host as 'down' during
planned downtime, and this is a partial solution to our problem.
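For reference, a minimal sketch of the kind of upstream block involved
(addresses, names, and ports are hypothetical):

```nginx
# During a deploy, add "down" to the node being restarted and reload
# nginx; remove it and reload again once the app has initialized.
upstream tomcat_cluster {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080 down;   # node currently being redeployed
}

server {
    listen 80;
    location / {
        proxy_pass http://tomcat_cluster;
    }
}
```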
Since we're still small, our developers do the application deployment
themselves. The deploy process is quite informal and is performed
manually right now.
Because our developers are primarily Windows users who spend most of
their time in Eclipse, and because they don't have a full understanding
of the systems, they tend to make mistakes when editing config files in
UNIX and when restarting/reloading servers. Because of this, I would
like to find the best solution for automating the deploy process,
beginning with this small part.
If the tomcat connector could be told not to start listening on its TCP
port until the app is finished initializing, then I would be tempted to
let the upstream module's failover mechanism take care of everything
(comments on the wisdom or foolishness of succumbing to this temptation
are welcome). However, I haven't seen any way to accomplish this.
I also don't see any mechanism in the upstream module to help with this:
the upstream module doesn't seem to treat a tomcat that accepts TCP
connections but never answers requests as failed.
This leads me to think that the best way to automate web app deployment
is to either:
- Write a script to edit nginx.conf, mark the tomcat node as 'down',
and reload nginx (reversing the change once the app is ready), or
- Write a script to run on the tomcat server using iptables to
REJECT connections to TCP 8080 until the app is finished initializing.
Either of these could be built into an automated deployment process that
would save manual labor and the associated human error.
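To make the second idea concrete, here is a rough sketch of what such a
script might look like (it assumes root, an external interface named
eth0, and a hypothetical /healthcheck URL for detecting readiness; none
of this is tested yet):

```shell
#!/bin/sh
# Reject new external connections to the connector port while restarting.
# Matching -i eth0 leaves loopback open so we can probe readiness locally.
iptables -I INPUT -i eth0 -p tcp --dport 8080 -j REJECT

/etc/init.d/tomcat restart

# Poll until the application answers requests (URL is hypothetical).
until curl -sf http://localhost:8080/healthcheck >/dev/null; do
    sleep 5
done

# Re-open the port to the outside world.
iptables -D INPUT -i eth0 -p tcp --dport 8080 -j REJECT
```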
I would appreciate hearing how others have solved this problem, whether
the above ideas are reasonable, and whether there is a standard solution
I haven't heard of. If it seems useful, I'll be happy to post details
about our solution once it is implemented and tested.