Tomcat cluster behind nginx: avoiding delays while restarting tomcat
jman at ablesky.com
Wed May 27 11:59:36 MSD 2009
We run a tomcat cluster behind nginx with the upstream module. Nginx
fits our requirements well for load balancing and failover, except in
one case.
When starting or restarting tomcat, our web application takes a couple
of minutes to initialize, during which time the tomcat connector is
listening on TCP 8080, but the application isn't ready to process requests.
The nginx documentation recommends marking a host as 'down' during
planned downtime, and this is a partial solution to our problem.
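For reference, a minimal sketch of the kind of upstream block involved
(addresses, names, and ports are hypothetical):

```nginx
# During a deploy, add "down" to the node being restarted and reload
# nginx; remove it and reload again once the app has initialized.
upstream tomcat_cluster {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080 down;   # node currently being redeployed
}

server {
    listen 80;
    location / {
        proxy_pass http://tomcat_cluster;
    }
}
```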
Since we're still small, our developers do the application deployment
themselves. The deploy process is quite informal and is performed
manually right now.
Because our developers are primarily Windows users who spend most of
their time in Eclipse, and because they don't have a full understanding
of the systems, they tend to make mistakes when editing config files in
UNIX and when restarting/reloading servers. Because of this, I would
like to find the best solution for automating the deploy process,
beginning with this small part.
If the tomcat connector could be told not to start listening on its TCP
port until the app is finished initializing, then I would be tempted to
let the upstream module's failover mechanism take care of everything
(comments on the wisdom or foolishness of succumbing to this temptation
are welcome). However, I haven't seen any way to accomplish this.
I also don't see any mechanism in the upstream module to help with this:
the upstream module doesn't seem to treat a tomcat that accepts TCP
connections but never answers requests as failed.
This leads me to think that the best way to automate web app deployment
is to either:
- Write a script to edit nginx.conf, mark the tomcat node as 'down',
and reload nginx (reversing the change once the app is ready), or
- Write a script to run on the tomcat server using iptables to
REJECT connections to TCP 8080 until the app is finished initializing.
Either of these could be built into an automated deployment process that
would save manual labor and the associated human error.
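To make the second idea concrete, here is a rough sketch of what such a
script might look like (it assumes root, an external interface named
eth0, and a hypothetical /healthcheck URL for detecting readiness; none
of this is tested yet):

```shell
#!/bin/sh
# Reject new external connections to the connector port while restarting.
# Matching -i eth0 leaves loopback open so we can probe readiness locally.
iptables -I INPUT -i eth0 -p tcp --dport 8080 -j REJECT

/etc/init.d/tomcat restart

# Poll until the application answers requests (URL is hypothetical).
until curl -sf http://localhost:8080/healthcheck >/dev/null; do
    sleep 5
done

# Re-open the port to the outside world.
iptables -D INPUT -i eth0 -p tcp --dport 8080 -j REJECT
```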
I would appreciate hearing how others have solved this problem, whether
the above ideas are reasonable, and whether there is a standard solution
I haven't heard of. If it seems useful, I'll be happy to post details
about our solution once it is implemented and tested.