Nginx session-stickiness

Mon Apr 7 10:23:15 MSD 2008

> IMHO, this is overkill.  It's really neat, but I don't think you need
> to do this at all.  We host lots of rails apps and don't run into
> problems that require that kind of approach.  You'll get error log

I'm confused. You previously said:

> Won't this have the downside of possibly sending multiple failing
> requests to the upstreams?  We used this for a while but ran into
> problems with duplicate requests.  For example we had people sending
> WAY too many mails out in a request, the appserver would timeout
> halfway through, it'd send a portion of the emails, and then send the
> request to another upstream.  The subsequent requests would do the
> same thing and people would get the same email for every upstream
> defined.

I then looked at the docs.

---
http://wiki.codemongers.com/NginxHttpProxyModule#proxy_next_upstream

error - an error has occurred while connecting to the server, sending a 
request to it, or reading its response;
timeout - occurred timeout during the connection with the server, transfer 
the request or while reading response from the server;
---

From that I assumed that a "timeout" was a subset of "error". Is that right or
wrong? If I do:

proxy_next_upstream error;

And one of my connections times out, will nginx send the request to the next
backend or not? If it does, then that's a problem, because the same "slow"
action can end up running multiple times on multiple servers. In that case
we need a "connect_error" option so we can just say:

proxy_next_upstream connect_error;

If not, then we're all OK and can just use the "error" option.
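
To make the question concrete, here is a minimal sketch of the setup I have in mind. The upstream name, addresses and ports are placeholders invented for illustration, and the fragment is meant to sit inside the http block. The commented-out line shows the variant that names "timeout" explicitly, for comparison.

```nginx
# Hypothetical upstream; the name and addresses are placeholders.
upstream rails_backends {
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
}

server {
    listen 80;

    location / {
        proxy_pass http://rails_backends;

        # Pass the request to the next backend only on "error".
        # Whether a timed-out request also falls under "error" is
        # exactly the open question in this mail.
        proxy_next_upstream error;

        # For comparison, this would name timeouts explicitly:
        # proxy_next_upstream error timeout;
    }
}
```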

Anyway, having said all that, we still need our solution for some
annoying edge cases. Systems can crash in very, very odd ways.
It's been a while (I think it was Linux 2.6.18), but we had a system crash
into a state where it would accept TCP connections but never respond to
them in any way. That was quite nasty, because connections
coming in to that server had to wait the full proxy_read_timeout
before being passed to the next backend server. Since the server was remote,
it took a little while to get it rebooted at the co-location facility.
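
A rough sketch of why that failure mode hurts, with timeout values chosen purely for illustration (they are not our production settings) and a placeholder upstream name. Because the crashed host still accepts the TCP connection, the connect timeout never fires, and only the read timeout ends the wait.

```nginx
location / {
    # "rails_backends" is a placeholder upstream name.
    proxy_pass http://rails_backends;

    # The half-dead host completes the TCP handshake, so this short
    # connect timeout does not help in that case.
    proxy_connect_timeout 5s;

    # Every request sent to the half-dead host waits this long
    # before nginx gives up on it.
    proxy_read_timeout 60s;

    # Only after the read timeout is the request passed on.
    proxy_next_upstream error timeout;
}
```

Lowering proxy_read_timeout shortens the wait, but it also cuts off legitimately slow requests, which is why a timeout alone doesn't cover this case for us.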

Fortunately, because of our above scheme, and the fact that we remotely check
each server every 2 minutes, when that server failed to pass its "ping"
test after 30 seconds, it was marked down in the database and
automatically taken out of service with no intervention required from us.
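
For anyone without such a scheme, one way a "marked down" state could be expressed in plain nginx is the "down" parameter on an upstream server. This is only an illustration with placeholder addresses. It assumes some external script rewrites the upstream block and reloads nginx when a check fails, and it is not a description of how our database-driven setup works.

```nginx
# Hypothetical upstream block, regenerated by an external health
# check (an assumption, not part of nginx itself).
upstream rails_backends {
    server 10.0.0.1:8000;
    server 10.0.0.2:8000 down;   # failed its check; nginx skips it
    server 10.0.0.3:8000;
}
```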

Rob