robm at fastmail.fm
Mon Apr 7 10:23:15 MSD 2008
> IMHO, this is overkill. It's really neat, but I don't think you need
> to do this at all. We host lots of rails apps and don't run into
> problems that require that kind of approach. You'll get error log
I'm confused. You previously said:
> Won't this have the downside of possibly sending multiple failing
> requests to the upstreams? We used this for a while but ran into
> problems with duplicate requests. For example we had people sending
> WAY too many mails out in an request, the appserver would timeout
> halfway through, it'd send a portion of the emails, and then send the
> request to another upstream. The subsequent requests would do the
> same thing and people would get the same email for every upstream
I then looked at the docs.
error - an error has occurred while connecting to the server, sending a
request to it, or reading its response;
timeout - occurred timeout during the connection with the server, transfer
the request or while reading response from the server;
And assumed that a "timeout" was a subset of "error". Is that right or wrong
then? If I do:
And one of my connections times out, will nginx send the request to the next
backend or not? If it does, then that's a problem because it can cause the
same "slow" action to run multiple times on multiple servers. It would mean
that we do need a "connect_error" option so we can just say:
If not, then we're all ok, we can just use the "error" option.
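
To make the concern above concrete, here is a small Python simulation. It is not nginx code and says nothing about how nginx actually classifies timeouts; the Backend class, the proxy function and both exception names are made up for illustration. It assumes a slow backend performs part of its side effects (sending mail) before the proxy gives up waiting, as in the duplicate-email report quoted earlier.

```python
from dataclasses import dataclass, field


class ConnectError(Exception):
    """Hypothetical: the backend could not be reached, the request never started."""


class ReadTimeout(Exception):
    """Hypothetical: the backend took the request but did not answer in time."""


@dataclass
class Backend:
    name: str
    reachable: bool = True
    slow: bool = False
    emails_sent: list = field(default_factory=list)

    def handle(self, recipients):
        if not self.reachable:
            raise ConnectError(self.name)
        if self.slow:
            # Side effects happen before the timeout fires: half of the
            # mails go out, then the proxy stops waiting for a response.
            self.emails_sent.extend(recipients[: len(recipients) // 2])
            raise ReadTimeout(self.name)
        self.emails_sent.extend(recipients)
        return "ok"


def proxy(backends, recipients, retry_on):
    """Try backends in order; move to the next one only for failures in retry_on."""
    for backend in backends:
        try:
            return backend.handle(recipients)
        except retry_on:
            continue  # this kind of failure is passed to the next backend
        except (ConnectError, ReadTimeout):
            return "failed"  # any other failure is reported to the client
    return "failed"
```

With three slow backends and four recipients, retry_on=(ConnectError, ReadTimeout) leaves six mails sent in total (two per backend), while retry_on=(ConnectError,) stops after the first backend with two mails sent. A backend that cannot be reached at all is skipped safely in both cases, because it never started the work.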
Anyway, having said all that, we still do need our solution for some
annoying edge cases. Basically systems can crash in very, very odd ways.
It's been a while (I think it was Linux 2.6.18), but we had a system crash
in a state where it would accept TCP connections, but wasn't responding to
them in any way. That was quite nasty because basically it meant connections
coming in to that server would have to wait the full proxy_read_timeout
before being passed to the next backend server. Since the server was remote,
it took a little while to get it rebooted at the co-location facility.
Fortunately, because of our above scheme, and the fact we remotely check
each server every 2 minutes, when that server failed to pass its "ping"
test after 30 seconds, it was marked down in the database, and was
automatically taken out of service without intervention required by us.
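
A rough sketch of that kind of external check, in Python. Only the 2 minute interval and the 30 second limit come from the description above; the servers table, its columns, the PING line sent over the socket and all function names are assumptions made for illustration, not the actual scheme. The point is that the probe requires a reply, so a machine that accepts TCP connections but never answers still fails the test.

```python
import socket
import sqlite3
import time

CHECK_INTERVAL = 120  # seconds between rounds of checks
PING_TIMEOUT = 30     # seconds a server gets to answer before it counts as down


def ping(host, port, timeout=PING_TIMEOUT):
    """Return True only if the server accepts a connection AND sends a reply."""
    try:
        # The timeout applies to the connect and to each later socket call.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")  # hypothetical application-level ping
            return bool(sock.recv(64))  # empty read means the peer closed
    except OSError:  # refused, unreachable, or timed out
        return False


def check_all(db, probe=ping):
    """Probe every server in the (assumed) servers table and record the result."""
    rows = db.execute("SELECT name, host, port FROM servers").fetchall()
    for name, host, port in rows:
        status = "up" if probe(host, port) else "down"
        db.execute("UPDATE servers SET status = ? WHERE name = ?", (status, name))
    db.commit()


def run_forever(db):
    """Repeat the checks on a fixed interval."""
    while True:
        check_all(db)
        time.sleep(CHECK_INTERVAL)
```

Whatever generates the proxy configuration can then leave out any server whose status is "down". The probe is passed in as a parameter so the bookkeeping can be exercised without real network access.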