proxied requests hang when DNS response has wrong ident

Pramod Korathota pkorathota at atlassian.com
Tue Jul 15 10:04:44 UTC 2014


We have recently discovered a very rare occurrence where requests through
nginx hang if the resolver sends a response with a mismatched ident.
We are seeing this in production with 1.7.1, and I have been able to
reproduce it with 1.7.3. The relevant parts of the config are:

resolver 10.65.255.4;

location / {
        proxy_pass      http://$host.internal$request_uri;
}

So we basically proxy <customer>.atlassian.net to
<customer>.atlassian.net.internal. The resolver is a pdns recursor running
on the same machine.

The error we see in the logs is:

2014/06/19 20:22:29 [error] 28235#0: wrong ident 57716 response for
customer.atlassian.net.internal, expect 39916
2014/06/19 20:22:29 [error] 28235#0: unexpected response for
customer.atlassian.net.internal
2014/06/19 20:22:59 [error] 28235#0: *23776286
customer.atlassian.net.internal could not be resolved (110: Operation timed
out), client: 83.244.247.165, server: *.atlassian.net, request: "GET
/plugins/ HTTP/1.1", host: "customer.atlassian.net", referrer: "
https://customer.atlassian.net/secure/Dashboard.jspa"

I have been able to reproduce this error in a test environment. This is
what I used:

- A basic Python script pretending to be a recursive resolver, which can
mangle the ident of a response. The nginx resolver directive is pointed
at this fake recursor. The script is based on
http://code.activestate.com/recipes/491264-mini-fake-dns-server/, with a
delay of 100ms added before each reply is sent.
- The same proxy configuration as above: only the resolver directive and
the location/proxy_pass block were added to a default nginx config.
- A static web server as the backend.
- GNU parallel + curl to issue concurrent requests.
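
For illustration, a fake resolver of the kind described in the first item
could be sketched roughly as below. This is not the script used in the test
above, only a minimal sketch under stated assumptions: the names
build_response and serve, the listening address 127.0.0.1:5353, and the
answer address 127.0.0.1 are all hypothetical. It answers every query with a
single A record regardless of the query type, and can return the reply with
an ident that differs from the one in the query.

```python
import socket
import struct
import time


def build_response(query: bytes, ip: str = "127.0.0.1",
                   mangle_ident: bool = False) -> bytes:
    """Build a minimal DNS reply carrying one A record for the query.

    With mangle_ident=True the 16-bit ident in the reply is changed so
    that it no longer matches the ident of the query.
    """
    ident = struct.unpack("!H", query[:2])[0]
    if mangle_ident:
        ident = (ident + 1) & 0xFFFF  # any value other than the original

    # The question section starts after the 12-byte header: a sequence of
    # length-prefixed labels, a zero byte, then QTYPE and QCLASS (4 bytes).
    pos = 12
    while query[pos] != 0:
        pos += query[pos] + 1
    question = query[12:pos + 5]

    flags = 0x8180  # standard response, recursion available, no error
    header = struct.pack("!HHHHHH", ident, flags, 1, 1, 0, 0)

    # Answer: pointer to the name at offset 12, type A, class IN,
    # TTL 60 seconds, 4 bytes of address data.
    answer = (b"\xc0\x0c"
              + struct.pack("!HHIH", 1, 1, 60, 4)
              + socket.inet_aton(ip))
    return header + question + answer


def serve(bind_addr: str = "127.0.0.1", port: int = 5353,
          delay: float = 0.1, mangle_ident: bool = True) -> None:
    """Answer UDP queries one at a time, sleeping `delay` before each reply."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    while True:
        query, client = sock.recvfrom(512)
        time.sleep(delay)  # keep the query in flight for a while
        sock.sendto(build_response(query, mangle_ident=mangle_ident), client)
```

Calling serve() starts the responder; with the assumed defaults, nginx would
then be configured with "resolver 127.0.0.1:5353;". Because the loop handles
one query at a time, replies to simultaneous queries are delayed further,
which is acceptable for a test rig but not for anything else.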

When the ident is correct, the system behaves as expected. However, if the
ident is incorrect AND nginx receives multiple concurrent requests (5 in my
test) for that same backend, all of the requests hang. A tcpdump of DNS
traffic shows the first query going out and the response coming back with
the wrong ident, but no subsequent DNS queries. The critical factor seems
to be multiple incoming requests arriving at nginx while a DNS request is
in flight.
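
The client side of the test can be approximated without GNU parallel and
curl. The sketch below is an assumption-laden stand-in, not the commands
actually used: the names fetch and run_concurrent, the URL
http://127.0.0.1:8080/, and the 10 second timeout are hypothetical. It fires
several requests at once and reports, per request, either the HTTP status or
the failure, so a batch in which every request fails with a timeout would be
consistent with the hang described above.

```python
import concurrent.futures
import urllib.request


def fetch(url: str, host: str, timeout: float = 10.0) -> str:
    """Request `url` with an explicit Host header and summarise the outcome."""
    req = urllib.request.Request(url, headers={"Host": host})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except Exception as exc:  # timeouts, refused connections, HTTP errors
        return f"failed: {exc!r}"


def run_concurrent(task, count: int = 5) -> list:
    """Run `task` in `count` threads at once and collect the results."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=count) as pool:
        futures = [pool.submit(task) for _ in range(count)]
        return [f.result() for f in futures]


# Example (assumes nginx is listening on 127.0.0.1:8080):
# print(run_concurrent(
#     lambda: fetch("http://127.0.0.1:8080/", "customer.atlassian.net"),
#     count=5))
```

The Host header matters here because the proxy_pass target is derived from
$host, so all five requests resolve the same backend name.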

If needed, I can provide all the scripts and config I used to reproduce the
error.

Thanks!

Pramod Korathota