<div dir="ltr"><div>We have recently discovered a very rare occurence when requests through nginx will hang if the resolver sends a response with a mismatching ident. We are seeing this in production with 1.7.1 and I have been able to re-produce with 1.7.3. The relevant parts of the config are:</div>
<div><br></div><div>resolver 10.65.255.4;</div><div><br></div><div>location / {</div><div> proxy_pass http://$host.internal$request_uri;</div><div>}</div><div><br></div><div>So we basically proxy <customer>.<a href="http://atlassian.net">atlassian.net</a> to <customer>.atlassian.net.internal. The resolver is a pdns recursor running on the same machine.</div>
<div><br></div><div>The error we see in the logs is:</div><div><br></div><div>2014/06/19 20:22:29 [error] 28235#0: wrong ident 57716 response for customer.atlassian.net.internal, expect 39916</div><div>2014/06/19 20:22:29 [error] 28235#0: unexpected response for customer.atlassian.net.internal</div>
<div>2014/06/19 20:22:59 [error] 28235#0: *23776286 customer.atlassian.net.internal could not be resolved (110: Operation timed out), client: 83.244.247.165, server: *.<a href="http://atlassian.net">atlassian.net</a>, request: "GET /plugins/ HTTP/1.1", host: "<a href="http://customer.atlassian.net">customer.atlassian.net</a>", referrer: "<a href="https://customer.atlassian.net/secure/Dashboard.jspa">https://customer.atlassian.net/secure/Dashboard.jspa</a>"</div>
<div><br></div><div>I have been able to re-produce this error in a test environment - this is what I used:</div><div><br></div><div>- a basic python script pretending to be a recursive resolver, which can mangle the ident of a response. The resolver directive of nginx is pointed to this recursor. I added in a delay of 100ms before sending a reply (based on <a href="http://code.activestate.com/recipes/491264-mini-fake-dns-server/">http://code.activestate.com/recipes/491264-mini-fake-dns-server/</a>).</div>
<div>- A proxy configuration same as above - only the resolver and location/proxy_pass line was added to a default nginx config</div><div>- Static webserver as the backend</div><div>- GNU parallel + curl to issue concurrent requests</div>
<div><br></div><div>When the ident is correct, the system behaves as expected. However, if an ident is incorrect, AND nginx gets multiple concurrent (5) requests for that same backend, we see all the requests hanging. Doing a tcpdump for DNS traffic shows the first request go out, and the response coming back with the wrong ident, but no subsequent dns requests. The critical factor seems to be multiple incoming requests to nginx, while a dns request is in-flight.</div>
<div><br></div><div>If needed I can provide all the scripts and config I used to produce the error.</div><div><br></div><div>Thanks!</div><div><br></div><div>Pramod Korathota</div></div>