<div dir="ltr"><div>Hi,</div><div><br></div><div>Really nice, much simpler than my patch. It's great to have a default timeout value. thanks for you time.</div><div><br></div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Nov 14, 2017 at 6:49 AM, Maxim Dounin <span dir="ltr"><<a href="mailto:mdounin@mdounin.ru" target="_blank">mdounin@mdounin.ru</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello!<br>
<span class=""><br>
On Sun, Nov 12, 2017 at 11:25:20PM +1100, Wei Xu wrote:<br>
<br>
> We are running Nginx and the upstream on the same machine using Docker, so
> there's no firewall.

Note that this isn't usually true.  Docker uses iptables
implicitly, and unless you have specifically checked your iptables
configuration, you are likely using a firewall.

> I did a test locally and captured the network packets.
>
> For normal requests, the upstream sends a [FIN, ACK] to nginx after the
> keep-alive timeout (500 ms), nginx sends a [FIN, ACK] back, and then the
> upstream sends an [ACK] to close the connection completely.

[...]

> For a more detailed description of the test process, see my post at:
> https://theantway.com/2017/11/analyze-connection-reset-error-in-nginx-upstream-with-keep-alive-enabled/

The test demonstrates that it is indeed possible to trigger the
problem in question.  Unfortunately, it doesn't prove that what you
observed in production is the same issue.

While it is more or less clear that the race condition in question
is real, it seems to be very unlikely with typical workloads.  And
even when triggered, in most cases nginx handles it well enough,
retrying the request per proxy_next_upstream.
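
For reference, a minimal sketch of the retry configuration involved;
the values shown are just the documented defaults, spelled out
explicitly:

    location / {
        proxy_pass http://u;
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # The default policy: retry the request after a connection
        # error or a timeout.  Since nginx 1.9.13, non-idempotent
        # requests such as POST are not retried unless
        # "non_idempotent" is added to this list.
        proxy_next_upstream error timeout;
    }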

Nevertheless, thank you for the detailed testing.  A simple test case
that reliably demonstrates the race is appreciated, and I was able
to reduce it to your client script and nginx with the following
trivial configuration:

    upstream u {
        server 127.0.0.1:8082;
        keepalive 10;
    }

    server {
        listen 8080;

        location / {
            proxy_pass http://u;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }

    server {
        listen 8082;

        keepalive_timeout 500ms;

        location / {
            return 200 ok\n;
        }
    }

> To fix the issue, I tried to add a timeout for kept-alive upstream
> connections, and you can check the patch at:
> https://github.com/weixu365/nginx/blob/docker-1.13.6/docker/stretch/patches/01-http-upstream-keepalive-timeout.patch
>
> The patch is for my current testing, and I can create a different format if
> you need.

The patch looks good enough for testing, though there are various
minor issues - notably testing the timeout against NGX_CONF_UNSET_MSEC
at runtime, and using the wrong type for the timeout while parsing
(time_t instead of ngx_msec_t).

Also I tend to think that using a separate keepalive_timeout
directive should be easier, and we probably want to introduce a
default value for it.
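
With the patch below applied, usage would look like the following
sketch; the 30s value here is only an illustration, chosen to be
smaller than a typical backend keep-alive timeout (when not set, the
directive defaults to 60s):

    upstream u {
        server 127.0.0.1:8082;
        keepalive 10;

        # Close cached connections that have been idle for 30 seconds,
        # before the backend has a chance to close them first.
        keepalive_timeout 30s;
    }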

Please take a look if the following patch works for you:

# HG changeset patch
# User Maxim Dounin <mdounin@mdounin.ru>
# Date 1510601341 -10800
#      Mon Nov 13 22:29:01 2017 +0300
# Node ID 9ba0a577601b7c1b714eb088bc0b0d21c6354699
# Parent  6f592a42570898e1539d2e0b86017f32bbf665c8
Upstream keepalive: keepalive_timeout directive.

The directive configures maximum time a connection can be kept in the
cache.  By configuring a time which is smaller than the corresponding
timeout on the backend side one can avoid the race between closing
a connection by the backend and nginx trying to use the same connection
to send a request at the same time.

diff --git a/src/http/modules/ngx_http_upstream_keepalive_module.c b/src/http/modules/ngx_http_upstream_keepalive_module.c
--- a/src/http/modules/ngx_http_upstream_keepalive_module.c
+++ b/src/http/modules/ngx_http_upstream_keepalive_module.c
@@ -12,6 +12,7 @@

 typedef struct {
     ngx_uint_t                         max_cached;
+    ngx_msec_t                         timeout;

     ngx_queue_t                        cache;
     ngx_queue_t                        free;
@@ -84,6 +85,13 @@ static ngx_command_t  ngx_http_upstream_
       0,
       NULL },

+    { ngx_string("keepalive_timeout"),
+      NGX_HTTP_UPS_CONF|NGX_CONF_TAKE1,
+      ngx_conf_set_msec_slot,
+      NGX_HTTP_SRV_CONF_OFFSET,
+      offsetof(ngx_http_upstream_keepalive_srv_conf_t, timeout),
+      NULL },
+
       ngx_null_command
 };

@@ -141,6 +149,8 @@ ngx_http_upstream_init_keepalive(ngx_con

     us->peer.init = ngx_http_upstream_init_keepalive_peer;

+    ngx_conf_init_msec_value(kcf->timeout, 60000);
+
     /* allocate cache items and add to free queue */

     cached = ngx_pcalloc(cf->pool,
@@ -261,6 +271,10 @@ found:
     c->write->log = pc->log;
     c->pool->log = pc->log;

+    if (c->read->timer_set) {
+        ngx_del_timer(c->read);
+    }
+
     pc->connection = c;
     pc->cached = 1;

@@ -339,9 +353,8 @@ ngx_http_upstream_free_keepalive_peer(ng

     pc->connection = NULL;

-    if (c->read->timer_set) {
-        ngx_del_timer(c->read);
-    }
+    ngx_add_timer(c->read, kp->conf->timeout);
+
     if (c->write->timer_set) {
         ngx_del_timer(c->write);
     }
@@ -392,7 +405,7 @@ ngx_http_upstream_keepalive_close_handle

     c = ev->data;

-    if (c->close) {
+    if (c->close || c->read->timedout) {
         goto close;
     }

@@ -485,6 +498,8 @@ ngx_http_upstream_keepalive_create_conf(

      *     conf->max_cached = 0;
      */

+    conf->timeout = NGX_CONF_UNSET_MSEC;
+
     return conf;
 }

--
Maxim Dounin
http://mdounin.ru/

_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel