Fwd: [ module ] Add http upstream keep alive timeout parameter

Wei Xu weixu365 at gmail.com
Wed Nov 22 06:31:25 UTC 2017


Hi,

Is there any place to view the status of currently proposed patches? I'm
not sure whether this patch has been accepted, is still waiting, or was
rejected.

To avoid errors in production, I'm running the patched version for now.
But I think it would be better to run the official one, and then I could
also introduce this solution for 'Connection reset by peer' errors to
other teams.

On Tue, Nov 14, 2017 at 2:03 PM, Wei Xu <weixu365 at gmail.com> wrote:

> Hi,
>
> Really nice, much simpler than my patch. It's great to have a default
> timeout value. Thanks for your time.
>
>
>
> On Tue, Nov 14, 2017 at 6:49 AM, Maxim Dounin <mdounin at mdounin.ru> wrote:
>
>> Hello!
>>
>> On Sun, Nov 12, 2017 at 11:25:20PM +1100, Wei Xu wrote:
>>
>> > We are running nginx and the upstream on the same machine using
>> > Docker, so there's no firewall.
>>
>> Note that this isn't usually true.  Docker uses iptables
>> implicitly, and unless you have specifically checked your iptables
>> configuration, you are likely using a firewall.
>>
>> > I did a test locally and captured the network packages.
>> >
>> > For normal requests, the upstream sends a [FIN, ACK] to nginx after
>> > the keep-alive timeout (500 ms), nginx sends a [FIN, ACK] back, and
>> > then the upstream sends an [ACK] to close the connection completely.
>>
>> [...]
>>
>> > For a more detailed description of the test process, you can refer
>> > to my post at:
>> > https://theantway.com/2017/11/analyze-connection-reset-error-in-nginx-upstream-with-keep-alive-enabled/
>>
>> The test demonstrates that it is indeed possible to trigger the
>> problem in question.  Unfortunately, it doesn't provide any proof
>> that what you observed in production is the same issue though.
>>
>> While it is more or less clear that the race condition in question
>> is real, it seems to be very unlikely with typical workloads.  And
>> even when triggered, in most cases nginx handles this well enough,
>> retrying the request per proxy_next_upstream.
>>
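>> For reference, a minimal sketch making that retry behaviour explicit
>> (these are in fact the default values of the proxy_next_upstream
>> directive, shown here only for clarity, assuming an upstream named
>> "u" as in the test config below):
>>
>>     location / {
>>         proxy_pass http://u;
>>         proxy_next_upstream error timeout;
>>     }
>>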
>> Nevertheless, thank you for detailed testing.  A simple test case
>> that reliably demonstrates the race is appreciated, and I was able
>> to reduce it to your client script and nginx with the following
>> trivial configuration:
>>
>>     upstream u {
>>         server 127.0.0.1:8082;
>>         keepalive 10;
>>     }
>>
>>     server {
>>         listen 8080;
>>
>>         location / {
>>             proxy_pass http://u;
>>             proxy_http_version 1.1;
>>             proxy_set_header Connection "";
>>         }
>>     }
>>
>>     server {
>>         listen 8082;
>>
>>         keepalive_timeout 500ms;
>>
>>         location / {
>>             return 200 ok\n;
>>         }
>>     }
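>>
>> In this setup the server listening on 8082 stands in for the backend:
>> it closes idle keep-alive connections after 500ms, while the proxy on
>> 8080 keeps reusing cached connections, so requests spaced at roughly
>> 500ms intervals regularly hit the close/reuse window.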
>>
>> > To fix the issue, I tried adding a timeout for kept-alive upstream
>> > connections; you can check the patch at:
>> > https://github.com/weixu365/nginx/blob/docker-1.13.6/docker/stretch/patches/01-http-upstream-keepalive-timeout.patch
>> >
>> > The patch is what I'm currently testing with, and I can create a
>> > different format if you need.
>>
>> The patch looks good enough for testing, though there are various
>> minor issues - notably, testing the timeout against NGX_CONF_UNSET_MSEC
>> at runtime, and using the wrong type for the timeout during parsing
>> (time_t instead of ngx_msec_t).
>>
>> Also, I tend to think that using a separate keepalive_timeout
>> directive would be easier, and we probably want to introduce some
>> default value for it.
>>
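>> As a sketch of the intended usage (the 400ms value below is
>> hypothetical, chosen only to stay under the 500ms backend timeout
>> from the test config above):
>>
>>     upstream u {
>>         server 127.0.0.1:8082;
>>         keepalive 10;
>>         keepalive_timeout 400ms;
>>     }
>>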
>> Please take a look if the following patch works for you:
>>
>> # HG changeset patch
>> # User Maxim Dounin <mdounin at mdounin.ru>
>> # Date 1510601341 -10800
>> #      Mon Nov 13 22:29:01 2017 +0300
>> # Node ID 9ba0a577601b7c1b714eb088bc0b0d21c6354699
>> # Parent  6f592a42570898e1539d2e0b86017f32bbf665c8
>> Upstream keepalive: keepalive_timeout directive.
>>
>> The directive configures the maximum time a connection can be kept in
>> the cache.  By configuring a time smaller than the corresponding
>> timeout on the backend side, one can avoid the race between the backend
>> closing a connection and nginx trying to use the same connection to
>> send a request at the same time.
>>
>> diff --git a/src/http/modules/ngx_http_upstream_keepalive_module.c b/src/http/modules/ngx_http_upstream_keepalive_module.c
>> --- a/src/http/modules/ngx_http_upstream_keepalive_module.c
>> +++ b/src/http/modules/ngx_http_upstream_keepalive_module.c
>> @@ -12,6 +12,7 @@
>>
>>  typedef struct {
>>      ngx_uint_t                         max_cached;
>> +    ngx_msec_t                         timeout;
>>
>>      ngx_queue_t                        cache;
>>      ngx_queue_t                        free;
>> @@ -84,6 +85,13 @@ static ngx_command_t  ngx_http_upstream_
>>        0,
>>        NULL },
>>
>> +    { ngx_string("keepalive_timeout"),
>> +      NGX_HTTP_UPS_CONF|NGX_CONF_TAKE1,
>> +      ngx_conf_set_msec_slot,
>> +      NGX_HTTP_SRV_CONF_OFFSET,
>> +      offsetof(ngx_http_upstream_keepalive_srv_conf_t, timeout),
>> +      NULL },
>> +
>>        ngx_null_command
>>  };
>>
>> @@ -141,6 +149,8 @@ ngx_http_upstream_init_keepalive(ngx_con
>>
>>      us->peer.init = ngx_http_upstream_init_keepalive_peer;
>>
>> +    ngx_conf_init_msec_value(kcf->timeout, 60000);
>> +
>>      /* allocate cache items and add to free queue */
>>
>>      cached = ngx_pcalloc(cf->pool,
>> @@ -261,6 +271,10 @@ found:
>>      c->write->log = pc->log;
>>      c->pool->log = pc->log;
>>
>> +    if (c->read->timer_set) {
>> +        ngx_del_timer(c->read);
>> +    }
>> +
>>      pc->connection = c;
>>      pc->cached = 1;
>>
>> @@ -339,9 +353,8 @@ ngx_http_upstream_free_keepalive_peer(ng
>>
>>      pc->connection = NULL;
>>
>> -    if (c->read->timer_set) {
>> -        ngx_del_timer(c->read);
>> -    }
>> +    ngx_add_timer(c->read, kp->conf->timeout);
>> +
>>      if (c->write->timer_set) {
>>          ngx_del_timer(c->write);
>>      }
>> @@ -392,7 +405,7 @@ ngx_http_upstream_keepalive_close_handle
>>
>>      c = ev->data;
>>
>> -    if (c->close) {
>> +    if (c->close || c->read->timedout) {
>>          goto close;
>>      }
>>
>> @@ -485,6 +498,8 @@ ngx_http_upstream_keepalive_create_conf(
>>       *     conf->max_cached = 0;
>>       */
>>
>> +    conf->timeout = NGX_CONF_UNSET_MSEC;
>> +
>>      return conf;
>>  }
>>
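>> In short, the patch arms a read timer with the configured timeout when
>> a connection is put back into the keepalive cache; if the timer fires
>> before the connection is reused, the close handler sees
>> c->read->timedout and closes the connection, and when a cached
>> connection is picked up for a new request the timer is deleted.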
>>
>> --
>> Maxim Dounin
>> http://mdounin.ru/
>>
>
>

