Fwd: [ module ] Add http upstream keep alive timeout parameter
Wei Xu
weixu365 at gmail.com
Sun Nov 12 12:25:20 UTC 2017
We are running nginx and the upstream on the same machine using Docker, so
there's no firewall between them.
I did a test locally and captured the network packets.
For the normal requests, the upstream sends a [FIN, ACK] to nginx after the
keep-alive timeout (500 ms), nginx sends a [FIN, ACK] back, and then the
upstream sends a final [ACK] to close the connection completely:
No.  Time                        Source      Destination  Protocol  Length  Info
1    2017-11-12 17:11:04.299146  172.18.0.3  172.18.0.2   TCP       74      48528 → 8000 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=32031305 TSecr=0 WS=128
2    2017-11-12 17:11:04.299171  172.18.0.2  172.18.0.3   TCP       74      8000 → 48528 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=32031305 TSecr=32031305 WS=128
3    2017-11-12 17:11:04.299194  172.18.0.3  172.18.0.2   TCP       66      48528 → 8000 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=32031305 TSecr=32031305
4    2017-11-12 17:11:04.299259  172.18.0.3  172.18.0.2   HTTP      241     GET /_healthcheck HTTP/1.1
5    2017-11-12 17:11:04.299267  172.18.0.2  172.18.0.3   TCP       66      8000 → 48528 [ACK] Seq=1 Ack=176 Win=30080 Len=0 TSval=32031305 TSecr=32031305
6    2017-11-12 17:11:04.299809  172.18.0.2  172.18.0.3   HTTP      271     HTTP/1.1 200 OK (text/html)
7    2017-11-12 17:11:04.299852  172.18.0.3  172.18.0.2   TCP       66      48528 → 8000 [ACK] Seq=176 Ack=206 Win=30336 Len=0 TSval=32031305 TSecr=32031305
8    2017-11-12 17:11:04.800805  172.18.0.2  172.18.0.3   TCP       66      8000 → 48528 [FIN, ACK] Seq=206 Ack=176 Win=30080 Len=0 TSval=32031355 TSecr=32031305
9    2017-11-12 17:11:04.801120  172.18.0.3  172.18.0.2   TCP       66      48528 → 8000 [FIN, ACK] Seq=176 Ack=207 Win=30336 Len=0 TSval=32031355 TSecr=32031355
10   2017-11-12 17:11:04.801151  172.18.0.2  172.18.0.3   TCP       66      8000 → 48528 [ACK] Seq=207 Ack=177 Win=30080 Len=0 TSval=32031355 TSecr=32031355
For the failed request, the upstream received a new HTTP request just after
it had closed the connection at the keep-alive timeout (500 ms), but before
it had a chance to send the [FIN] packet. Because the connection had already
been closed from the upstream's perspective, it sent a [RST] in response to
that request:
No.  Time                        Source      Destination  Protocol  Length  Info
433  2017-11-12 17:11:26.548449  172.18.0.3  172.18.0.2   TCP       74      48702 → 8000 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=32033530 TSecr=0 WS=128
434  2017-11-12 17:11:26.548476  172.18.0.2  172.18.0.3   TCP       74      8000 → 48702 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=32033530 TSecr=32033530 WS=128
435  2017-11-12 17:11:26.548502  172.18.0.3  172.18.0.2   TCP       66      48702 → 8000 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=32033530 TSecr=32033530
436  2017-11-12 17:11:26.548609  172.18.0.3  172.18.0.2   HTTP      241     GET /_healthcheck HTTP/1.1
437  2017-11-12 17:11:26.548618  172.18.0.2  172.18.0.3   TCP       66      8000 → 48702 [ACK] Seq=1 Ack=176 Win=30080 Len=0 TSval=32033530 TSecr=32033530
438  2017-11-12 17:11:26.549173  172.18.0.2  172.18.0.3   HTTP      271     HTTP/1.1 200 OK (text/html)
439  2017-11-12 17:11:26.549230  172.18.0.3  172.18.0.2   TCP       66      48702 → 8000 [ACK] Seq=176 Ack=206 Win=30336 Len=0 TSval=32033530 TSecr=32033530
440  2017-11-12 17:11:27.049668  172.18.0.3  172.18.0.2   HTTP      241     GET /_healthcheck HTTP/1.1
441  2017-11-12 17:11:27.050324  172.18.0.2  172.18.0.3   HTTP      271     HTTP/1.1 200 OK (text/html)
442  2017-11-12 17:11:27.050378  172.18.0.3  172.18.0.2   TCP       66      48702 → 8000 [ACK] Seq=351 Ack=411 Win=31360 Len=0 TSval=32033580 TSecr=32033580
443  2017-11-12 17:11:27.551182  172.18.0.3  172.18.0.2   HTTP      241     GET /_healthcheck HTTP/1.1
444  2017-11-12 17:11:27.551294  172.18.0.2  172.18.0.3   TCP       66      8000 → 48702 [RST, ACK] Seq=411 Ack=526 Win=32256 Len=0 TSval=32033630 TSecr=32033630
When nginx receives the [RST] packet, it logs a 'Connection reset' error.
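For reference, this typically shows up in the nginx error log along the
lines of the following (quoted from memory; the exact wording may vary by
nginx version):

    recv() failed (104: Connection reset by peer) while reading response header from upstream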
I'm testing with the following setup (a code sketch follows below):
Upstream (Node.js server):
- Keep-alive timeout set to 500 ms
Test client:
- Keeps sending requests at a fixed interval
- The interval starts at 500 ms and decreases by 0.1 ms after each request
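Roughly, the setup looks like the two Node.js sketches below. These are
illustrations only, not the exact scripts I used; the port and the
/_healthcheck path match the capture above, and for brevity the client talks
to the upstream directly here (in the actual test the requests went through
nginx with a keepalive upstream block), which reproduces the same race:

    // upstream.js - minimal upstream with a 500 ms keep-alive timeout
    const http = require('http');

    const server = http.createServer((req, res) => {
      res.writeHead(200, { 'Content-Type': 'text/html' });
      res.end('OK');
    });
    server.keepAliveTimeout = 500; // close idle connections after 500 ms
    server.listen(8000);

    // client.js - reuse one keep-alive connection, shrinking the interval
    // by 0.1 ms per request until it races the upstream's 500 ms timeout
    const http = require('http');

    const agent = new http.Agent({ keepAlive: true, maxSockets: 1 });
    let interval = 500; // ms

    function sendRequest() {
      const req = http.get(
        { host: '127.0.0.1', port: 8000, path: '/_healthcheck', agent },
        (res) => {
          res.resume(); // drain the body so the socket goes back to the pool
          interval -= 0.1;
          setTimeout(sendRequest, interval);
        }
      );
      req.on('error', (err) => console.error(`interval=${interval}ms:`, err.message));
    }

    sendRequest();

Once the interval drops just below the upstream's keep-alive timeout, the
reused socket starts hitting exactly the window described above and the
request fails with ECONNRESET instead of getting a response.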
For a more detailed description of the test process, see my post at:
https://theantway.com/2017/11/analyze-connection-reset-error-in-nginx-upstream-with-keep-alive-enabled/
To fix the issue, I tried adding a timeout for kept-alive upstream
connections; you can check the patch at:
https://github.com/weixu365/nginx/blob/docker-1.13.6/docker/stretch/patches/01-http-upstream-keepalive-timeout.patch
The patch is what I'm using for my current testing, and I can prepare it in
a different format if you need.
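For context, this is the kind of configuration the patch is aimed at. The
directive name and value below are only my assumption of how the parameter
would be exposed; see the patch itself for the actual syntax. The idea is to
let nginx drop cached upstream connections before the upstream's own timeout
can fire:

    upstream backend {
        server 127.0.0.1:8000;
        keepalive 16;
        # hypothetical directive from the patch: close idle cached
        # connections well before the upstream's 500 ms timeout
        keepalive_timeout 300ms;
    }

    server {
        location / {
            proxy_pass http://backend;
            # required for upstream keepalive connections to be reused
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }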
Regards
Wei Xu
On Fri, Nov 10, 2017 at 4:07 AM, Maxim Dounin <mdounin at mdounin.ru> wrote:
> Hello!
>
> On Thu, Nov 02, 2017 at 08:41:16PM +1100, Wei Xu wrote:
>
> > Hi
> > I saw there's an issue talking about "implement keepalive timeout for
> > upstream <https://trac.nginx.org/nginx/ticket/1170>".
> >
> > I have a different scenario for this requirement.
> >
> > I'm using a Node.js web server as the upstream, with the keep-alive
> > timeout set to 60 seconds on the Node.js server. The problem is that I
> > see more than a hundred "Connection reset by peer" errors every day.
> >
> > Because there are no errors on the Node.js side, my guess is that the
> > upstream had disconnected and, at the same time, nginx sent a new
> > request and then received a TCP RST.
>
> Could you please trace what actually happens on the network level
> to confirm the guess is correct?
>
> Also, please check that there are no stateful firewalls between
> nginx and the backend. A firewall which drops the state before
> the timeout expires looks like a much more likely cause for such
> errors.
>
> --
> Maxim Dounin
> http://mdounin.ru/