Nginx upstream delays

Tue Mar 10 09:23:52 UTC 2015

Hello, 

The $request_time and $upstream_request_time are already included. In the log below it says ' 3.042 3.098 '. The latter is the request time, the former the upstream request time. It doesn't seem to be an issue of slow clients (also not for other log entries, they're similar). 

It's going to be hard to diagnose if Gunicorn says the request takes 40 ms, but Nginx says it takes 3042 ms. 

Hunting on... 

- Wiebe 

----- Original Message -----

> From: "Wandenberg Peixoto" <wandenberg at gmail.com>
> To: nginx at nginx.org
> Sent: Tuesday, 10 March, 2015 1:37:09 AM
> Subject: Re: Nginx upstream delays

> You also have to consider the rate your client get data from the server.
> The request time is the entire time spent from the beginning of the request
> until the end of response.
> So you may not have a problem on your server, just a lazy client :)

> On Mon, Mar 9, 2015 at 1:05 PM, B.R. < reallfqq-nginx at yahoo.fr > wrote:

> > You are on a nginx mailing list, thus I will reply on the nginx side of the
> > problem.
> 

> > You can diagnose further to tell if the problem comes from nginx or from
> > the
> > backend by using 2 different variables in your log message:
> 
> > $request_time
> 
> > $upstream_response_time
> 

> > If those values are close enough (most of the time equal), you might then
> > conclude that the trouble does not come from nginx, but rather from the
> > backend (or communication between those).
> 

> > If you want to investigate the communication level, you can set up some
> > tcpdump listening on the communication between nginx and the backend. You
> > will need to use TCP ports to do that.
> 
> > Since you are using UNIX sockets, you might want to monitor file
> > descriptors,
> > but I would (totally out of thin air) suppose it might not be the source of
> > your trouble, since you would have seen much more impact if it was.
> 

> > I guess you will have to trace/dump stuff on your backend. PHP has some
> > slowlog capability firing up tracing in a code which takes too long to
> > finish. I do not know anything about Python servers, but you are not at the
> > right location for questions related to those anyway.
> 

> > Happy hunting,
> 
> > ---
> 
> > B. R.
> 

> > On Mon, Mar 9, 2015 at 4:28 PM, Wiebe Cazemier < wiebe at halfgaar.net >
> > wrote:
> 

> > > Hello,
> > 
> 

> > > I have a question about sporadic long upstream response times I'm seeing
> > > on
> > > (two of) our Nginx servers. It's kind of hard to show and quantify, but
> > > I'll
> > > do my best.
> > 
> 

> > > One is a Django Gunicorn server. We included the upstream response time
> > > in
> > > the Nginx access log and wrote a script to analyze them. What we see, is
> > > that on the login page of a website (a page that does almost nothing)
> > > 95%-99% of 'GET /customer/login/' requests are processed within about 50
> > > ms.
> > > The other few percent can take several seconds. Sometimes even 5s. Our
> > > Munin
> > > graphs show no correlation in disk latency, cpu time, memory use, etc.
> > 
> 

> > > I also added an access log to Gunicorn, so that I can see how long
> > > Gunicorn
> > > takes to process requests that Nginx thinks take long. Gunicorn has 8
> > > workers. It can be seen that there is actually no delay in Gunicorn. For
> > > instance, Nginx sees this (the long upstream response time is marked red,
> > > 3.042s):
> > 
> 

> > > > 11.22.33.44 - - [06/Mar/2015:10:27:46 +0100] "GET /customer/login/
> > > > HTTP/1.1"
> > > > 200 8310 0.061 0.121 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:27:48 +0100] "GET /customer/login/
> > > > HTTP/1.1"
> > > > 200 8310 0.035 0.092 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [ 06/Mar/2015:10:27:52 +0100] "GET /customer/login/
> > > > HTTP/1.1"
> > > > 200 8310 3.042 3.098 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:27:53 +0100] "GET /customer/login/
> > > > HTTP/1.1"
> > > > 200 8310 0.051 0.108 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:27:54 +0100] "GET /customer/login/
> > > > HTTP/1.1"
> > > > 200 8310 0.038 0.096 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > x.x.x.x - - [06/Mar/2015:10:27:58 +0100] "POST
> > > > /customer/login/?next=/customer/home/ HTTP/1.1" 302 5 0.123 0.123
> > > 
> > 
> 

> > > But then the corresponding Gunicorn logs shows normal response times (the
> > > figure after 'None', in µs) (Corresponding line marked blue):
> > 
> 

> > > > 11.22.33.44 - - [06/Mar/2015:10:27:41] "GET /customer/login/ HTTP/1.0"
> > > > 200
> > > > None 41686 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:27:42] "GET /customer/login/ HTTP/1.0"
> > > > 200
> > > > None 27629 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:27:43] "GET /customer/login/ HTTP/1.0"
> > > > 200
> > > > None 28143 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:27:44] "GET /customer/login/ HTTP/1.0"
> > > > 200
> > > > None 41846 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:27:45] "GET /customer/login/ HTTP/1.0"
> > > > 200
> > > > None 30192 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:27:46] "GET /customer/login/ HTTP/1.0"
> > > > 200
> > > > None 59382 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:27:48] "GET /customer/login/ HTTP/1.0"
> > > > 200
> > > > None 33308 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [ 06/Mar/2015:10:27:52 ] "GET /customer/login/
> > > > HTTP/1.0"
> > > > 200
> > > > None 39849 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:27:53] "GET /customer/login/ HTTP/1.0"
> > > > 200
> > > > None 48321 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:27:54] "GET /customer/login/ HTTP/1.0"
> > > > 200
> > > > None 36484 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > x.x.x.x - - [06/Mar/2015:10:27:58] "POST
> > > > /customer/login/?next=/customer/home/ HTTP/1.0" 302 None 122295
> > > 
> > 
> 
> > > > y.y.y.y - - [06/Mar/2015:10:28:02] "GET
> > > > /customer/login/?next=/customer/home/
> > > > HTTP/1.0" 200 None 97824
> > > 
> > 
> 
> > > > y.y.y.y - - [06/Mar/2015:10:28:03] "GET
> > > > /customer/login/?next=/customer/home/
> > > > HTTP/1.0" 200 None 78162
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:28:26] "GET /customer/login/ HTTP/1.0"
> > > > 200
> > > > None 38350 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:28:27] "GET /customer/login/ HTTP/1.0"
> > > > 200
> > > > None 31076 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:28:28] "GET /customer/login/ HTTP/1.0"
> > > > 200
> > > > None 28536 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:28:30] "GET /customer/login/ HTTP/1.0"
> > > > 200
> > > > None 30981 "-" "Echoping/6.0.2"
> > > 
> > 
> 
> > > > 11.22.33.44 - - [06/Mar/2015:10:28:31] "GET /customer/login/ HTTP/1.0"
> > > > 200
> > > > None 29920 "-" "Echoping/6.0.2"
> > > 
> > 
> 

> > > As I said, there are currently 8 workers. I already increased them from
> > > 4.
> > > The log above shows that there are enough seconds between each request
> > > that
> > > 8 workers should be able to handle it. I also created a MySQL slow log,
> > > which doesn't show the delays. MySQL is always fast.
> > 
> 

> > > Another server we have is Nginx with PHP-FPM (with 150 PHP children in
> > > the
> > > pool), no database access. On one particular recent log of a few hundred
> > > thousand entries, 99% of requests is done in 129ms. But one response even
> > > took 3170ms. Its PHP proxy settings are:
> > 
> 

> > > > location ~ \.php$ {
> > > 
> > 
> 
> > > > fastcgi_split_path_info ^(.+\.php)(/.+)$;
> > > 
> > 
> 
> > > > # NOTE: You should have "cgi.fix_pathinfo = 0;" in php.ini
> > > 
> > 
> 

> > > > # With php5-fpm:
> > > 
> > 
> 
> > > > fastcgi_pass unix:/var/run/php5-fpm.sock;
> > > 
> > 
> 
> > > > fastcgi_index index.php;
> > > 
> > 
> 
> > > > include fastcgi_params;
> > > 
> > 
> 
> > > > }
> > > 
> > 
> 

> > > It seems something in the communication between Nginx and the service
> > > behind
> > > it slows down sometimes, but I can't figure out what it might be. Any
> > > idea
> > > what it might be or how to diagnose it better?
> > 
> 

> > > Regards,
> > 
> 

> > > Wiebe Cazemier
> > 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.nginx.org/pipermail/nginx/attachments/20150310/94a8ac2b/attachment.html>