<div dir="ltr"><div class="gmail_default" style="font-size:small;color:rgb(51,51,153)">'down' should not translate into any kind of attempt, so nothing should really appear for the servers in that static state.<br></div><div class="gmail_default" style="font-size:small;color:rgb(51,51,153)">For 'unavailable' servers, for the most part the content of the variables should be the same.<br><br></div><div class="gmail_default" style="font-size:small;color:rgb(51,51,153)">Starting from the example I provided, here is what I expected to see:<br></div><div class="gmail_default" style="font-size:small;color:rgb(51,51,153)">- <span style="font-family:monospace,monospace">$upstream_addr</span>: <span style="font-family:monospace,monospace"><IP address 

1>:<port>, <IP address 2>:<port>, <IP address 

3>:<port>, <IP address 4>:<port>, <IP address 

5>:<port>, <IP address 6>:<port></span></div><div class="gmail_default" style="font-size:small;color:rgb(51,51,153)">- <span style="font-family:monospace,monospace">$upstream_response_time</span>: <span style="font-family:monospace,monospace">0.000, 0.000, 0.000, 0.000, 0.001, 0.000</span><br><br></div><div class="gmail_default" style="font-size:small;color:rgb(51,51,153)">That, combined with the 502 HTTP response status, is sufficient to interpret the log entry as: the request failed to find a working backend after attempting communication with each of the 6 active backends. It is pretty straightforward.<br></div><div class="gmail_default" style="font-size:small;color:rgb(51,51,153)">If you want to add something that explicitly states the whole upstream group is down, it should go to the error log.<br></div><div class="gmail_default" style="font-size:small;color:rgb(51,51,153)">At the very least, if the current behavior is kept, the grammar of the content of the <span style="font-family:monospace,monospace">$upstream_*</span> variables should be specified.<br></div><div class="gmail_default" style="font-size:small;color:rgb(51,51,153)"><br></div><div class="gmail_extra"><div style="font-size:small;color:rgb(51,51,153)" class="gmail_default">Does that not seem reasonable?</div><div><div class="m_2544320185076446131gmail_signature" data-smartmail="gmail_signature"><font size="1"><span style="color:rgb(102,102,102)">---<br></span><b><span style="color:rgb(102,102,102)">B. R.</span></b><span style="color:rgb(102,102,102)"></span></font></div></div>
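<div style="font-size:small;color:rgb(51,51,153)" class="gmail_default"><br>P.S. To illustrate why a documented grammar would help, here is a minimal sketch (Python, purely illustrative) of what a log consumer has to do today to tell real peers apart from the group-name entry. The function name, the sample addresses and the "host:port" heuristic are my own assumptions, not anything nginx specifies. It also assumes the entries are separated by commas only.</div>
<pre>
```python
def pair_upstream_attempts(upstream_addr, upstream_response_time):
    """Pair each $upstream_addr entry with its $upstream_response_time entry."""
    addrs = [a.strip() for a in upstream_addr.split(",")]
    times = [t.strip() for t in upstream_response_time.split(",")]
    if len(addrs) != len(times):
        raise ValueError("the two vectors have different lengths")
    attempts = []
    for addr, elapsed in zip(addrs, times):
        host, sep, port = addr.rpartition(":")
        # Heuristic (an assumption, not documented behaviour): a real peer is
        # logged as host:port, while the extra entry is a bare group name.
        is_peer = sep == ":" and port.isdigit()
        attempts.append({"addr": addr, "time": elapsed, "is_peer": is_peer})
    return attempts


# Hypothetical sample values, shortened to two peers plus the group name.
sample = pair_upstream_attempts(
    "10.0.0.1:9000, 10.0.0.2:9000, php-fpm",
    "0.000, 0.001, 0.000",
)
placeholders = [a["addr"] for a in sample if not a["is_peer"]]
print(placeholders)
```
</pre>
<div style="font-size:small;color:rgb(51,51,153)" class="gmail_default">A guess like this breaks as soon as a group name happens to look like an address, which is why I would rather see the format specified.</div>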

<br><div class="gmail_quote">On Mon, Apr 17, 2017 at 6:09 PM, Ruslan Ermilov <span dir="ltr"><<a href="mailto:ru@nginx.com" target="_blank">ru@nginx.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On Sat, Apr 15, 2017 at 03:55:20AM +0200, B.R. via nginx wrote:<br>

> Let me be clear here:<br>

> I got 6 active servers (not marked down), and the logs show 1 attempt on<br>

> each. They all failed for a known reason, and there is no problem there.<br>

> Subsequently, the whole pool was 'down' and the response was 502.<br>

> Everything perfectly normal so far.<br>

><br>

> What is unclear is the feature (as you classified it) of having a fake<br>

> node named after the pool appearing in the list of tried upstream servers.<br>

> It brings confusion more than anything else: having a 502 response + the<br>

> list of all tried (and failed) nodes corresponding with the list of active<br>

> nodes is more than enough to describe what happened.<br>

> The name of the upstream group does not correspond to any real asset, it<br>

> is a purely virtual classification. It thus makes no sense at all to me to<br>

> have it appearing as a 7th 'node' in the list... and how do you interpret<br>

> its response time (where you got also a 7th item in the list)?<br>

> Moreover, it is confusing, since proxy_pass handles domain names and one<br>

> could believe nginx treated the upstream group name as such.<br>

<br>

</span>Without the six attempts, if all of the servers are unreachable (either<br>

"down" or "unavailable" because they have failed previously) at the time<br>

the request starts, what do you expect to see in $upstream_*?<br>

<div class="m_2544320185076446131HOEnZb"><div class="m_2544320185076446131h5"><br>

> On Fri, Apr 14, 2017 at 10:21 AM, Ruslan Ermilov <<a href="mailto:ru@nginx.com" target="_blank">ru@nginx.com</a>> wrote:<br>

><br>

> > On Fri, Apr 14, 2017 at 09:41:36AM +0200, B.R. via nginx wrote:<br>

> > > Hello,<br>

> > ><br>

> > > Reading from upstream<br>

> > > <<a href="https://nginx.org/en/docs/http/ngx_http_upstream_module.html#upstream" rel="noreferrer" target="_blank">https://nginx.org/en/docs/htt<wbr>p/ngx_http_upstream_module.htm<wbr>l#upstream</a>><br>

> > > docs, on upstream pool exhaustion, every backend should be tried once,<br>

> > and<br>

> > > then if all fail the response should be crafted based on the one from the<br>

> > > last server attempt.<br>

> > > So far so good.<br>

> > ><br>

> > > I recently faced a server farm which implements a dull nightly restart of<br>

> > > every node, not sequencing it, resulting in the possibility of having all<br>

> > > nodes offline at the same time.<br>

> > ><br>

> > > However, I collected log entries which did not match what I expected.<br>

> > > For 6 backend nodes, I got:<br>

> > > - log format: $status $body_bytes_sent $request_time $upstream_addr<br>

> > > $upstream_response_time<br>

> > > - log entry: 502 568 0.001 <IP address 1>:<port>, <IP address 2>:<port>,<br>

> > > <IP address 3>:<port>, <IP address 4>:<port>, <IP address 5>:<port>, <IP<br>

> > > address 6>:<port>, php-fpm 0.000, 0.000, 0.000, 0.000, 0.001, 0.000,<br>

> > 0.000<br>

> > > I got 7 entries for $upstream_addr & $upstream_response_time, instead of<br>

> > > the expected 6.<br>

> > ><br>

> > > Here are the interesting parts of the configuration:<br>

> > > upstream php-fpm {<br>

> > >     server <machine 1>:<port> down;<br>

> > >     server <machine 2>:<port> down;<br>

> > >     [...]<br>

> > >     server <machine N-5>:<port>;<br>

> > >     server <machine N-4>:<port>;<br>

> > >     server <machine N-3>:<port>;<br>

> > >     server <machine N-2>:<port>;<br>

> > >     server <machine N-1>:<port>;<br>

> > >     server <machine N>:<port>;<br>

> > >     keepalive 128;<br>

> > > }<br>

> > ><br>

> > > server {<br>

> > >     set $fpm_pool "php-fpm$fpm_pool_ID";<br>

> > >     [...]<br>

> > >         location ~ \.php$ {<br>

> > >             [...]<br>

> > >             fastcgi_read_timeout 600;<br>

> > >             fastcgi_keep_conn on;<br>

> > >             fastcgi_index index.php;<br>

> > ><br>

> > >             include fastcgi_params;<br>

> > >             fastcgi_param SCRIPT_FILENAME<br>

> > > $document_root$fastcgi_script_<wbr>name;<br>

> > >             [...]<br>

> > >             fastcgi_pass $fpm_pool;<br>

> > >         }<br>

> > > }<br>

> > ><br>

> > > The question is:<br>

> > > php-fpm being an upstream group name, how come it has been tried as a<br>

> > > domain name in the end?<br>

> > > Stated otherwise, is this because the upstream group is considered<br>

> > 'down',<br>

> > > thus somehow removed from the possibilities, and nginx trying one last<br>

> > time<br>

> > > the name as a domain name to see if something answers?<br>

> > > This 7th request is definitely strange to my point of view. Is it a bug<br>

> > or<br>

> > > a feature?<br>

> ><br>

> > A feature.<br>

> ><br>

> > Most $upstream_* variables are vectored ones, and the number of entries<br>

> > in their values corresponds to the number of tries made to select a peer.<br>

> > When a peer cannot be selected at all (as in your case), the status is<br>

> > 502 and the name equals the upstream group name.<br>

> ><br>

> > There could be several reasons why none of the peers can be selected.<br>

> > For example, some peers are marked "down", and other peers were failing<br>

> > and are now in the "unavailable" state.<br>

> ><br>

> > The number of tries is limited by the number of servers in the group,<br>

> > unless further restricted by proxy_next_upstream_tries.  In your case,<br>

> > since there are two "down" servers, and other servers are unavailable,<br>

> > you reach the situation when a peer cannot be selected.  If you comment<br>

> > out the two "down" servers, and try a few requests in a row when all<br>

> > servers are physically unavailable, the first log entry will list all<br>

> > of the attempted servers, and then for the next 10 seconds (in the<br>

> > default config) you'll see only the upstream group name and 502 in<br>

> > $upstream_status, until the servers become available again (see<br>

> > max_fails/fail_timeout).<br>

> ><br>

> > Hope this makes things a little bit clearer.<br>

> > ______________________________<wbr>_________________<br>

> > nginx mailing list<br>

> > <a href="mailto:nginx@nginx.org" target="_blank">nginx@nginx.org</a><br>

> > <a href="http://mailman.nginx.org/mailman/listinfo/nginx" rel="noreferrer" target="_blank">http://mailman.nginx.org/mailm<wbr>an/listinfo/nginx</a><br>

<br>

<br>

</div></div><span class="m_2544320185076446131HOEnZb"><font color="#888888">--<br>

Ruslan Ermilov<br>

Assume stupidity not malice<br>

</font></span><div class="m_2544320185076446131HOEnZb"><div class="m_2544320185076446131h5">______________________________<wbr>_________________<br>

nginx mailing list<br>

<a href="mailto:nginx@nginx.org" target="_blank">nginx@nginx.org</a><br>

<a href="http://mailman.nginx.org/mailman/listinfo/nginx" rel="noreferrer" target="_blank">http://mailman.nginx.org/mailm<wbr>an/listinfo/nginx</a></div></div></blockquote></div><br></div></div>