Nginx + Rails (Thin Server) - Double Bad Gateway Problem

djlebersilvestre nginx-forum at nginx.us
Wed Mar 28 13:26:38 UTC 2012


Hi,

I'm facing a strange problem using Nginx with reverse proxy to a ruby
Thin Server. 
When someone calls an unrouteable path in our Rails App, Nginx launches
a 502 - Bad Gateway correctly. 
When Nginx receives another resquest (just after the 502), that is valid
and routeable, it launches directly again a 502 - Bad Gateway without
passing the request to the Thin Server (we can check this in the nginx
logs and the app logs). 
When I call the same valid URL again, the Nginx passes the call to the
Thin Server and it returns to normal operation.

It seems to cache something in Nginx. When this happens and only in the
third call it gets back to normal track. 
I've researched the Internet for this issue but nothing conclusive... 
As a workaround, I've changed the configuration disabiling the
next_upstream. 
This resolved my problem, but I've lost the next_upstream feature that
calls the others servers when one of them fail...


Here is our nginx configuration:
---------------------------------------------------------------------------
upstream reg-prov {
        server 127.0.0.1:5010;
        server 127.0.0.1:5011;
        server 127.0.0.1:5012;
        server 127.0.0.1:5013;
}

server {
        listen                  443;
        ssl                     on;
        ssl_certificate            /etc/nginx/conf.d/server.crt;
        ssl_certificate_key     /etc/nginx/conf.d/server.key;

        server_name    
reg-prov.registro.systemintegration.locaweb.com.br;
        access_log      /var/log/nginx/reg-prov.access.log;
        root            /var/www/reg-prov/public;

        location / {
                proxy_set_header X-Real-IP                 
$remote_addr;
                proxy_set_header X-Forwarded-For       
$proxy_add_x_forwarded_for;
                proxy_set_header Host                        
$http_host;
                proxy_set_header X-Forwarded-Proto     https;
                proxy_redirect                                     
off;
                proxy_next_upstream                           off;
                if (-f $request_filename/index.html) {
                        rewrite (.*) $1/index.html break;
                }
                if (-f $request_filename.html) {
                        rewrite (.*) $1.html break;
                }
                if (!-f $request_filename) {
                        proxy_pass http://reg-prov;
                        break;
                }
        }
}
---------------------------------------------------------------------------
Nginx version: 0.7.67-3+squeeze2




Here is the behavior I described (log files).

With the next upstream DISABLED (the workaround):
---------------------------------------------------------------------------
********************
* Nginx Access Log *
********************
>>> First Call (invalid)
187.45.255.164 - - [28/Mar/2012:10:14:11 -0300] "GET
/domain_availability/disponivel.mx?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas
HTTP/1.1" 502 173 "-" "Ruby"

>>> Second Call (valid)
187.45.255.164 - - [28/Mar/2012:10:14:21 -0300] "GET
/domain_availability/disponivel.com?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas
HTTP/1.1" 200 125 "-" "Ruby"


***********
* App Log *
***********
>>> First Call (invalid)
[2012-03-28 10:14:11] INFO -- : Started GET
"/domain_availability/disponivel.mx?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas"
for 187.45.255.164 at 2012-03-28 10:14:11 -0300

>>> Second Call (valid)
[2012-03-28 10:14:21] INFO -- : Started GET
"/domain_availability/disponivel.com?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas"
for 187.45.255.164 at 2012-03-28 10:14:21 -0300
[2012-03-28 10:14:21] INFO -- : Processing by
DomainAvailabilityController#verisign as XML
[2012-03-28 10:14:21] INFO -- : Parameters:
{"ticket"=>"ST-28035-WPQ4MirFJIKxwagfIIUe-cas", "id"=>"disponivel.com"}
[2012-03-28 10:14:21] INFO -- : Guessed service url:
"nmst.api.systemintegration.locaweb.com.br"
[2012-03-28 10:14:21] INFO -- : Validate CAS URI =
https://sso.systemintegration.locaweb.com.br/cas-sys/serviceValidate?service=nmst.api.systemintegration.locaweb.com.br&ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas
[2012-03-28 10:14:21] INFO -- : Ticket
"ST-28035-WPQ4MirFJIKxwagfIIUe-cas" for service
"nmst.api.systemintegration.locaweb.com.br" belonging to user "registro"
is VALID.
[2012-03-28 10:14:21] INFO -- : [127.0.0.1
DomainAvailabilityController#verisign disponivel.com registro]
User-Agent: Ruby
[2012-03-28 10:14:21] INFO -- : [127.0.0.1
DomainAvailabilityController#verisign disponivel.com registro] Accept:
*/*; q=0.5, application/xml
[2012-03-28 10:14:21] INFO -- : [127.0.0.1
DomainAvailabilityController#verisign disponivel.com registro] Referer:
[2012-03-28 10:14:21] INFO -- : Rendered
domain_availability/verisign.verisign_domain_availability.builder
(5.3ms)
[2012-03-28 10:14:21] INFO -- : [127.0.0.1
DomainAvailabilityController#verisign disponivel.com registro] Response:

<?xml version="1.0" encoding="UTF-8"?>
<domain_availability>
  <available>true</available>
</domain_availability>
[2012-03-28 10:14:21] INFO -- : Completed 200 OK in 224ms (Views: 15.4ms
| ActiveRecord: 0.0ms) - Path:
/domain_availability/disponivel.com?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas
---------------------------------------------------------------------------


Now with the next upstream ENABLED:
---------------------------------------------------------------------------
********************
* Nginx Access Log *
********************
>>> First Call (invalid)
187.45.255.164 - - [28/Mar/2012:10:21:51 -0300] "GET
/domain_availability/disponivel.mx?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas
HTTP/1.1" 502 173 "-" "Ruby"

>>> Second Call (valid)
187.45.255.164 - - [28/Mar/2012:10:21:55 -0300] "GET
/domain_availability/disponivel.com?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas
HTTP/1.1" 502 173 "-" "Ruby"
>> Nginx does not call the app, it returns 502 directly. Here it seems
to cache something...

>>> Third Call (valid)
187.45.255.164 - - [28/Mar/2012:10:22:00 -0300] "GET
/domain_availability/disponivel.com?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas
HTTP/1.1" 200 125 "-" "Ruby"
>> Success...



***********
* App Log *
***********
>>> First Call (invalid)
[2012-03-28 10:21:51] INFO -- : Started GET
"/domain_availability/disponivel.mx?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas"
for 187.45.255.164 at 2012-03-28 10:21:51 -0300
[2012-03-28 10:21:51] INFO -- : Started GET
"/domain_availability/disponivel.mx?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas"
for 187.45.255.164 at 2012-03-28 10:21:51 -0300
[2012-03-28 10:21:51] INFO -- : Started GET
"/domain_availability/disponivel.mx?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas"
for 187.45.255.164 at 2012-03-28 10:21:51 -0300
[2012-03-28 10:21:51] INFO -- : Started GET
"/domain_availability/disponivel.mx?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas"
for 187.45.255.164 at 2012-03-28 10:21:51 -0300
>> Here we have the 4 calls because the next upstream is enabled, only
when the last server is called the error is returned by nginx, so he
keep trying with all servers.


>>> Second Call (valid)
>> Nothing happens in the app log. It gets nothing from nginx...

>>> Third Call (valid)
[2012-03-28 10:22:00] INFO -- : Started GET
"/domain_availability/disponivel.com?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas"
for 187.45.255.164 at 2012-03-28 10:22:00 -0300
[2012-03-28 10:22:00] INFO -- : Processing by
DomainAvailabilityController#verisign as XML
[2012-03-28 10:22:00] INFO -- : Parameters:
{"ticket"=>"ST-28035-WPQ4MirFJIKxwagfIIUe-cas", "id"=>"disponivel.com"}
[2012-03-28 10:22:00] INFO -- : Guessed service url:
"nmst.api.systemintegration.locaweb.com.br"
[2012-03-28 10:22:00] INFO -- : Validate CAS URI =
https://sso.systemintegration.locaweb.com.br/cas-sys/serviceValidate?service=nmst.api.systemintegration.locaweb.com.br&ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas
[2012-03-28 10:22:00] INFO -- : Ticket
"ST-28035-WPQ4MirFJIKxwagfIIUe-cas" for service
"nmst.api.systemintegration.locaweb.com.br" belonging to user "registro"
is VALID.
[2012-03-28 10:22:00] INFO -- : [127.0.0.1
DomainAvailabilityController#verisign disponivel.com registro]
User-Agent: Ruby
[2012-03-28 10:22:00] INFO -- : [127.0.0.1
DomainAvailabilityController#verisign disponivel.com registro] Accept:
*/*; q=0.5, application/xml
[2012-03-28 10:22:00] INFO -- : [127.0.0.1
DomainAvailabilityController#verisign disponivel.com registro] Referer:
[2012-03-28 10:22:00] INFO -- : Rendered
domain_availability/verisign.verisign_domain_availability.builder
(6.1ms)
[2012-03-28 10:22:00] INFO -- : [127.0.0.1
DomainAvailabilityController#verisign disponivel.com registro] Response:

<?xml version="1.0" encoding="UTF-8"?>
<domain_availability>
  <available>true</available>
</domain_availability>
[2012-03-28 10:22:00] INFO -- : Completed 200 OK in 233ms (Views: 16.9ms
| ActiveRecord: 0.0ms) - Path:
/domain_availability/disponivel.com?ticket=ST-28035-WPQ4MirFJIKxwagfIIUe-cas
---------------------------------------------------------------------------

Thanks in advance!
Daniel Silvestre

Posted at Nginx Forum: http://forum.nginx.org/read.php?2,224548,224548#msg-224548



More information about the nginx mailing list