nginx reload fails with [emerg] host not found in upstream
groknaut
nginx-forum at nginx.us
Fri Dec 7 02:37:16 UTC 2012
hello --
nginx will not reload on some of our proxy servers, but does on others. all
are running the same version: nginx/1.0.15. the reload fails with error:
[emerg] 26903#0: host not found in upstream "webappNNx:8080" in
/etc/nginx/upstream.conf:N
the issue appears to be related to nginx's ability to resolve a hostname.
our proxy servers use BIND servers that we run ourselves. the BIND servers
are returning answers just fine afaict. and when i reproduce this problem on
a proxy server, i sniff the network and can confirm the proxy is asking the
nameserver for an A record, and gets that answer back successfully.
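a quick way to double-check the resolver path, as a sketch: nginx resolves the
names on upstream server lines through the system resolver when it loads the
config, so asking libc directly for each name should show roughly what a fresh
process sees. the helper names below (upstream_hosts, check_upstream_hosts) are
made up for illustration, and getent is assumed to be installed. getent runs in
a new process, so it cannot reveal state held by a long-running master.

```shell
#!/bin/sh
# List the hostnames named on "server host:port ..." lines of an upstream file.
upstream_hosts() {
    # $1: path to the upstream file (e.g. /etc/nginx/upstream.conf)
    sed -n 's/^[[:space:]]*server[[:space:]]\{1,\}\([^:;[:space:]]\{1,\}\).*/\1/p' "$1"
}

# Resolve each name through libc (nsswitch.conf order, e.g. files then dns).
# Prints one ok/FAIL line per host; returns non-zero if any lookup fails.
check_upstream_hosts() {
    rc=0
    for h in $(upstream_hosts "$1"); do
        if addr=$(getent hosts "$h"); then
            echo "ok   $h -> ${addr%% *}"
        else
            echo "FAIL $h"
            rc=1
        fi
    done
    return $rc
}
```

running `check_upstream_hosts /etc/nginx/upstream.conf` on a failing proxy and
on a healthy one would show whether the two hosts differ at the libc level.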
there is a workaround i found, but i would really rather not resort to it:
putting the backend (aka upstream :<) app nodes into /etc/hosts. i have also
heard suggestions to put the backend nodes' IPs into the proxy pool file
(upstream.conf), but again, i'd rather not, because bare IPs are not human
readable, especially when firefighting. i'm hoping there is a better
solution out there than these workarounds.
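if the IPs do end up in upstream.conf, a trailing comment can carry the
hostname so the file stays readable. a minimal sketch that prints such lines;
format_server_line, resolve_ip and server_line are hypothetical helper names,
and getent is assumed to be available:

```shell
#!/bin/sh
# Format one upstream entry: the IP goes in the directive, the name in a comment.
# Usage: format_server_line IP HOST PORT [OPTS...]
format_server_line() {
    ip=$1 host=$2 port=$3
    shift 3
    opts=""
    # remaining args are passed through as server options (max_fails=2, backup, ...)
    if [ $# -gt 0 ]; then
        opts=" $*"
    fi
    printf '    server %s:%s%s; # %s\n' "$ip" "$port" "$opts" "$host"
}

# First address the system resolver returns for a name (empty if none).
resolve_ip() {
    getent hosts "$1" | awk 'NR==1 {print $1}'
}

# Usage: server_line HOST PORT [OPTS...]
server_line() {
    host=$1 port=$2
    shift 2
    ip=$(resolve_ip "$host")
    if [ -z "$ip" ]; then
        echo "cannot resolve $host" >&2
        return 1
    fi
    format_server_line "$ip" "$host" "$port" "$@"
}
```

for example, `server_line webapp02c 8080 max_fails=2` would print a server line
with whatever address the resolver returns at generation time, followed by
`# webapp02c`. the file has to be regenerated whenever a backend's address
changes.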
we are using a third-party module:
https://github.com/yaoweibin/nginx_upstream_check_module
no, i have not tried to reproduce this problem without the module; i don't
know how i would, since we need the functionality that it provides. and yes,
i will follow up with the module author.
any help? thank you very much in advance. all the gory details follow.
kallen
straces available upon request :>
a proxy server where the problem does occur:
============================================
i'd like to note that the nginx parent on this server has been running for
about 6 months.
i try to reload, but the reload will not complete due to the error
[emerg] 26903#0: host not found in upstream "webapp04a:8080" in
/etc/nginx/upstream.conf:3
12/07 01:28[root@proxy2-prod-ue1 ~]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
12/07 01:28[root@proxy2-prod-ue1 ~]# ps wwwwaxuf | grep ngin[x]
root 20569 0.0 0.2 25652 5364 ? Ss Jun20 0:03 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 3401 0.4 0.8 37056 15960 ? S Dec05 8:39 \_ nginx: worker process
nginx 3402 0.4 1.1 40916 19836 ? S Dec05 8:36 \_ nginx: worker process
12/07 01:29[root@proxy2-prod-ue1 ~]# cat /etc/nginx/upstream.conf
## Tomcat via HTTP
upstream tomcats_http {
server webapp02c:8080 max_fails=2;
server webapp06c:8080 max_fails=2;
server roapp02c:8080 backup;
check interval=3000 rise=3 fall=3 timeout=1000 type=http
default_down=false;
check_http_send "GET /healthcheck/version HTTP/1.0\r\n\r\n";
}
12/07 01:29[root@proxy2-prod-ue1 ~]# tcpdump -nvv -i eth0 -s0 -X port 53 and host 10.24.27.66
12/07 01:30[root@proxy2-prod-ue1 ~]# strace -f -s 2048 -ttt -T -p 20569 -o nginx-parent-strace
Process 20569 attached - interrupt to quit
12/07 01:27[root@proxy2-prod-ue1 ~]# /etc/init.d/nginx reload; tail -f /var/log/nginx/error.log
Reloading nginx: [ OK ]
2012/12/07 00:05:29 [debug] 12290#0: bind() 0.0.0.0:80 #6
2012/12/07 00:05:29 [debug] 12290#0: bind() 0.0.0.0:443 #7
2012/12/07 00:05:29 [debug] 12290#0: counter: B7F38080, 1
2012/12/07 01:28:37 [debug] 22928#0: bind() 0.0.0.0:80 #6
2012/12/07 01:28:37 [debug] 22928#0: bind() 0.0.0.0:443 #7
2012/12/07 01:28:37 [debug] 22928#0: counter: B7F8F080, 1
2012/12/07 01:31:44 [debug] 23383#0: bind() 0.0.0.0:80 #6
2012/12/07 01:31:44 [debug] 23383#0: bind() 0.0.0.0:443 #7
2012/12/07 01:31:44 [debug] 23383#0: counter: B7F56080, 1
2012/12/07 01:31:44 [emerg] 20569#0: host not found in upstream
"webapp02c:8080" in /etc/nginx/upstream.conf:3
as soon as that reload fires, i do see nameservice traffic on the wire. so
it is NOT a matter of DNS service being unavailable. i note that it does ask
for the A record twice. i don't know why.
01:31:44.426376 IP (tos 0x0, ttl 64, id 30918, offset 0, flags [DF], proto:
UDP (17), length: 72) 10.45.33.82.60723 > 10.24.27.66.domain: [bad udp cksum
799c!] 18875+ A? webapp02c.prod.romeovoid.com. (44)
0x0000: 4500 0048 78c6 4000 4011 934e 0af5 2b52 E..Hx.@.@..N..+R
0x0010: 0af4 ed55 ed33 0035 0034 2ed6 49bb 0100 ...U.3.5.4..I...
0x0020: 0001 0000 0000 0000 0977 6562 6170 7030 .........webapp0
0x0030: 3263 0470 726f 6407 7361 6173 7572 6503 2c.prod.romeovoid.
0x0040: 636f 6d00 0001 0001 com.....
01:31:44.427301 IP (tos 0x0, ttl 63, id 42228, offset 0, flags [none],
proto: UDP (17), length: 156) 10.24.27.66.domain > 10.45.33.82.60723: [udp
sum ok] 18875* q: A? webapp02c.prod.romeovoid.com. 1/2/2
webapp02c.prod.romeovoid.com. A 10.51.23.17 ns: prod.romeovoid.com. NS
ns1.prod.romeovoid.com., prod.romeovoid.com. NS ns2.prod.romeovoid.com. ar:
ns1.prod.romeovoid.com. A 10.192.83.14, ns2.prod.romeovoid.com. A
10.24.27.66 (128)
0x0000: 4500 009c a4f4 0000 3f11 a7cc 0af4 ed55 E.......?......U
0x0010: 0af5 2b52 0035 ed33 0088 e8c5 49bb 8580 ..+R.5.3....I...
0x0020: 0001 0001 0002 0002 0977 6562 6170 7030 .........webapp0
0x0030: 3263 0470 726f 6407 7361 6173 7572 6503 2c.prod.romeovoid.
0x0040: 636f 6d00 0001 0001 c00c 0001 0001 0000 com.............
0x0050: 003c 0004 0a73 2aab c016 0002 0001 0001 .<...s*.........
0x0060: 5180 0006 036e 7331 c016 c016 0002 0001 Q....ns1........
0x0070: 0001 5180 0006 036e 7332 c016 c048 0001 ..Q....ns2...H..
0x0080: 0001 0000 003c 0004 0ac0 530e c05a 0001 .....<....S..Z..
0x0090: 0001 0000 003c 0004 0af4 ed55 .....<.....U
01:31:44.427420 IP (tos 0x0, ttl 64, id 30918, offset 0, flags [DF], proto:
UDP (17), length: 72) 10.45.33.82.60723 > 10.24.27.66.domain: [bad udp cksum
8c21!] 50344+ A? webapp02c.prod.romeovoid.com. (44)
0x0000: 4500 0048 78c6 4000 4011 934e 0af5 2b52 E..Hx.@.@..N..+R
0x0010: 0af4 ed55 ed33 0035 0034 2ed6 c4a8 0100 ...U.3.5.4......
0x0020: 0001 0000 0000 0000 0977 6562 6170 7030 .........webapp0
0x0030: 3263 0470 726f 6407 7361 6173 7572 6503 2c.prod.romeovoid.
0x0040: 636f 6d00 0001 0001 com.....
01:31:44.428050 IP (tos 0x0, ttl 63, id 42229, offset 0, flags [none],
proto: UDP (17), length: 156) 10.24.27.66.domain > 10.45.33.82.60723: [udp
sum ok] 50344* q: A? webapp02c.prod.romeovoid.com. 1/2/2
webapp02c.prod.romeovoid.com. A 10.51.23.17 ns: prod.romeovoid.com. NS
ns2.prod.romeovoid.com., prod.romeovoid.com. NS ns1.prod.romeovoid.com. ar:
ns1.prod.romeovoid.com. A 10.192.83.14, ns2.prod.romeovoid.com. A
10.24.27.66 (128)
0x0000: 4500 009c a4f5 0000 3f11 a7cb 0af4 ed55 E.......?......U
0x0010: 0af5 2b52 0035 ed33 0088 6dd8 c4a8 8580 ..+R.5.3..m.....
0x0020: 0001 0001 0002 0002 0977 6562 6170 7030 .........webapp0
0x0030: 3263 0470 726f 6407 7361 6173 7572 6503 2c.prod.romeovoid.
0x0040: 636f 6d00 0001 0001 c00c 0001 0001 0000 com.............
0x0050: 003c 0004 0a73 2aab c016 0002 0001 0001 .<...s*.........
0x0060: 5180 0006 036e 7332 c016 c016 0002 0001 Q....ns2........
0x0070: 0001 5180 0006 036e 7331 c016 c05a 0001 ..Q....ns1...Z..
0x0080: 0001 0000 003c 0004 0ac0 530e c048 0001 .....<....S..H..
0x0090: 0001 0000 003c 0004 0af4 ed55 .....<.....U
01:31:44.428142 IP (tos 0x0, ttl 64, id 30918, offset 0, flags [DF], proto:
UDP (17), length: 72) 10.45.33.82.60723 > 10.24.27.66.domain: [bad udp cksum
1632!] 45086+ A? webapp06c.prod.romeovoid.com. (44)
0x0000: 4500 0048 78c6 4000 4011 934e 0af5 2b52 E..Hx.@.@..N..+R
0x0010: 0af4 ed55 ed33 0035 0034 2ed6 b01e 0100 ...U.3.5.4......
0x0020: 0001 0000 0000 0000 0977 6562 6170 7030 .........webapp0
0x0030: 3663 0470 726f 6407 7361 6173 7572 6503 6c.prod.romeovoid.
0x0040: 636f 6d00 0001 0001 com.....
01:31:44.428791 IP (tos 0x0, ttl 63, id 42230, offset 0, flags [none],
proto: UDP (17), length: 156) 10.24.27.66.domain > 10.45.33.82.60723: [udp
sum ok] 45086* q: A? webapp06c.prod.romeovoid.com. 1/2/2
webapp06c.prod.romeovoid.com. A 10.195.76.80 ns: prod.romeovoid.com. NS
ns1.prod.romeovoid.com., prod.romeovoid.com. NS ns2.prod.romeovoid.com. ar:
ns1.prod.romeovoid.com. A 10.192.83.14, ns2.prod.romeovoid.com. A
10.24.27.66 (128)
[snip]
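note that in the reload above the init script prints [ OK ] even though the
master (pid 20569) logs [emerg] and keeps running the old configuration. a
sketch of a reload wrapper that checks the error log instead of trusting the
init script output; reload_emerg and checked_reload are hypothetical names, the
default paths are the ones from this setup, and the one-second wait is an
arbitrary assumption:

```shell
#!/bin/sh
# Print [emerg] lines logged by master PID after line number START of LOG.
# Exit status is 0 if at least one such line exists, non-zero otherwise.
reload_emerg() {
    log=$1 pid=$2 start=$3
    tail -n "+$((start + 1))" "$log" | grep -F "[emerg] ${pid}#"
}

# Send SIGHUP to the master, then fail if it logged [emerg] for this reload.
checked_reload() {
    log=${1:-/var/log/nginx/error.log}
    pidfile=${2:-/var/run/nginx.pid}
    pid=$(cat "$pidfile") || return 1
    before=$(wc -l < "$log")        # remember where the log ended before the reload
    kill -HUP "$pid" || return 1
    sleep 1                         # assumed long enough for the master to parse the config
    if reload_emerg "$log" "$pid" "$before"; then
        echo "reload failed, master kept the old config" >&2
        return 1
    fi
}
```

with something like this in place, a failed reload shows up as a non-zero exit
status rather than only as a line in error.log.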
the workaround: put all backend nodes (the ones in upstream.conf) into /etc/hosts :<
12/07 01:34[root@proxy2-prod-ue1 ~]# tail -3 /etc/hosts
10.51.23.17 webapp02c.prod.romeovoid.com webapp02c
10.195.76.80 webapp06c.prod.romeovoid.com webapp06c
10.96.23.87 roapp02c.prod.romeovoid.com roapp02c
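one risk with pinning backends in /etc/hosts is that the entries silently go
stale when DNS changes. a rough sketch for spotting that; hosts_ip and
hosts_drift are hypothetical names, dig is assumed to be installed, and only
the first line of the DNS answer is compared:

```shell
#!/bin/sh
# Print the first address that a hosts-format FILE maps NAME to (empty if none).
hosts_ip() {
    awk -v name="$2" '
        { sub(/#.*/, "") }          # drop comments before matching
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# Report names whose hosts-file address differs from the current DNS answer.
# Usage: hosts_drift FILE FQDN...
hosts_drift() {
    file=$1
    shift
    for fqdn in "$@"; do
        pinned=$(hosts_ip "$file" "$fqdn")
        live=$(dig +short "$fqdn" A | head -n 1)
        if [ "$pinned" != "$live" ]; then
            echo "DRIFT $fqdn hosts=$pinned dns=$live"
        fi
    done
}
```

run from cron as e.g. `hosts_drift /etc/hosts webapp02c.prod.romeovoid.com`,
this would at least flag a pinned entry before it causes trouble.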
and now, it will reload just fine:
12/07 01:34[root@proxy2-prod-ue1 ~]# /etc/init.d/nginx reload; tail -f /var/log/nginx/error.log
Reloading nginx: [ OK ]
2012/12/07 01:35:39 [debug] 24076#0: bind() 0.0.0.0:80 #6
2012/12/07 01:35:39 [debug] 24076#0: bind() 0.0.0.0:443 #7
2012/12/07 01:35:39 [debug] 24076#0: counter: B7FCD080, 1
2012/12/07 01:35:39 [debug] 20569#0: http upstream check, find
oshm_zone:092C6390, opeers_shm: B7451000
2012/12/07 01:35:39 [debug] 20569#0: http upstream check: inherit
opeer:10.51.23.17:8080
2012/12/07 01:35:39 [debug] 20569#0: http upstream check: inherit
opeer:10.195.76.80:8080
2012/12/07 01:35:39 [debug] 20569#0: http upstream check: inherit
opeer:10.96.23.87:8080
2012/12/07 01:35:39 [notice] 20569#0: using the "epoll" event method
2012/12/07 01:35:39 [notice] 20569#0: start worker processes
2012/12/07 01:35:39 [debug] 20569#0: channel 3:5
2012/12/07 01:35:39 [notice] 20569#0: start worker process 24078
2012/12/07 01:35:39 [debug] 20569#0: pass channel s:2 pid:24078 fd:3 to s:0
pid:3401 fd:9
2012/12/07 01:35:39 [debug] 20569#0: pass channel s:2 pid:24078 fd:3 to s:1
pid:3402 fd:11
2012/12/07 01:35:39 [debug] 20569#0: channel 14:15
2012/12/07 01:35:39 [notice] 20569#0: start worker process 24079
2012/12/07 01:35:39 [debug] 20569#0: pass channel s:3 pid:24079 fd:14 to s:0
pid:3401 fd:9
2012/12/07 01:35:39 [debug] 20569#0: pass channel s:3 pid:24079 fd:14 to s:1
pid:3402 fd:11
2012/12/07 01:35:39 [debug] 20569#0: pass channel s:3 pid:24079 fd:14 to s:2
pid:24078 fd:3
2012/12/07 01:35:39 [debug] 20569#0: child: 0 3401 e:0 t:0 d:0 r:1 j:0
2012/12/07 01:35:39 [debug] 20569#0: child: 1 3402 e:0 t:0 d:0 r:1 j:0
2012/12/07 01:35:39 [debug] 20569#0: child: 2 24078 e:0 t:0 d:0 r:1 j:1
2012/12/07 01:35:39 [debug] 20569#0: child: 3 24079 e:0 t:0 d:0 r:1 j:1
2012/12/07 01:35:39 [debug] 20569#0: sigsuspend
2012/12/07 01:35:39 [debug] 24078#0: malloc: 09340600:6144
2012/12/07 01:35:39 [debug] 24079#0: malloc: 09340600:6144
2012/12/07 01:35:39 [debug] 24078#0: malloc: 0931D3E0:102400
a proxy server where the problem does NOT occur:
================================================
i'd like to note that the nginx parent on this server has been running for
only about 1 month.
12/07 01:04[root@proxy5-prod-ue1 ~]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
12/07 01:40[root@proxy5-prod-ue1 ~]# cat /etc/nginx/upstream.conf
## Tomcat via HTTP
upstream tomcats_http {
server webapp09e:8080 max_fails=2;
server webapp10e:8080 max_fails=2;
server roapp05e:8080 backup;
check interval=3000 rise=3 fall=3 timeout=1000 type=http
default_down=false;
check_http_send "GET /healthcheck/version HTTP/1.0\r\n\r\n";
}
12/07 01:40[root@proxy5-prod-ue1 ~]# grep webapp /etc/hosts
12/07 01:41[root@proxy5-prod-ue1 ~]# # nothing as expected
12/07 01:42[root@proxy5-prod-ue1 ~]# ps wwwwaxuf | grep ngin[x]
root 4817 0.0 0.3 106184 5528 ? Ss Nov07 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx 8396 0.6 0.8 116692 15488 ? S 00:36 0:25 \_ nginx: worker process
nginx 8397 0.6 0.8 116296 15096 ? S 00:36 0:25 \_ nginx: worker process
12/07 01:42[root@userproxy5-prod-ue1 ~]# /etc/init.d/nginx reload; tail -f /var/log/nginx/error.log
Reloading nginx: [ OK ]
2012/12/07 01:42:44 [debug] 8396#0: posted event 0000000000000000
2012/12/07 01:42:44 [debug] 8396#0: worker cycle
2012/12/07 01:42:44 [debug] 8396#0: accept mutex locked
2012/12/07 01:42:44 [debug] 8396#0: epoll timer: 399
2012/12/07 01:42:44 [notice] 4817#0: signal 1 (SIGHUP) received,
reconfiguring
2012/12/07 01:42:44 [debug] 4817#0: wake up, sigio 0
2012/12/07 01:42:44 [notice] 4817#0: reconfiguring
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 00000000007F1BA0:16384
@16
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 000000000081FB60:16384
@16
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000008C1980:4096
2012/12/07 01:42:44 [debug] 4817#0: read: 6, 00000000008C1980, 4096, 0
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000006E0A80:6912
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000007E59C0:4280
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000007A0610:4280
2012/12/07 01:42:44 [debug] 4817#0: malloc: 0000000000731E00:4280
2012/12/07 01:42:44 [debug] 4817#0: malloc: 0000000000774AD0:4280
2012/12/07 01:42:44 [debug] 4817#0: malloc: 0000000000873750:4280
2012/12/07 01:42:44 [debug] 4817#0: malloc: 0000000000781760:4280
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 00000000008D1170:16384
@16
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000007EEA40:4096
2012/12/07 01:42:44 [debug] 4817#0: include /etc/nginx/mime.types
2012/12/07 01:42:44 [debug] 4817#0: include /etc/nginx/mime.types
2012/12/07 01:42:44 [debug] 4817#0: malloc: 000000000080F300:4096
2012/12/07 01:42:44 [debug] 4817#0: read: 8, 000000000080F300, 3463, 0
2012/12/07 01:42:44 [debug] 4817#0: malloc: 00000000006DCA90:4096
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 00000000007642B0:16384
@16
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 00000000008B5F40:16384
@16
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 000000000075B000:16384
@16
2012/12/07 01:42:44 [debug] 4817#0: posix_memalign: 000000000087E390:16384
@16
2012/12/07 01:42:44 [debug] 4817#0: include upstream.conf
2012/12/07 01:42:44 [debug] 4817#0: include /etc/nginx/upstream.conf
our config
=====================
upstream.conf:
## Tomcat via HTTP
upstream tomcats_http {
server webapp02c:8080 max_fails=2;
server webapp06c:8080 max_fails=2;
server roapp02c:8080 backup;
check interval=3000 rise=3 fall=3 timeout=1000 type=http
default_down=false;
check_http_send "GET /healthcheck/version HTTP/1.0\r\n\r\n";
}
nginx.conf:
user nginx;
worker_processes 2;
syslog local2 nginx;
error_log syslog:warn|/var/log/nginx/error.log;
pid /var/run/nginx.pid;
worker_rlimit_core 500M;
working_directory /var/coredumps/;
events {
worker_connections 1024;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
proxy_buffers 8 16k;
proxy_buffer_size 32k;
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log syslog:warn|/var/log/nginx/access.log main;
sendfile on;
keepalive_timeout 65;
gzip on;
server {
listen 80;
server_name _;
# put X-Purpose: preview into the trash. thank you Safari
if ($http_x_purpose ~* "preview") {
return 444;
break;
}
# http://wiki.nginx.org/HttpStubStatusModule
location /nginx-status {
stub_status on;
access_log off;
allow 10.0.0.0/8;
allow 127.0.0.1;
deny all;
}
location /upstream-status {
check_status;
access_log off;
allow 10.0.0.0/8;
allow 127.0.0.1;
deny all;
}
error_page 404 /404.html;
location = /404.html {
root /usr/share/nginx/error;
}
error_page 403 /403.html;
location = /403.html {
root /usr/share/nginx/error;
}
error_page 500 502 504 /500.html;
location = /500.html {
root /usr/share/nginx/error;
}
error_page 503 /503.html;
location = /503.html {
root /usr/share/nginx/error;
}
set $global_ssl_redirect 'yes';
if ($request_filename ~ "nginx-status") {
set $global_ssl_redirect 'no';
}
if ($request_filename ~ "upstream-status") {
set $global_ssl_redirect 'no';
}
if ($global_ssl_redirect ~* '^yes$') {
rewrite ^ https://$host$request_uri? permanent;
break;
}
}
## Keep upstream defs in a separate file for easier pool membership control
include upstream.conf;
server {
listen 443;
server_name _;
# put X-Purpose: preview into the trash. thank you Safari
if ($http_x_purpose ~* "preview") {
return 444;
break;
}
ssl on;
ssl_certificate certs/wildcard_void_com.crt;
ssl_certificate_key certs/wildcard_void_com.key;
ssl_protocols SSLv3 TLSv1;
ssl_ciphers HIGH:!ADH:!MD5;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
set_real_ip_from 10.0.0.0/8;
real_ip_header X-Forwarded-For;
add_header Cache-Control public;
## Tomcat via HTTP
location / {
proxy_pass http://tomcats_http;
proxy_connect_timeout 10s;
proxy_next_upstream error invalid_header http_503 http_502 http_504;
proxy_set_header Host $host;
proxy_set_header X-Server-Port $server_port;
proxy_set_header X-Server-Protocol https;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Strict-Transport-Security max-age=315360000;
proxy_set_header X-Secure true;
proxy_set_header Transfer-Encoding ""; # OPS-475 remove if/when we update/punt Tomcat
if ($request_uri ~* "\.(ico|css|js|gif|jpe?g|png)") {
expires 365d;
break;
}
}
error_page 404 /404.html;
location = /404.html {
root /usr/share/nginx/error;
}
error_page 403 /403.html;
location = /403.html {
root /usr/share/nginx/error;
}
error_page 500 502 504 /500.html;
location = /500.html {
root /usr/share/nginx/error;
}
error_page 503 /503.html;
location = /503.html {
root /usr/share/nginx/error;
}
}
}
Posted at Nginx Forum: http://forum.nginx.org/read.php?2,233661,233661#msg-233661