nginx -s reload fails / workers in process-state H

John Feuerstein john at feurix.com
Wed Jun 9 17:46:41 MSD 2010


Hello,

using nginx-0.8.40 on Linux 2.6.32 I've encountered a problem with
nginx reloading, which leaves [nginx] processes in process state "H"
behind. As soon as the first of these processes is left behind, nginx
is unable to reload the workers and is unable to stop gracefully.
(happens with 0.8.36 and 0.8.37 too)

Sending signals to the H-state processes won't do anything, leaving no
other choice but to SIGKILL them.

This behavior is very seldom and I've tried to track it down while
having nginx logging with log level debug. It finally happened after
reloading nginx the 68th time.

The system was completely idle during the tests.

Note that I've first encountered this while using cron to reload the
nginx configuration every *5 minutes*. I'm not entirely sure if this is
triggered by some kind of race-condition, where the last nginx reload
was not completely finished and the next one is issued on top.
I doubt it though: The debug log I've created shows the full process
list after a reload. AFAIK nginx changes some proctitles while in
reload-state, which I could not find anymore just seconds after issuing
the reload signal. In addition, the system was idle (no requests...), so
the old workers should fade in no time.

The command used by cron to reload nginx is:

	nginx -t && nginx -s reload


Steps to reproduce it (after stopping Cron and nginx):

# Step 1: Start up nginx as usual, eg. /etc/init.d/nginx start
# Step 2: Run the following script to issue the reload signal
#         every 20 seconds:


------------------------------- snip ---------------------------------
#!/bin/bash

log=/var/log/nginx/error_log

echo "XXXXXXXXXX ------- SHOWING INITIAL NGINX PROCESSES -------
XXXXXXXXXX" >> "$log"
ps auxf 2>&1 | grep nginx >> "$log"
echo "XXXXXXXXXX ------- END OF INITIAL NGINX PROCESSES -------
XXXXXXXXXX" >> "$log"

i=1
while sleep 10
do
    echo "XXXXXXXXXX ------- RELOAD $i -------    XXXXXXXXXX" >> "$log"

    nginx -t && nginx -s reload
    sleep 10

    echo "XXXXXXXXXX ------- RELOAD $i complete - SHOWING NGINX
PROCESSES ------- XXXXXXXXXX" >> "$log"
    ps auxf 2>&1 | grep nginx >> "$log"
    echo "XXXXXXXXXX ------- END OF NGINX PROCESSES ------- XXXXXXXXXX" 
>> "$log"

   ((i++))
done
------------------------------- snap --------------------------------

As far as I understand, the reload should be complete when the process
list is appended to the error_log 10 seconds after reloading nginx
*and* it does not contain any proctitles like "nginx: reloading
workers..".

# Step 3: watch -n1 ps auxf     ... and wait until a nginx process in
#                                   state "H" appears.

# Step 4: CTRL-C the debug script, stop nginx and SIGKILL all "H"-state
processes.

# Step 5: Examine the error_log.

This debug log is about 21M in size, gzipped down to 1M. I've uploaded
it here to not pollute the mailing list:

	http://biz.baze.de/files/nginx-debug.log.gz

The first appearance of an "H"-state process is after reload 68 on line
number 300630. From there on, the nginx reload is broken and more of
these processes accumulate.

To find the next marker, just search for "XXXXXXXXXX". Search for
" H " (leading and trailing space) to find the next process list with
such a process.

For the record, here is the process list after reload 68, with broken
process (pid 11488).

XXXXXXXXXX ------- RELOAD 68 complete - SHOWING NGINX PROCESSES -------
XXXXXXXXXX
root     30864  0.0  0.0  12780  1344 pts/9    S+   13:27   0:00
\_ /bin/bash ./nginx-debug.sh
root     11894  0.0  0.0   9800   860 pts/9    S+   13:50
0:00          \_ grep nginx
root     11895  0.0  0.0  12780   496 pts/9    S+   13:50
0:00          \_ /bin/bash ./nginx-debug.sh
root     30776  0.1  0.0 101668 11332 ?        Ss   13:27   0:02 nginx:
master process /usr/sbin/nginx -c /etc/nginx/n…
nginx    11488  0.0  0.0      0     0 ?        H    13:49   0:00  \_
[nginx]
nginx    11835  0.0  0.0 101668 11876 ?        S    13:50   0:00  \_
nginx: worker process
nginx    11836  0.0  0.0 101668 12096 ?        S    13:50   0:00  \_
nginx: worker process
nginx    11837  0.0  0.0 101668 11008 ?        S    13:50   0:00  \_
nginx: cache manager process
nginx    11838  0.0  0.0 101668 10988 ?        S    13:50   0:00  \_
nginx: cache loader process
XXXXXXXXXX ------- END OF NGINX PROCESSES ------- XXXXXXXXXX



Here is the configuration:

------------------------------- snip --------------------------------
user nginx nginx;
daemon  on;

worker_processes 2;

events {
    worker_connections 5000;
    use epoll;
}

# [ debug | info | notice | warn | error | crit ]
error_log /var/log/nginx/error_log debug;

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    # Server config
    server_tokens           on;
    ignore_invalid_headers  on;

    sendfile         on;
    tcp_nopush       on;
    tcp_nodelay      on;

    index index.html index.htm;

    # Autoindex
    autoindex off;
    autoindex_exact_size off;
    autoindex_localtime on;

    # GZIP configuration
    gzip            on;
    gzip_types      text/plain application/x-javascript text/xml
    text/css; gzip_disable    "MSIE [1-6]\.(?!.*SV1)";

    # Rate Limiting
    limit_rate_after 20;
    limit_rate 2m;

    # File Cache
    open_file_cache max=2000 inactive=60s;
    open_file_cache_valid    60s;
    open_file_cache_errors   on;

    # Proxy Setup
    proxy_redirect              off;
    server_name_in_redirect     off;
    proxy_max_temp_file_size    16m;
    proxy_intercept_errors      off;
    client_max_body_size        10m;
    client_body_buffer_size     128k;

    # Proxy Cache
    proxy_buffering         on;
    proxy_buffers           32 4k;

    proxy_cache_path        /var/cache/nginx
                            levels=2:2:2
                            keys_zone=cache:64m
                            inactive=600
                            max_size=4096m;

    proxy_cache_key         "$host$request_uri";
    proxy_cache_valid       200 301 302 404  1m;

    # Proxy Headers
    proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;

    # TIMEOUTS
    client_header_timeout   30;
    client_body_timeout     30;
    send_timeout            30;

    proxy_connect_timeout   10;
    proxy_send_timeout      60;
    proxy_read_timeout      60;

    keepalive_timeout       0;

    # DEFAULT SERVER
    server {
        listen 80 default;
        server_name _;

        index index.html;
        root  /var/www;

        open_file_cache off;

    }

    include /etc/nginx/upstreams-php.d/*;
    include /etc/nginx/upstreams-wsgi.d/*;
    include /etc/nginx/vhosts.d/*;
}
------------------------------- snap --------------------------------

The included files are upstream{} and server{} blocks with valid
syntax. They do not enable/configure any additional module.


$ nginx -V
nginx version: nginx/0.8.40
configure arguments: --prefix=/usr --sbin-path=/usr/sbin/nginx
--conf-path=/etc/nginx/nginx.conf
--error-log-path=/var/log/nginx/error_log --pid-path=/var/run/nginx.pid
--lock-path=/var/lock/nginx.lock --user=nginx --group=nginx
--with-cc-opt=-I/usr/include --with-ld-opt=-L/usr/lib
--http-log-path=/var/log/nginx/access_log
--http-client-body-temp-path=/var/tmp/nginx/client
--http-proxy-temp-path=/var/tmp/nginx/proxy
--http-fastcgi-temp-path=/var/tmp/nginx/fastcgi
--with-http_realip_module --without-http_charset_module
--without-http_ssi_module --without-http_userid_module
--without-http_geo_module --without-http_map_module
--without-http_split_clients_module --without-http_referer_module
--without-http_fastcgi_module --without-http_memcached_module
--without-http_empty_gif_module --without-http_browser_module
--without-http_uwsgi_module --with-debug



Let me know if you need anything else.

Best regards,
John



More information about the nginx mailing list