error: deadlock happened in ngx_http_file_cache_expire()

zhenwei nginx-forum at nginx.us
Fri Nov 11 08:56:09 UTC 2011


{{{
proxy_cache_path  /nginx/cache/three levels=1:1:2 
keys_zone=three:1000m;
proxy_cache_valid  200 304 302  60m;
}}}

Nginx version: 1.0.1
nginx: configure arguments: --with-pcre --group=admin --user=admin
--prefix=/var/www/nginx --with-http_ssl_module --with-http_perl_module
--with-http_stub_status_module
--with-perl_modules_path=./src/http/modules/perl/
--add-module=../ngx_cache_purge

The cache entries timeout is set as 60 minutes, on production
environment(for website hosting) service became unavailable after
running several hours. After dive into backtraces and source code, it's
a deadlock issue on reload nginx(reload occurs every several minutes),
here is the bt information.

{{{
root     28779  0.0  0.1 1124788 6464 ?        Ss   11:15   0:00 nginx:
master process /var/www/nginx//sbin/nginx
nobody   28781 48.8  0.2 1126424 8268 ?        R    11:15  77:41 nginx:
worker process is shutting down
nobody   28784 47.2  0.0 1122576 3460 ?        R    11:15  75:05 nginx:
cache manager process
nobody   28785 47.6  0.0 1122724 3496 ?        R    11:15  75:45 nginx:
cache loader process
nobody   30156 50.8  0.2 1128800 9648 ?        R    11:43  66:27 nginx:
worker process     
nobody   30157 54.1  0.2 1128800 9648 ?        R    11:43  70:46 nginx:
worker process     
nobody   30158 54.9  0.2 1128800 9572 ?        R    11:43  71:52 nginx:
worker process     
nobody   30159 55.3  0.2 1128800 9648 ?        R    11:43  72:22 nginx:
worker process     
nobody   30160  8.5 26.3 1124788 1069964 ?     R    11:43  11:12 nginx:
cache manager process
}}}

>From the ps list Nginx is under reloading, when all worker process and a
cache manager process is spinlock to get the cache, and another cache
manager acquired the lock and fall into a while loop.

{{{
#0  ngx_http_file_cache_expire (cache=0x1ea7dc00) at
src/http/ngx_http_file_cache.c:1103
1103            if (fcn->deleting) {
(gdb) bt
#0  ngx_http_file_cache_expire (cache=0x1ea7dc00) at
src/http/ngx_http_file_cache.c:1103
#1  0x0000000000474e8b in ngx_http_file_cache_manager (data=0x1ea7dc00)
at src/http/ngx_http_file_cache.c:1193
#2  0x0000000000439e9d in ngx_cache_manager_process_handler
(ev=0x7fff5f5f3120) at src/os/unix/ngx_process_cycle.c:1346
#3  0x00000000004303f0 in ngx_event_expire_timers () at
src/event/ngx_event_timer.c:149
#4  0x000000000042e537 in ngx_process_events_and_timers
(cycle=0x1e9b5a70) at src/event/ngx_event.c:261
#5  0x0000000000439d6e in ngx_cache_manager_process_cycle
(cycle=0x1e9b5a70, data=0x6c4440) at
src/os/unix/ngx_process_cycle.c:1328
#6  0x00000000004361bd in ngx_spawn_process (cycle=0x1e9b5a70,
proc=0x439bdc <ngx_cache_manager_process_cycle>, data=0x6c4440,
name=0x4abc88 "cache manager process", 
    respawn=-4) at src/os/unix/ngx_process.c:196
#7  0x00000000004382ca in ngx_start_cache_manager_processes
(cycle=0x1e9b5a70, respawn=1) at src/os/unix/ngx_process_cycle.c:398
#8  0x0000000000437dc9 in ngx_master_process_cycle (cycle=0x1e9b5a70) at
src/os/unix/ngx_process_cycle.c:251
#9  0x000000000040e36c in main (argc=1, argv=0x7fff5f5f36d8) at
src/core/nginx.c:405
(gdb) p fcn->deleting
$1 = 1
(gdb) p wait
$2 = -67
}}}

Source code
{{{
1075     for ( ;; ) {
1076 
1077         if (ngx_queue_empty(&cache->sh->queue)) {
1078             wait = 10;
1079             break;
1080         }
1081 
1082         q = ngx_queue_last(&cache->sh->queue);
1083 
1084         fcn = ngx_queue_data(q, ngx_http_file_cache_node_t,
queue);
1085 
1086         wait = fcn->expire - now;
1087 
1088         if (wait > 0) {
1089             wait = wait > 10 ? 10 : wait;
1090             break;
1091         }
.....
}}}

The for loop never exit if "wait" is a negative number and unfortunately
it becomes -67, I hope decrease the timeout to 10m would work around it,
and it failed, either.

Posted at Nginx Forum: http://forum.nginx.org/read.php?2,218202,218202#msg-218202



More information about the nginx mailing list