cache manager process exited with fatal code 2 and cannot be respawned

Andrew Alexeev andrew at nginx.com
Fri Nov 9 20:06:14 UTC 2012


Hi,

On Nov 9, 2012, at 23:36, Peer Heinlein <p.heinlein at heinlein-support.de> wrote:

> On 09.11.2012 19:33, Isaac Hailperin wrote:
> 
> 
> 
> Isaac and I spent several hours testing today, and there are two problems.
> 
> PROBLEM/BUG ONE:
> 
> First of all: the customer has 1,000 SSL hosts on the nginx server, so
> he needs 1,000 listeners on TCP ports. But the cache_manager
> isn't able to open that many listeners: it crashes after 512 open
> listeners. It looks very much like the cache_manager doesn't read the
> worker_connections setting from nginx.conf.
> 
> We configured:
> 
>    worker_connections 10000;
> 
> there, but the cache_manager crashes with
> 
> 2012/11/09 17:53:11 [alert] 9345#0: 512 worker_connections are not enough
> 2012/11/09 17:53:12 [alert] 9330#0: cache manager process 9344 exited
> with fatal code 2 and cannot be respawned
> 
> 
> I did some testing: with 505 SSL hosts on the server (= 505 listener
> sockets) everything works fine, but 515 listener sockets are not
> possible.
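The numbers above (505 listeners work, 515 fail, and the log line names 512 worker_connections rather than the configured 10000) are consistent with a helper process that uses a fixed pool of 512 connection slots and needs one slot per inherited listening socket. The following is only a toy model of that hypothesis: the fixed pool size is read off the log line, and the single "reserved" slot is an assumption for illustration, not something taken from the nginx source.

```python
# Toy model (hypothetical): a helper process with a FIXED connection pool
# that must reserve one slot per inherited listening socket. The configured
# worker_connections value is deliberately NOT an input, mirroring the
# suspicion that the cache manager ignores it.

def helper_starts(listeners: int, pool: int = 512, reserved: int = 1) -> bool:
    """True if every listener, plus the assumed reserved slots, fits in the pool."""
    return listeners + reserved <= pool


def first_failing_count(pool: int = 512, reserved: int = 1) -> int:
    """Smallest listener count for which the modeled helper cannot start."""
    return pool - reserved + 1


for n in (505, 515):
    status = "ok" if helper_starts(n) else "512 worker_connections are not enough"
    print(f"{n} listeners: {status}")
```

Under these assumptions the model fails somewhere between 505 and 515 listeners, matching the reported threshold; it says nothing about where in nginx such a pool would come from.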
> 
> It's easy to reproduce: just define 515 SSL domains, each on a
> different TCP port. :-)
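A reproduction config like the one described can be generated rather than typed by hand. This is a minimal sketch: the base port, the `hostN.example.com` server names and the certificate paths are placeholders, not values from the thread. As far as we know the cache manager is only started when a cache path (for example `proxy_cache_path`) is configured, so a test config would need one of those as well.

```python
# Hypothetical generator: N SSL server blocks on one IP, each listening on
# its own TCP port. Base port, server names and cert paths are placeholders.

def ssl_servers(count: int, base_port: int = 10000,
                cert: str = "/etc/nginx/ssl/test.crt",
                key: str = "/etc/nginx/ssl/test.key") -> str:
    blocks = []
    for i in range(count):
        port = base_port + i  # one distinct TCP port per "domain"
        blocks.append(
            "server {\n"
            f"    listen {port} ssl;\n"
            f"    server_name host{i}.example.com;\n"
            f"    ssl_certificate     {cert};\n"
            f"    ssl_certificate_key {key};\n"
            "}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Paste the output into the http {} block of a test nginx.conf.
    print(ssl_servers(515))
```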
> 
> It looks like nobody ever imagined that "somebody" (TM) could run
> more than two /24 networks' worth of IPs (about 512 addresses) on one
> single host. In fact, this does not happen in normal life...
> 
> But for historical reasons (TM) our customer uses ONE IP address and
> several TCP ports, so he has no problem running that many
> different SSL hosts on one system -- and this is the special situation
> where we can see the bug (?): the cache_manager ignores the
> worker_connections setting (?) when it tries to open all the listeners
> and related cache files/sockets.
> 
> So: Looks like a bug? Who can help? We need help...
> 
> 
> PROBLEM/BUG TWO:
> 
> With 16 workers for 1,000 SSL domains with 1,000 listeners, we can see
> 16 * 1,000 open TCP listeners on that system, because every worker opens
> its own listeners (?). When we reach the magic barrier of 16,386 open
> listeners (lsof -i | grep -c nginx), nginx runs into some kind of
> file limit:
> 
> 2012/11/09 20:32:05 [alert] 9933#0: socketpair() failed while spawning
> "worker process" (24: Too many open files)
> 2012/11/09 20:32:05 [alert] 9933#0: socketpair() failed while spawning
> "cache manager process" (24: Too many open files)
> 2012/11/09 20:32:05 [alert] 9933#0: socketpair() failed while spawning
> "cache loader process" (24: Too many open files)
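Error 24 in these lines is EMFILE, which means the calling process (here 9933, the one spawning the children) hit its own RLIMIT_NOFILE limit; the system-wide limit would instead show up as ENFILE. A minimal sketch that reproduces the same errno from socketpair() by temporarily lowering the soft limit of the current process; the limit of 64 is an arbitrary assumption for the demo, and the original limit is restored afterwards.

```python
import errno
import resource
import socket


def socketpair_until_emfile(soft_limit: int = 64) -> int:
    """Lower this process's RLIMIT_NOFILE soft limit, create socket pairs
    until the kernel refuses, and return the errno of that failure."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)  # soft limit may not exceed hard
    pairs = []
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
        while True:
            try:
                pairs.append(socket.socketpair())
            except OSError as exc:
                return exc.errno
    finally:
        # Release every descriptor we created, then restore the old limit.
        for a, b in pairs:
            a.close()
            b.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    err = socketpair_until_emfile()
    print(err, errno.errorcode[err])
```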
> 
> It's very easy to see that the limit is reached at 16,386 open files
> and sockets from nginx.
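The 16 * 1,000 figure can be checked with simple arithmetic, assuming every nginx process holds its own descriptor for every listening socket (as happens when sockets are inherited across fork()). lsof prints one line per process and descriptor, so a shared socket is counted once per process, not once per port.

```python
# Back-of-the-envelope count of "lsof -i | grep -c nginx" lines, ASSUMING
# each nginx process holds one descriptor per listening socket.

def lsof_listener_lines(processes: int, listeners: int) -> int:
    return processes * listeners


def max_processes_below(threshold: int, listeners: int) -> int:
    """How many processes fit before the line count would exceed threshold."""
    return threshold // listeners


print(lsof_listener_lines(16, 1000))     # 16000 (16 workers only)
print(max_processes_below(16386, 1000))  # 16
```

Note that RLIMIT_NOFILE, which "ulimit -n" sets, applies per process, so a total summed over all nginx processes is not by itself what that limit measures; each process here holds on the order of 1,000 listener descriptors.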
> 
> But I can't find where this limit comes from. "ulimit
> -n" is set to 100,000; everything looks fine and should work with
> many more open files than just 16K.
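"ulimit -n" only affects the shell it is run in and the processes started from it; a daemon started by an init script may run with a different limit. On Linux, the limit actually in effect for a running process can be read from /proc/<pid>/limits. A minimal sketch, assuming Python 3.9+ and a Linux host; the function names are hypothetical. (nginx's worker_rlimit_nofile directive changes this limit for worker processes only.)

```python
# Read the "Max open files" limit actually in effect for a process (Linux).
# Values are kept as strings because the kernel may report "unlimited".

def max_open_files(limits_text: str) -> tuple[str, str]:
    """Extract (soft, hard) 'Max open files' from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', soft, hard, 'files']
            return fields[3], fields[4]
    raise ValueError("no 'Max open files' line found")


def nofile_limit_of(pid="self") -> tuple[str, str]:
    """Limit of a running process, e.g. the nginx master's pid."""
    with open(f"/proc/{pid}/limits") as f:
        return max_open_files(f.read())
```

Usage: `print(nofile_limit_of(<pid of the nginx master>))`, comparing the result with what "ulimit -n" reports in the shell.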
> 
> Could it be that "nobody" (TM) expected "somebody" (TM) to run more
> than 1,000 SSL hosts on different TCP ports with 16 worker instances, and
> that there's some kind of SMALL-INT problem in the nginx code? Could it
> be that this isn't a limit of the Linux system, but some
> kind of too-small address space for that in nginx?
> 
> So: Looks like a bug? Who can help? We need help...
> Peer
> 
> 
> -- 
> Heinlein Support GmbH

Are you looking for a commercial support option to back up your customer's contract with an underpinning contract and vendor support?

If that's the case, our support options are described here:

http://nginx.com/support.html

Hope this helps


> Schwedter Str. 8/9b, 10119 Berlin
> 
> http://www.heinlein-support.de
> 
> Tel: 030 / 405051-42
> Fax: 030 / 405051-19
> 
> Mandatory disclosures pursuant to §35a GmbHG:
> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> Managing Director: Peer Heinlein -- Registered office: Berlin
> 
> _______________________________________________
> nginx mailing list
> nginx at nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx


