cache manager process exited with fatal code 2 and cannot be respawned
Andrew Alexeev
andrew at nginx.com
Fri Nov 9 20:06:14 UTC 2012
Hi,
On Nov 9, 2012, at 23:36, Peer Heinlein <p.heinlein at heinlein-support.de> wrote:
> On 09.11.2012 19:33, Isaac Hailperin wrote:
>
>
>
> I did several hours of testing today with Isaac and there are two problems.
>
> PROBLEM/BUG ONE:
>
> First of all: the customer has 1,000 SSL hosts on the nginx server, so
> he wants to have 1,000 listeners on TCP ports. But the cache_manager
> isn't able to open that many listeners. It crashes after 512 open
> listeners. It looks very much like the cache_manager doesn't read the
> worker_connections setting from nginx.conf.
>
> We configured:
>
> worker_connections 10000;
>
> there, but the cache_manager crashes with
>
> 2012/11/09 17:53:11 [alert] 9345#0: 512 worker_connections are not enough
> 2012/11/09 17:53:12 [alert] 9330#0: cache manager process 9344 exited with fatal code 2 and cannot be respawned
>
>
> I did some testing: with 505 SSL hosts on the server (= 505 listener
> sockets) everything works fine, but 515 listener sockets aren't
> possible.
>
> It's easy to reproduce: just define 515 SSL domains, each with a
> different TCP port. :-)
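
For anyone who wants to try the reproduction described above, here is a
minimal sketch that generates such a config. It is not the customer's real
configuration: the function name, the base port, the host names, the
certificate paths and the cache path are all placeholders I made up. It
assumes that a proxy_cache_path has to be defined so that a cache manager
process is started at all.

```python
def build_config(n_hosts, base_port=10000, workers=1):
    """Return nginx.conf text with n_hosts SSL server blocks, one TCP port each.

    All paths, names and the base port are hypothetical placeholders.
    """
    servers = []
    for i in range(n_hosts):
        servers.append(
            "    server {\n"
            f"        listen {base_port + i} ssl;\n"
            f"        server_name host{i}.example.test;\n"
            "        ssl_certificate     /etc/nginx/test.crt;\n"
            "        ssl_certificate_key /etc/nginx/test.key;\n"
            "    }\n"
        )
    return (
        f"worker_processes {workers};\n"
        "events {\n"
        "    worker_connections 10000;\n"
        "}\n"
        "http {\n"
        # A cache zone is defined so that nginx spawns a cache manager.
        "    proxy_cache_path /var/cache/nginx/test keys_zone=test:10m;\n"
        + "".join(servers)
        + "}\n"
    )


if __name__ == "__main__":
    # 515 listeners is the count reported as failing in the message above.
    print(build_config(515))
```

Redirect the output to a file and point a test instance of nginx at it with
"nginx -c"; changing 515 to 505 gives the count that was reported as working.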
>
> Looks like nobody had the idea before that "somebody" (TM) could run
> more than two /24 networks' worth of IP addresses on one single host.
> In fact, this does not happen in normal life...
>
> But for historical reasons (TM) our customer uses ONE IP address and
> several TCP ports for that, so he doesn't have a problem running so many
> different SSL hosts on one system -- and this is the special situation
> where we can see the bug (?) that the cache_manager ignores the
> worker_connections setting (?) when it tries to open all the listeners
> and related cache files/sockets.
>
> So: Looks like a bug? Who can help? We need help...
>
>
> PROBLEM/BUG TWO:
>
> Having 16 workers for 1,000 SSL domains with 1,000 listeners, we can see
> 16 * 1,000 open TCP listeners on that system, because every worker opens
> its own listeners (?). When we reach the magical barrier of 16386 open
> listeners (lsof -i | grep -c nginx), nginx runs into some kind of
> file limit:
>
> 2012/11/09 20:32:05 [alert] 9933#0: socketpair() failed while spawning "worker process" (24: Too many open files)
> 2012/11/09 20:32:05 [alert] 9933#0: socketpair() failed while spawning "cache manager process" (24: Too many open files)
> 2012/11/09 20:32:05 [alert] 9933#0: socketpair() failed while spawning "cache loader process" (24: Too many open files)
>
> It's very easy to see that the limitation is based on 16,386 open files
> and sockets from nginx.
>
> But I can't find the place where this limitation comes from. "ulimit
> -n" is set to 100,000, everything looks fine and should work with
> many more open files than just 16K.
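
One thing that may be worth checking here: "ulimit -n" reports the limit of
the shell it is run in, which is not necessarily the limit of an nginx
process that is already running. On Linux the limits of a running process
can be read from /proc/<pid>/limits. The sketch below assumes Linux; the
function names are my own, and nothing in this thread establishes that this
is the cause of the problem.

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    Values are returned as strings because a limit may read 'unlimited'.
    Returns None if the line is not present.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: 'Max', 'open', 'files', <soft>, <hard>, 'files'
            fields = line.split()
            return fields[3], fields[4]
    return None


def read_max_open_files(pid):
    """Read the open-files limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())
```

For example, read_max_open_files(9933) would report the limits of the
process with PID 9933 from the log lines above, if it is still running.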
>
> Could it be that "nobody" (TM) expected that "somebody" (TM) would run
> more than 1,000 SSL hosts with different TCP ports on 16 worker instances,
> and that there's some kind of small-integer problem in the nginx code?
> Could it be that this isn't a limitation of the Linux system, but of some
> kind of too-small address space for that in nginx?
>
> So: Looks like a bug? Who can help? We need help...
> Peer
>
>
> --
> Heinlein Support GmbH
Are you looking for a commercial support option to back up your customer's contract with an underpinning contract and vendor support?
If that's the case, our support options are described here:
http://nginx.com/support.html
Hope this helps
> Schwedter Str. 8/9b, 10119 Berlin
>
> http://www.heinlein-support.de
>
> Tel: 030 / 405051-42
> Fax: 030 / 405051-19
>
> Mandatory disclosures per §35a GmbHG:
> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> Managing director: Peer Heinlein -- Registered office: Berlin
>
> _______________________________________________
> nginx mailing list
> nginx at nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx