Nginx & FastCGI buffering to slow clients: thousands of connections back up onto FastCGI/PHP processes

Sun Dec 20 16:49:14 MSK 2009

I assume this has been covered before, but despite a lot of searching I cannot find it.

We run many nginx servers with FastCGI for PHP, but here in China connections can be slow or dynamic, so we can suddenly have 1000 connections in 'write' status in nginx. That should be no problem, but these connections seem to back up onto the FastCGI processes until none are free. From a user's perspective the whole system then locks up with 502 errors, since nginx can't find a free FastCGI process to talk to (they are all busy).

The errors we get are usually of the form: upstream timed out (110: Connection timed out) while reading response header from upstream, client: 61.149.175.16, server: 121.13.44.145, request: "GET /...

In this scenario, we'd need 1000+ FastCGI processes to handle all the open connections. We'd prefer to run 10-20 PHP processes, which can easily handle all our performance needs.

We thought this would be simple: just add more buffering so that all the PHP output is held in memory in nginx, which then closes the connection to PHP so another user can use it. These are big servers, with 8-16 cores and 24GB+ RAM, so we have plenty of power and memory. So we raised the buffer size to 64KB and added thousands of buffers, but nginx's memory size didn't really increase (it stays very small, at 30-50MB) and the problem didn't go away. We also see no buffering-to-disk messages in the logs.

So if the buffers are big enough and/or we have disk space, I would expect nginx to ALWAYS buffer ALL the FastCGI data and close the connection to PHP, so we should NEVER see a FastCGI process waiting for nginx to write data to a client. Is this correct?

First, are these buffer settings per connection or shared across all connections? I assume fastcgi_buffer_size is per connection. But if fastcgi_buffers is also per connection, why have a buffer count at all? Why not just say 32K, 64K, etc.? So I'm guessing it is the total number of buffers available to the server, in blocks of the buffer size: for example, fastcgi_buffers 1024 64k would give me 64MB of total buffer space. With 100 connections, I could then buffer about 650KB each before nginx starts buffering to files.
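
To make the arithmetic behind my guess concrete, here is a rough Python sketch. It only multiplies out the fastcgi_buffers 1024 64k example under the two possible readings; it says nothing about which reading nginx actually uses, which is exactly my question. The function names are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def total_buffer_bytes(count, size_kib):
    """Bytes covered by one `fastcgi_buffers <count> <size_kib>k` setting."""
    return count * size_kib * KIB


def share_if_server_wide(count, size_kib, connections):
    """Per-connection share IF the buffers were one pool for the whole server (my guess)."""
    return total_buffer_bytes(count, size_kib) // connections


def worst_case_if_per_connection(count, size_kib, connections):
    """Upper bound on memory IF every connection could fill its own set of buffers."""
    return total_buffer_bytes(count, size_kib) * connections


if __name__ == "__main__":
    print(total_buffer_bytes(1024, 64) // MIB, "MiB in total")                    # 64
    print(share_if_server_wide(1024, 64, 100) // KIB, "KiB per connection")       # 655
    print(worst_case_if_per_connection(1024, 64, 100) // MIB, "MiB worst case")   # 6400
```

Under the first reading, 100 connections share 64MB; under the second, the same setting would allow up to 64MB for each connection, so the two readings differ by a factor equal to the number of connections.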

Without a fix we are running 1000 cgi processes and 1-2K nginx connections.  This works but if we ever get real load, we'll have a 250 load average, like we used to see on loaded Apache systems.  We need a way to use 10-20 PHP processes on a few thousand slow connections.

I assume we have a buffering problem, but maybe there is a connection-close or other issue that prevents the PHP processes from being re-used. Everything works great when the connections are fast and the percentage of writers is small.

The FastCGI engine is php5-cgi; maybe we should use spawn-fcgi from lighttpd.

Key configs are:

events {
    worker_connections  4096;
    use epoll;
    multi_accept off;
}

http {
    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  15;
    tcp_nodelay        on;

server {
        listen 80;
        server_name 120.136.43.145 abc.com.cn 127.0.0.1;

        root /var/www/abc;

        access_log /var/log/nginx/abc.com_access.log;
        error_log /var/log/nginx/abc.com_error.log;

        index  index.html index.php index.htm;

        location ~ \.php$ {
                fastcgi_pass   127.0.0.1:9000;
                fastcgi_index  index.php;
                fastcgi_buffer_size 64k;
                fastcgi_buffers 4096 64k;
                fastcgi_param  SCRIPT_FILENAME  /var/www/abc$fastcgi_script_name;
                include fastcgi_params;
        }

}   # end of server block

}   # end of http block

Posted at Nginx Forum: http://forum.nginx.org/read.php?2,32120,32120#msg-32120