php-fastcgi and nginx problem

Ian Hobson ian at ianhobson.co.uk
Tue Dec 1 16:00:00 MSK 2009

Hi all,

I need some help. A system I'm supporting is collapsing in a heap 
whenever it is put under even moderate load!

The setup -  Ubuntu, nginx 0.7.62, php-fastcgi.

The application is a chatroom, and is built as follows.

The main screen is multi-panelled. One panel shows the data entry area. 
When the "send" button is clicked, the input is sent to the server. The 
server stores the data in a MySQL database and returns an empty 200 
reply. The input panel is cleared, ready for the next message.

While this is going on, another panel uses "get" to fetch all the 
messages since No. X. The server receives this and checks the database 
once per second, for up to 4 seconds, for message X or later. When it 
finds something to send to the user (or after 4 seconds) it replies. 
When the page returns, the on-load event updates the other panels, 
including a clock that gives confidence the process is working.
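The server's wait loop, as described, amounts to something like this (a 
minimal Python sketch; the real code is PHP, and fetch_since stands in 
for the MySQL query):

```python
import time

def poll_for_messages(fetch_since, last_id, tries=4, interval=1.0):
    # Check the database once per second for up to `tries` seconds,
    # replying as soon as message `last_id` or later appears.
    for _ in range(tries):
        messages = fetch_since(last_id)
        if messages:
            return messages      # something to send: reply now
        time.sleep(interval)
    return []                    # empty reply after ~4 seconds
```

Note that the worker is occupied for the whole 4 seconds whenever 
nothing arrives.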

Moments after loading the data for message X, the page autoloads, 
requesting the messages since X+1. This heartbeat goes on in the 
background all the time. The result is a chatroom that "just works" 
through proxies without any software installation, even in tied-down 
business sites. If the user can browse the internet and has cookies and 
javascript enabled, the system will work.

When we tested it with 4 users - the most we could achieve - things 
worked great.

But when there are a lot of users (the report is of 7) the response 
becomes very slow. The logs contain gaps of 20 seconds between replies 
and reply codes of 499. There have always been reports of dropped 
messages and of people getting logged out, but I could never identify why.

PHP_FCGI_CHILDREN is set at 3.
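For reference, with plain php-cgi that limit comes from the environment 
the FastCGI process is spawned with - something like the following 
(paths, port and the max-requests value are assumptions about this setup):

```shell
# php-cgi forks PHP_FCGI_CHILDREN workers, so at most 3 requests can
# be inside PHP at once; anything beyond that waits in nginx.
PHP_FCGI_CHILDREN=3 PHP_FCGI_MAX_REQUESTS=1000 \
  spawn-fcgi -a 127.0.0.1 -p 9000 -f /usr/bin/php-cgi
```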

What I think is happening is that, when 4 users are logged on, the 
server is handling 3 of them and the 4th is waiting in nginx.
As each request is only in the machine for 4 seconds, things circulate 
nicely. When someone posts a message it may take a moment to queue for 
the server, but once it is handled it will trigger all the other users 
to receive replies quickly, so the queue will clear. The queue will then 
form again immediately.

When there are rather more users - say 8 - at any moment 3 are going 
through the server in parallel, taking 4 seconds each, so the server 
finishes one request every 4/3, roughly 1.3, seconds. The other 5 are 
queueing in nginx, in addition to any messages. Queue time is therefore 
at least (5 * 1.3) + 4, about 11 seconds. More if anyone posts a message.
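A quick sanity check of that arithmetic (3 workers, each held for 4 
seconds, 8 users):

```python
CHILDREN = 3      # PHP_FCGI_CHILDREN
HOLD = 4.0        # seconds a heartbeat occupies a worker
USERS = 8

slot_rate = HOLD / CHILDREN       # a worker frees up every ~1.3 s
queued = USERS - CHILDREN         # 5 heartbeats waiting in nginx
wait = queued * slot_rate + HOLD  # worst wait for the last queued user
print(round(slot_rate, 2), round(wait, 1))  # 1.33 10.7
```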

It is possible for the queue of requests to contain both a heartbeat and 
a message update from a given user.

Question - What will nginx do in this situation? If it discards one or 
the other with a 499 then we will have dropped messages, or the heartbeat 
will stop. If so, then this is why we have reports of dropped messages 
and random breakdowns.

Second. What is the solution? Raising PHP_FCGI_CHILDREN to the number 
needed is clearly not going to work - I don't have enough RAM!

My thinking.

1) Alter the server code so that it looks once and replies. It will 
reply with many more "null" returns, but it will handle each request in 
a fraction of a second. The queue will disappear.

2) Alter the client code so that it delays longer - say 3 seconds - 
after getting a heartbeat update before requesting the next.

3) Have the send shorten this delay - perhaps the reply will trigger the 
next heartbeat request - so that your posts come back quickly.

4) If the heartbeat is in progress when a send is requested, delay the 
send until the heartbeat's reply is received.
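Ideas 2-4 together amount to a small piece of client-side scheduling. 
Sketched in Python (the real client is browser JavaScript, and every 
name here is invented for illustration):

```python
class HeartbeatScheduler:
    def __init__(self, delay=3.0):
        self.delay = delay        # idea 2: pause before the next heartbeat
        self.in_flight = False    # a heartbeat request is outstanding
        self.pending_send = None  # idea 4: a send held until it returns

    def start_heartbeat(self):
        self.in_flight = True
        # ...issue the "get messages since X+1" request here...

    def on_heartbeat_reply(self):
        self.in_flight = False
        self.delay = 3.0                    # restore the normal pause
        if self.pending_send is not None:   # idea 4: release the held send
            msg, self.pending_send = self.pending_send, None
            self.send(msg)

    def send(self, msg):
        if self.in_flight:        # idea 4: don't race the heartbeat
            self.pending_send = msg
            return False
        # ...POST msg to the server here...
        self.delay = 0.0          # idea 3: fetch the reply straight away
        return True
```

This is only the control flow; whether the hold in idea 4 is actually 
needed is the open question below.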

Question - will this work? Why, or why not?

Question - is point 4 necessary?

Input and ideas gratefully received.
