keepalive and workers

Manlio Perillo manlio_perillo at libero.it
Mon Jul 7 17:32:47 MSD 2008


Michał Jaszczyk wrote:
> 2008/7/7 Manlio Perillo <manlio_perillo at libero.it>:
>> Michał Jaszczyk wrote:
>>> Hi,
>>>
>>> I'm creating a website with Nginx and have some questions. When a user
>>> comes with a request, my application server has to connect to many
>>> other servers in order to create the response. Following the
>>> separation-of-concerns pattern, the application server and all the
>>> other servers are separate HTTP servers, each built with Nginx.
>> So you have the main frontend server, with mod_proxy to N backend servers,
>> and then each backend connects to M other servers?
> 
> Nope, I have the following layout:
> - HAProxy LB
> - several app servers Nginx+mod_wsgi
> - several 'backend' servers Nginx+mod_wsgi
> - each app server needs to retrieve information from each backend in
> order to render response
> 

Ok.

> 'Backend' in this context means that it provides some kind of
> information (for example, what ads to display where on a page). In
> order to render the page, the app server needs to contact several
> 'backends' of this kind. The reason for such a layout is that my
> company will perhaps buy some software to do the ads stuff. So with
> these 'backends', I keep my layout modular.
> 
> And the question is: can I keep a connection open from the Python app
> in the app server to the 'backend' server in order to improve
> performance (i.e. omit unnecessary TCP handshakes)?

In theory, yes: with HTTP/1.1 keep-alive the app server can hold a
connection to a backend open and reuse it for many requests, skipping
the TCP handshake each time.
You can also use Unix (or local) domain sockets.
They are often about twice as fast as TCP sockets, at least on
BSD-derived implementations.
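
A minimal sketch of the keep-alive idea with Python 2's httplib (the
backend host and paths here are made up for illustration):

# Reuse one HTTP/1.1 connection for several requests from the app
# server to a backend; httplib keeps the connection open for as long
# as the server allows it.
import httplib

conn = httplib.HTTPConnection('ads-backend.local')  # hypothetical host

for path in ('/slots?page=home', '/slots?page=article'):
    conn.request('GET', path)
    resp = conn.getresponse()
    body = resp.read()  # read the full body before reusing the connection
    print resp.status, len(body)

conn.close()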

However, there is an important difference with Unix domain sockets: if
a call to connect for a Unix domain stream socket finds that the
listening socket's queue is full, ECONNREFUSED is returned immediately.

(from Unix Network Programming, Volume 1, Third Edition).

As far as I know, no TCP-style handshake is performed for Unix domain
sockets.
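
If a backend listens on a Unix domain socket, the same httplib pattern
still works by overriding how the socket is created; a minimal sketch,
assuming a hypothetical socket path:

# HTTP over a Unix domain socket with Python 2's httplib.
import httplib
import socket

class UnixHTTPConnection(httplib.HTTPConnection):
    def __init__(self, socket_path):
        # The host name is only used for the Host header.
        httplib.HTTPConnection.__init__(self, 'localhost')
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)

conn = UnixHTTPConnection('/tmp/ads-backend.sock')  # hypothetical path
conn.request('GET', '/slots?page=home')
print conn.getresponse().status
conn.close()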

> How will the connections from
> all app servers to a particular 'backend' be spread between its
> workers? 

It depends on OS scheduling.

> I'm concerned that if connections from all app servers are
> handled by one worker in the 'backend', then I have a bottleneck,
> because the backend uses mod_wsgi, so it can't handle multiple
> requests simultaneously.
> 

Again, it depends on OS scheduling.
There is *no* load balancing among Nginx workers.
You can, for example, have 4 worker processes and yet find that only
one of them is actually scheduled by the operating system to handle
all the connections.


The solution is to use an asynchronous connection to the backend servers.

See this example of a simple HTTP proxy using curl and my experimental 
asynchronous extensions for WSGI:
http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/examples/nginx-curl.py
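
The core of that example is curl's multi interface, which drives
several backend transfers concurrently instead of blocking on each one
in turn. A rough standalone sketch of the same idea with pycurl
(outside Nginx, with made-up backend URLs):

# Fetch several backend responses concurrently with pycurl's multi
# interface; no single transfer blocks the others.
import pycurl
from cStringIO import StringIO

urls = [
    'http://ads-backend.local/slots?page=home',  # hypothetical
    'http://user-backend.local/profile?id=42',   # hypothetical
]

multi = pycurl.CurlMulti()
buffers = []

for url in urls:
    buf = StringIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.WRITEFUNCTION, buf.write)
    multi.add_handle(c)
    buffers.append(buf)

# Pump the multi handle until all transfers are done. Inside the
# WSGI extension, the wait would instead yield control back to Nginx.
num_handles = len(urls)
while num_handles:
    while True:
        ret, num_handles = multi.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break
    if num_handles:
        multi.select(1.0)  # wait for activity on curl's descriptors

for buf in buffers:
    print len(buf.getvalue())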

There is also this alternate example:
http://hg.mperillo.ath.cx/nginx/mod_wsgi/file/tip/examples/nginx-poll-sleep.py

where, instead of polling curl's file descriptors, it simply suspends
the execution of the current request for 500 ms.
In theory this is less efficient, but it is more robust with the
current version!

*NOTE*: the asynchronous extensions are not stable. I'm planning to
        remove them and instead integrate greenlet inside mod_wsgi:
        http://codespeak.net/py/dist/greenlet.html
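
For the curious, the appeal of greenlet is that a request handler can
be suspended exactly where it would wait for I/O and resumed later,
without tying up the whole worker. A toy illustration of the switching
model (this is not mod_wsgi code):

# The handler suspends itself by switching back to the main greenlet,
# and is resumed later with another switch.
from greenlet import greenlet, getcurrent

def handler():
    print 'handler: about to wait for the backend'
    main.switch()   # suspend, as if waiting on a backend socket
    print 'handler: resumed, backend data is ready'

main = getcurrent()
g = greenlet(handler)
g.switch()          # run the handler until it suspends itself
# ... an event loop would wait for backend I/O here ...
g.switch()          # resume the handler once data is available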


Another solution is to retrieve the information you need using
JavaScript and XMLHttpRequest in asynchronous mode.

> Hope this explains my situation.
> 
> Thanks for all the help and prompt response!
> 


Manlio Perillo




