[PATCH] Mail: add the "reuseport" option of the "listen" directive

Maxim Konovalov maxim at nginx.com
Wed Aug 18 09:16:10 UTC 2021


On 18.08.2021 04:14, Robert Mueller wrote:
> 
>> First, thanks for the patch.
>>
>> While the reuseport could cure (or hide if you will) the unbalancing you
>> see it makes sense to get better understanding what exactly is going on.
>>  So far we haven't seen such weird behaviour ourself neither received
>> reports about such uneven connections distribution among nginx workers.
>>
>> Any chances you have accept_mutex and/or multi_accept?  Any other ideas?
> 
> Unfortunately I'm not 100% sure what's causing it, but it's pretty easy for us to reproduce even on our development machines. Just to show there's no accept_mutex or multi_accept in our config.
> 
> ```
> # grep accept /etc/nginx/mail.conf
> # 
> ```
> 
> And here's what a cut down version of our config looks like.
> 
> ```
> worker_processes  auto;
> worker_shutdown_timeout 5m;
> 
> events {
>     use epoll;
>     worker_connections  65536;
> }
> ...
> mail {
>     auth_http http://unix:/var/run/nginx/mail_auth.sock:/nginx/;
>     imap_client_buffer  16k;
>     imap_capabilities "IMAP4" "IMAP4rev1" "LITERAL+" "ENABLE" "UIDPLUS" "SASL-IR" "NAMESPACE" "CONDSTORE" "SORT" "LIST-EXTENDED" "QRESYNC" "MOVE" "SPECIAL-USE" "CREATE-SPECIAL-USE" "IDLE";
>     ssl_session_cache shared:sslcache:50m;
>     ssl_session_timeout 30m;
> 
>     server {
>       listen 10.a.b.c:993 ssl reuseport;
>       auth_http_header "ServerHostname" "imap.foo";
>       ssl_prefer_server_ciphers on;
>       ssl_protocols ...
>       ssl_ciphers ...;
>       ssl_certificate      ...;
>       ssl_certificate_key  ...;
>       protocol imap;
>       proxy on;
>       proxy_timeout  1h;
>     }
> ```
> 
> With that on a development machine which has 4 vcpus we see:
> 
> ```
> # ps auxw | grep nginx | grep mail
> root        3839  0.0  0.0  68472  1372 ?        Ss   08:16   0:00 nginx: master process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody      3841  0.0  0.0  95732  3572 ?        S    08:16   0:01 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody      3842  0.0  0.0  95732  3284 ?        S    08:16   0:01 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody      3843  0.0  0.0  95796  4096 ?        S    08:16   0:01 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody      3846  0.0  0.0  95732  3092 ?        S    08:16   0:01 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> ```
> 
> Now lets just create 1000 SSL connections and see how they get distributed between those procs.
> 
> ```
> # perl -e 'use IO::Socket::SSL; for (1..1000) { push @s, IO::Socket::SSL->new("imap.foo:993"); } print "done\n"; sleep 1000;'
> done
> ^Z
> [3]+  Stopped
> # for i in 3841 3842 3843 3846; do echo "$i - " `ls /proc/$i/fd | wc -l`; done
> 3841 -  335
> 3842 -  295
> 3843 -  293
> 3846 -  320
> ```
> 
> Reasonably even.
> 
> Now lets change `listen 10.a.b.c:993 ssl reuseport` to `listen 10.a.b.c:993 ssl` and restart.
> 
> ```
> # ps auxw | grep nginx | grep mail
> root      559885  0.0  0.0  68472  3104 ?        Ss   21:01   0:00 nginx: master process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody    559886  0.0  0.3  95620 30448 ?        S    21:01   0:00 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody    559887  0.0  0.3  95620 30448 ?        S    21:01   0:00 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody    559888  0.0  0.3  95620 30448 ?        S    21:01   0:00 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody    559889  0.0  0.3  95620 30448 ?        S    21:01   0:00 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> # perl -e 'use IO::Socket::SSL; for (1..1000) { push @s, IO::Socket::SSL->new("imap.foo:993"); } print "done\n"; sleep 1000;'
> done
> ^Z
> [5]+  Stopped
> # for i in 559886 559887 559888 559889; do echo "$i - " `ls /proc/$i/fd | wc -l`; done
> 559886 -  1054
> 559887 -  57
> 559888 -  60
> 559889 -  57
> ```
> 
> And as you can see, a completely uneven distribution of connections between processes! This doesn't just occur on our development machines either (e.g. it's not related to the source IP or anything), it occurs on production systems with connections arriving from real world customers and clients scattered around the world.
> 
> This is a fairly standard debian buster distribution, though we use a back ported newer kernel, and a recent version of nginx.
> 
> ```
> # uname -a
> Linux xyz 5.10.0-0.bpo.4-amd64 #1 SMP Debian 5.10.19-1~bpo10+1 (2021-03-13) x86_64 GNU/Linux
> # /usr/local/nginx/sbin/nginx -v
> nginx version: nginx/1.20.1
> ```
> 
> As you can see, without the reuseport option, this causes severe scalability problems for us.
> 
> Even without that though, it would just be nice to have some more consistency of the `listen` options between http/stream/mail modules as well.
> 
This looks weird.

We'll try to reproduce this in our lab.  Thanks for the detailed script.

Maxim

-- 
Maxim Konovalov


More information about the nginx-devel mailing list