[PATCH] Mail: add the "reuseport" option of the "listen" directive
Maxim Konovalov
maxim at nginx.com
Wed Aug 18 09:16:10 UTC 2021
On 18.08.2021 04:14, Robert Mueller wrote:
>
>> First, thanks for the patch.
>>
>> While reuseport could cure (or hide, if you will) the imbalance you
>> see, it makes sense to get a better understanding of what exactly is
>> going on. So far we haven't seen such weird behaviour ourselves, nor have
>> we received reports of such uneven connection distribution among nginx workers.
>>
>> Any chance you have accept_mutex and/or multi_accept enabled? Any other ideas?
>
> Unfortunately I'm not 100% sure what's causing it, but it's pretty easy for us to reproduce, even on our development machines. First, to show there's no accept_mutex or multi_accept in our config:
>
> ```
> # grep accept /etc/nginx/mail.conf
> #
> ```
>
> And here's what a cut down version of our config looks like.
>
> ```
> worker_processes auto;
> worker_shutdown_timeout 5m;
>
> events {
>     use epoll;
>     worker_connections 65536;
> }
> ...
> mail {
>     auth_http http://unix:/var/run/nginx/mail_auth.sock:/nginx/;
>     imap_client_buffer 16k;
>     imap_capabilities "IMAP4" "IMAP4rev1" "LITERAL+" "ENABLE" "UIDPLUS" "SASL-IR" "NAMESPACE" "CONDSTORE" "SORT" "LIST-EXTENDED" "QRESYNC" "MOVE" "SPECIAL-USE" "CREATE-SPECIAL-USE" "IDLE";
>     ssl_session_cache shared:sslcache:50m;
>     ssl_session_timeout 30m;
>
>     server {
>         listen 10.a.b.c:993 ssl reuseport;
>         auth_http_header "ServerHostname" "imap.foo";
>         ssl_prefer_server_ciphers on;
>         ssl_protocols ...;
>         ssl_ciphers ...;
>         ssl_certificate ...;
>         ssl_certificate_key ...;
>         protocol imap;
>         proxy on;
>         proxy_timeout 1h;
>     }
> }
> ```
>
> With that config, on a development machine with 4 vCPUs, we see:
>
> ```
> # ps auxw | grep nginx | grep mail
> root 3839 0.0 0.0 68472 1372 ? Ss 08:16 0:00 nginx: master process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody 3841 0.0 0.0 95732 3572 ? S 08:16 0:01 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody 3842 0.0 0.0 95732 3284 ? S 08:16 0:01 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody 3843 0.0 0.0 95796 4096 ? S 08:16 0:01 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody 3846 0.0 0.0 95732 3092 ? S 08:16 0:01 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> ```
>
> Now let's just create 1000 SSL connections and see how they get distributed between those processes.
>
> ```
> # perl -e 'use IO::Socket::SSL; for (1..1000) { push @s, IO::Socket::SSL->new("imap.foo:993"); } print "done\n"; sleep 1000;'
> done
> ^Z
> [3]+ Stopped
> # for i in 3841 3842 3843 3846; do echo "$i - " `ls /proc/$i/fd | wc -l`; done
> 3841 - 335
> 3842 - 295
> 3843 - 293
> 3846 - 320
> ```
>
> Reasonably even.
>
> Now let's change `listen 10.a.b.c:993 ssl reuseport` to `listen 10.a.b.c:993 ssl` and restart.
>
> ```
> # ps auxw | grep nginx | grep mail
> root 559885 0.0 0.0 68472 3104 ? Ss 21:01 0:00 nginx: master process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody 559886 0.0 0.3 95620 30448 ? S 21:01 0:00 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody 559887 0.0 0.3 95620 30448 ? S 21:01 0:00 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody 559888 0.0 0.3 95620 30448 ? S 21:01 0:00 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> nobody 559889 0.0 0.3 95620 30448 ? S 21:01 0:00 nginx: worker process /usr/local/nginx/sbin/nginx -c /etc/nginx/mail.conf
> # perl -e 'use IO::Socket::SSL; for (1..1000) { push @s, IO::Socket::SSL->new("imap.foo:993"); } print "done\n"; sleep 1000;'
> done
> ^Z
> [5]+ Stopped
> # for i in 559886 559887 559888 559889; do echo "$i - " `ls /proc/$i/fd | wc -l`; done
> 559886 - 1054
> 559887 - 57
> 559888 - 60
> 559889 - 57
> ```
>
> And as you can see, a completely uneven distribution of connections between processes! This doesn't just occur on our development machines either (i.e. it's not related to the source IP or anything); it also occurs on production systems, with connections arriving from real-world customers and clients scattered around the world.
>
> This is a fairly standard Debian buster distribution, though we use a backported newer kernel and a recent version of nginx.
>
> ```
> # uname -a
> Linux xyz 5.10.0-0.bpo.4-amd64 #1 SMP Debian 5.10.19-1~bpo10+1 (2021-03-13) x86_64 GNU/Linux
> # /usr/local/nginx/sbin/nginx -v
> nginx version: nginx/1.20.1
> ```
>
> As you can see, without the reuseport option, this causes severe scalability problems for us.
>
> Even without that though, it would just be nice to have some more consistency of the `listen` options between http/stream/mail modules as well.
>
This looks weird.
We'll try to reproduce this in our lab. Thanks for the detailed script.
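
For anyone else trying to reproduce this, here is a rough sketch of how the per-worker fd counts quoted above could be turned into a single imbalance figure. The helper names (`connection_shares`, `imbalance`) and the optional `baseline` parameter (fds a worker holds that are not client connections) are assumptions made for illustration; they are not part of nginx or of the original report. The raw fd counts are only a proxy for connection counts.

```python
def connection_shares(fd_counts, baseline=0):
    """Return each worker's share of the total, after subtracting
    `baseline` non-connection fds per worker (hypothetical parameter)."""
    conns = {pid: max(n - baseline, 0) for pid, n in fd_counts.items()}
    total = sum(conns.values())
    if total == 0:
        return {pid: 0.0 for pid in conns}
    return {pid: c / total for pid, c in conns.items()}


def imbalance(fd_counts, baseline=0):
    """Busiest worker's share divided by a perfectly even share.
    1.0 means even; N means one of N workers holds everything."""
    shares = connection_shares(fd_counts, baseline)
    if not shares:
        return 0.0
    return max(shares.values()) * len(shares)


# fd counts copied from the two runs quoted above
with_reuseport = {3841: 335, 3842: 295, 3843: 293, 3846: 320}
without_reuseport = {559886: 1054, 559887: 57, 559888: 60, 559889: 57}

if __name__ == "__main__":
    print("with reuseport:    %.2f" % imbalance(with_reuseport))
    print("without reuseport: %.2f" % imbalance(without_reuseport))
```

With four workers, a value close to 1 indicates an even spread and a value close to 4 means a single worker is taking nearly all connections.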
Maxim
--
Maxim Konovalov