Kernel stall while testing high-speed HTTPS traffic.

Ben Greear greearb at candelatech.com
Thu May 28 22:24:23 UTC 2015


Some additional info was requested:

[root@e5-1630-v3-qc lanforge]# openssl engine -tt
(rdrand) Intel RDRAND engine
     [ available ]
(dynamic) Dynamic engine loading support
     [ unavailable ]
[root@e5-1630-v3-qc lanforge]# openssl version
OpenSSL 1.0.1e-fips 11 Feb 2013
[root@e5-1630-v3-qc lanforge]# openssl speed -multi ^C

# NOTE:  My CPU supports AES-NI instructions...do I need to do anything
# special to enable that with nginx, or should it be working by default?
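One quick sanity check for the AES-NI question is whether the kernel reports the "aes" CPU flag. The sketch below is only an illustration (the helper name cpu_has_flag is made up here). A positive result says only that the CPU advertises AES-NI, not that nginx or OpenSSL is actually using it:

```python
def cpu_has_flag(cpuinfo_text, flag):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists `flag`."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False


if __name__ == "__main__":
    # /proc/cpuinfo is Linux-specific; on x86 the AES-NI flag is spelled "aes".
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or the file is unreadable
    print("aes flag present:", cpu_has_flag(text, "aes"))
```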

[root@e5-1630-v3-qc lanforge]# openssl speed -multi 4 rsa2048 ecdsap256
Forked child 0
Forked child 1
Forked child 2
Forked child 3
+DTP:2048:private:rsa:10
+DTP:2048:private:rsa:10
+DTP:2048:private:rsa:10
+DTP:2048:private:rsa:10
+R1:10253:2048:10.00
+DTP:2048:public:rsa:10
+R1:10345:2048:10.00
+DTP:2048:public:rsa:10
+R1:5385:2048:10.00
+DTP:2048:public:rsa:10
+R1:5387:2048:10.00
+DTP:2048:public:rsa:10
+R2:334855:2048:10.00
+R2:336207:2048:10.00
+DTP:256:sign:ecdsa:10
+DTP:256:sign:ecdsa:10
+R2:185283:2048:10.00
+R2:185265:2048:10.00
+DTP:256:sign:ecdsa:10
+DTP:256:sign:ecdsa:10
+R5:115623:256:10.00
+R5:116966:256:10.00
+DTP:256:verify:ecdsa:10
+DTP:256:verify:ecdsa:10
+R5:64033:256:10.00
+R5:64223:256:10.00
+DTP:256:verify:ecdsa:10
+DTP:256:verify:ecdsa:10
+R6:29783:256:10.00
+R6:30572:256:10.00
Got: +F2:2:2048:0.000967:0.000030 from 0
Got: +F4:3:256:0.000085:0.000327 from 0
+R6:15179:256:10.00
+R6:15196:256:10.00
Got: +F2:2:2048:0.001857:0.000054 from 1
Got: +F4:3:256:0.000156:0.000658 from 1
Got: +F2:2:2048:0.000975:0.000030 from 2
Got: +F4:3:256:0.000086:0.000336 from 2
Got: +F2:2:2048:0.001856:0.000054 from 3
Got: +F4:3:256:0.000156:0.000659 from 3
OpenSSL 1.0.1e-fips 11 Feb 2013
built on: Thu Oct 16 11:09:39 UTC 2014
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DKRB5_MIT -m64 -DL_ENDIAN -DTERMIO -Wall -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -mtune=generic -Wa,--noexecstack -DPURIFY
-DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM
-DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
                  sign    verify    sign/s verify/s
rsa 2048 bits 0.000319s 0.000010s   3137.1 103703.7
                              sign    verify    sign/s verify/s
 256 bit ecdsa (nistp256)   0.0000s   0.0001s  36213.1   9071.5
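
For anyone reading the -multi output: the summary table appears to be the sum of the per-child rates reported on the "Got:" lines. The sketch below is a rough cross-check, not a description of how openssl computes it. It assumes the fourth and fifth fields of each "Got:" line are seconds per sign and seconds per verify, and that the F2 and F4 tags correspond to the RSA and ECDSA rows (inferred from the 2048 and 256 key sizes):

```python
import re

# "Got:" lines copied from the run above.
GOT_LINES = """\
Got: +F2:2:2048:0.000967:0.000030 from 0
Got: +F4:3:256:0.000085:0.000327 from 0
Got: +F2:2:2048:0.001857:0.000054 from 1
Got: +F4:3:256:0.000156:0.000658 from 1
Got: +F2:2:2048:0.000975:0.000030 from 2
Got: +F4:3:256:0.000086:0.000336 from 2
Got: +F2:2:2048:0.001856:0.000054 from 3
Got: +F4:3:256:0.000156:0.000659 from 3
"""

# Assumed layout: tag, index, key bits, sec/sign, sec/verify, child number.
PATTERN = re.compile(r"Got: \+(F\d+):\d+:(\d+):([\d.]+):([\d.]+) from (\d+)")


def aggregate(text):
    """Sum per-child operations per second for each (tag, bits) pair."""
    totals = {}
    for tag, bits, sign_s, verify_s, _child in PATTERN.findall(text):
        key = (tag, int(bits))
        sign_rate, verify_rate = totals.get(key, (0.0, 0.0))
        totals[key] = (sign_rate + 1.0 / float(sign_s),
                       verify_rate + 1.0 / float(verify_s))
    return totals


for (tag, bits), (sign_rate, verify_rate) in sorted(aggregate(GOT_LINES).items()):
    print(f"{tag} {bits:5d} bits  sign/s {sign_rate:9.1f}  verify/s {verify_rate:9.1f}")
```

The sums agree with the table to within rounding (about 3137 sign/s and 103704 verify/s for RSA 2048). The per-child lines also show that children 1 and 3 ran at roughly half the rate of children 0 and 2.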

# NOTE on the below ldd info:  the /home/lanforge/libssl.so.10 and libcrypto.so.10 are
# just copies of the same files from /usr/lib64/

[root@e5-1630-v3-qc lanforge]# ldd /usr/local/lanforge/nginx/sbin/nginx
	linux-vdso.so.1 =>  (0x00007fff5d7fe000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003a9d800000)
	libcrypt.so.1 => /lib64/libcrypt.so.1 (0x0000003aae000000)
	libssl.so.10 => /home/lanforge/libssl.so.10 (0x00000033fe000000)
	libcrypto.so.10 => /home/lanforge/libcrypto.so.10 (0x00000033f8000000)
	libdl.so.2 => /lib64/libdl.so.2 (0x0000003a9d400000)
	libz.so.1 => /lib64/libz.so.1 (0x0000003a9dc00000)
	libc.so.6 => /lib64/libc.so.6 (0x0000003a9d000000)
	/lib64/ld-linux-x86-64.so.2 (0x0000003a9c800000)
	libfreebl3.so => /lib64/libfreebl3.so (0x0000003aac800000)
	libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x0000003aae400000)
	libkrb5.so.3 => /lib64/libkrb5.so.3 (0x0000003ab2400000)
	libcom_err.so.2 => /lib64/libcom_err.so.2 (0x0000003aadc00000)
	libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x0000003aae800000)
	libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x0000003ab1800000)
	libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x0000003aaf400000)
	libresolv.so.2 => /lib64/libresolv.so.2 (0x0000003a9f400000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x0000003a9e800000)
	libpcre.so.1 => /lib64/libpcre.so.1 (0x0000003a9e400000)

[root@e5-1630-v3-qc lanforge]# lspci|grep -F Eth
02:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
03:00.0 Ethernet controller: Intel Corporation Ethernet Controller LX710 for 40GbE QSFP+ (rev 01)
03:00.1 Ethernet controller: Intel Corporation Ethernet Controller LX710 for 40GbE QSFP+ (rev 01)
07:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)

I am using in-kernel drivers, and I am quite sure this is not a NIC issue, since this
same system can sustain 10.8Gbps of HTTPS traffic served by Apache, and the 40G NICs can
sustain 20+Gbps of UDP traffic.  So I skipped the NIC stats that were requested.  If they
really are needed, I can gather that info.

Thanks,
Ben


On 05/28/2015 12:26 PM, Ben Greear wrote:
> We are seeing problems with Nginx (mostly) locking up the server when
> running high loads of HTTPS traffic.
> 
> In this scenario we had nginx configured to bind to eth3, but our ssh
> sessions on eth0 were frozen during this condition as well.
> The system restores itself after a few minutes (the load generation would
> have stopped after a minute or two of lockup, which may be what lets things
> recover).
> 
> We tested different kernels (4.0.4+, 4.0.0+, 3.17.8+ with local patches,
> and stock 3.14.27-100.fc19.x86_64, all with the same results), different NICs (Intel 10G, Intel 40G),
> and Apache as the web server.
> 
> Apache can sustain about 10.8Gbps of HTTPS traffic and shows no
> instability/lockups.  nginx maxes out at 2.2Gbps (until it locks up the machine).
> 
> Some kernel splats indicated that some writes to the file system
> journal were blocked for more than 180 seconds, but they recover, so it is
> not a hard lock.  The system should not be doing any heavy disk access
> since we have 32GB RAM.  Swap shows no usage.
> 
> === Scenario ===
> The load-testing box has a direct eth3->eth3 connection over a 10Gbps port.
> 
> Curl clients using https, keepalive, requesting a 1MB file:
> 1000 clients @ 0.25 req/sec = 243 req/sec, 2.2Gbps tx, load 8.3
>  400 clients @ 0.65 req/sec = 260 req/sec, 2.2Gbps tx, load 9.2
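
As a rough cross-check of the two lines above, the sketch below compares the nominal request rate (clients times per-client rate) with the measured rate, and converts the measured rate into payload-only throughput. The exact size of the "1MB" file is an assumption, so both 10**6 and 2**20 bytes are shown. TLS, TCP/IP, and Ethernet framing overhead are not counted, so the tx figure on the wire would be expected to be somewhat higher than the payload number:

```python
def payload_gbps(req_per_sec, bytes_per_request):
    """Payload-only throughput in Gbit/s (10**9 bits per second)."""
    return req_per_sec * bytes_per_request * 8 / 1e9


# (clients, per-client req/sec, measured req/sec) from the two runs above.
RUNS = ((1000, 0.25, 243), (400, 0.65, 260))

for clients, per_client, measured in RUNS:
    nominal = clients * per_client
    low = payload_gbps(measured, 10**6)   # assuming "1MB" means 10**6 bytes
    high = payload_gbps(measured, 2**20)  # assuming "1MB" means 2**20 bytes
    print(f"{clients:4d} clients: nominal {nominal:.0f} req/s, "
          f"measured {measured} req/s, payload {low:.2f} to {high:.2f} Gbps")
```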
> 
> 
> 
> === Environment ===
> processor	: 7
> vendor_id	: GenuineIntel
> cpu family	: 6
> model		: 63
> model name	: Intel(R) Xeon(R) CPU E5-1630 v3 @ 3.70GHz
> 
>> free
>              total       used       free     shared    buffers     cached
> Mem:      32840296    1394884   31445412          0     132792     632068
> -/+ buffers/cache:     630024   32210272
> Swap:     16457724          0   16457724
> 
>> cat /etc/issue
> Fedora release 19 (Schrödinger’s Cat)
> Kernel \r on an \m (\l)
> 
> # uname -a
> Linux e5-1630-v3-qc 3.14.27-100.fc19.x86_64 #1 SMP Wed Dec 17 19:36:34 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> 
> # /usr/local/lanforge/nginx/sbin/nginx -v
> nginx version: nginx/1.9.1
> 
> We are running a small patch to allow nginx to bind to a particular interface.  We
> tried with this option disabled, and that causes the same trouble.  The exact
> source is found below:
> 
> https://github.com/greearb/nginx/commits/master
> 
> We are compiling nginx with these options:
> 
> ./configure --prefix=/usr/local/lanforge/nginx/ --with-http_ssl_module --with-ipv6 --without-http_rewrite_module
> 
> === Nginx Config ===
> 
> worker_processes	      auto;
> worker_rlimit_nofile    100000;
> error_log               logs/eth3_error.log;
> pid		               /home/lanforge/vr_conf/nginx_eth3.pid;
> events {
>    use                  epoll;
>    worker_connections   8096;
>    multi_accept         on;
> }
> http {
>    include             /usr/local/lanforge/nginx/conf/mime.types;
>    default_type        application/octet-stream;
>    access_log          off;
>    sendfile            on;
>    directio            1m;
>    disable_symlinks    on;
>    gzip                off;
>    tcp_nopush          on;
>    tcp_nodelay         on;
> 
>    open_file_cache max=1000 inactive=10s;
>    open_file_cache_valid    600s;
>    open_file_cache_min_uses 2000;
>    open_file_cache_errors   off;
>    etag                off;
> 
>    server {
>        listen          1.1.1.1:80 so_keepalive=on bind_dev=eth3;
>        server_name     nginx.local nginx web.local web;
> 
>        location / {
>            root   /var/www/html;
>            index  index.html index.htm;
>           }
>        error_page   500 502 503 504  /50x.html;
>        location = /50x.html {
>            root   html;
>        }
>    }
>    server {
>        listen                1.1.1.1:443 so_keepalive=on ssl bind_dev=eth3;
>        server_name           nginx.local nginx web.local web;
>        ssl_certificate       /usr/local/lanforge/apache.crt;
>        ssl_certificate_key   /usr/local/lanforge/apache.key;
>        location / {
>            root   /var/www/html;
>            index  index.html index.htm;
>        }
>        error_page   500 502 503 504  /50x.html;
>        location = /50x.html {
>            root   html;
>        }
>    }
> }
> 
> 
> Any help or suggestions are appreciated.
> 
> Thanks,
> Ben
> 


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


