On Tuesday 24 November 2015 15:10:12 Bart Warmerdam wrote:
On a system with a load of about 500-600 URIs/sec I see some unexpected behaviour when using the "aio threads" option in the configuration.
System setup: The system runs RHEL 6.6 with 3 workers running nginx 1.9.6 built with thread support. Content is cached and populated from a proxied upstream. The cache location is a tmpfs file system with more than enough space at all times. The proxy buffer size is 8k. The output buffers are at the default (no config item, so "2 32k"). The keepalive timeout is 75s. Sendfile is enabled.
Seen behaviour: On the WAF in front of this system I see occasional hangs on resources (mainly larger files like js, jpeg, ..). The WAF log shows that the WAF waits for the transfer to complete until nginx closes the connection at the keepalive timeout of 75s. In the nginx access.log I see the entry served from cache (upstream server '-') with the correct content length. In the tcpdump I see that the response to this call contains a Content-Length header with the correct length and a server time header more than 1 minute older than the tcpdump timestamp (all servers are NTP-synchronised). The served jpeg is half-way through its cache lifetime at that time, and there are earlier entries served from cache without incomplete transfers. In the tcpdump the jpeg file starts to differ from the original after 32168 bytes: 8192 bytes are missing, after which the remaining content is served (and is identical to the original). From the tcpdump I can extract the file, which is missing those 8192 bytes.
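The comparison described above (a file extracted from the tcpdump that matches the original except for one missing block) can be sketched in a few lines of Python. This is only an illustration, not the tooling used for the capture: the function name `find_missing_span` and the synthetic data are hypothetical, and the sizes 32168 and 8192 are taken from the observation above.

```python
def find_missing_span(original: bytes, received: bytes):
    """Return (offset, length) of one contiguous block that is present in
    `original` but absent from `received`, or None if the difference
    cannot be explained by a single missing block."""
    missing = len(original) - len(received)
    if missing <= 0:
        return None

    # Length of the common prefix of both byte strings.
    prefix = 0
    while prefix < len(received) and original[prefix] == received[prefix]:
        prefix += 1

    # After skipping `missing` bytes in the original, the tails must match.
    if received[prefix:] != original[prefix + missing:]:
        return None
    return prefix, missing


# Synthetic stand-in for the jpeg: three distinct regions, middle one dropped.
original = b"a" * 32168 + b"b" * 8192 + b"c" * 1000
received = b"a" * 32168 + b"c" * 1000

print(find_missing_span(original, received))  # (32168, 8192)
```

With real data, `original` and `received` would be read from the origin copy and from the payload extracted from the capture. If the bytes after the hole happen to begin like the missing block, the reported offset can shift forward by that overlap, while the reported length stays the same.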
We also have a dump in which the same behaviour was seen during a proxied call. The upstream request is started to fetch a jpeg from the origin. After a few packets the data starts being sent to the WAF. The complete upstream file is retrieved (the tcpdump confirms that the jpeg is complete and correctly retrieved), but not all of the data is sent out on the connection to the WAF.
If I change the setup to "aio on" or "aio off" this behaviour is not seen. This is the only change in the configuration between the tests. It looks like this behaviour only affects bigger files. I have not seen this effect on small files or proxied responses.
Has anyone had the same experience with this option? And what is the best way to proceed in tracing this?
Could you provide the debug log? http://nginx.org/en/docs/debugging_log.html
wbr, Valentin V. Bartenev