nginx 1.0.2 AIO problem with open_file_cache

dr-dr xp drdr.xp at gmail.com
Sat Sep 3 15:06:24 UTC 2011


Attached is the error.log containing only the failed request. Here is the
stat output for the file:

stat //data2/0d97e01d7fbc48e498897492f0863929/g260674/257/26fd3732cd9f92f2413d10206cd8b0c9-5157235
  File: `//data2/0d97e01d7fbc48e498897492f0863929/g260674/257/26fd3732cd9f92f2413d10206cd8b0c9-5157235'
  Size: 51163499        Blocks: 100048     IO Block: 4096   regular file
Device: 821h/2081d      Inode: 464404482   Links: 1
Access: (0664/-rw-rw-r--)  Uid: (    1003/    www)   Gid: (    1003/    www)
Access: 2011-09-03 22:31:22.000000000 +0800
Modify: 2011-07-30 01:45:56.000000000 +0800
Change: 2011-07-30 01:45:56.000000000 +0800
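
For what it's worth: the directio 512 in the config quoted below makes
nginx read anything of at least 512 bytes with O_DIRECT, which requires
I/O aligned to the device's logical sector size. That size can be
double-checked with something like this (the device name is just an
example):

    blockdev --getss /dev/sdb    # typically prints 512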


The error occurs randomly, on random files. The only relevant factor seems
to be the open_file_cache directive.
File sizes are mostly in the 5M to 20M range.
The files had not been changed: I checked the modification times of the
files involved in these errors. At first I, too, thought it was caused by a
file change, but it was not.
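
A note on the -22 visible in the strace output quoted below: with Linux
native AIO, a failed request does not make io_submit() fail; the error
comes back in the res field of the completion event, and -22 is -EINVAL.
One common cause of EINVAL under O_DIRECT is a file offset, buffer, or
length that is not aligned to the logical sector size. A minimal sketch of
that behavior (the test file path is hypothetical; assumes libaio and a
512-byte sector size):

    /* aio_einval_demo.c -- sketch: with Linux native AIO, an O_DIRECT
     * alignment violation does not fail io_submit(); it shows up as
     * res = -22 (-EINVAL) in the io_getevents() completion event.
     * Build: gcc aio_einval_demo.c -o aio_einval_demo -laio */
    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <fcntl.h>
    #include <libaio.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Submit one read, wait for its completion event, and return the
     * event's result: bytes read on success, negative errno on failure. */
    static long read_at(io_context_t ctx, int fd, void *buf, size_t len,
                        long long off)
    {
        struct iocb cb, *cbs[1] = { &cb };
        struct io_event ev;

        io_prep_pread(&cb, fd, buf, len, off);
        if (io_submit(ctx, 1, cbs) != 1 ||
            io_getevents(ctx, 1, 1, &ev, NULL) != 1) {
            perror("aio");
            exit(1);
        }
        return (long) ev.res;
    }

    int main(void)
    {
        io_context_t ctx = 0;
        void *buf;
        /* hypothetical test file; give it a few KB of data first */
        int fd = open("/tmp/testfile", O_RDONLY | O_DIRECT);

        if (fd < 0 || io_setup(32, &ctx) < 0 ||
            posix_memalign(&buf, 512, 4096) != 0) {
            perror("setup");
            return 1;
        }

        /* Offset and length are multiples of 512: res = byte count. */
        printf("aligned:   res = %ld\n", read_at(ctx, fd, buf, 4096, 0));

        /* Offset 1 is not sector-aligned: the kernel rejects the direct
         * read, and the completion event carries res = -22 (-EINVAL). */
        printf("unaligned: res = %ld\n", read_at(ctx, fd, buf, 4096, 1));

        io_destroy(ctx);
        close(fd);
        free(buf);
        return 0;
    }

This is exactly the pattern quoted below: io_submit(...) = 1 followed by
io_getevents() returning an event whose result is -22.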

In my cluster there are also some other similar nodes running with the same
open_file_cache parameters but without aio enabled (sendfile is used
instead). They have no such issue.
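
To make the comparison concrete, the relevant difference between the two
kinds of nodes is roughly this (a sketch, not the full config; all of the
open_file_cache parameters are identical on both):

    # nodes showing the errors:
    aio            on;
    directio       512;

    # nodes without the errors:
    sendfile       on;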




On Sat, Sep 3, 2011 at 6:03 PM, Maxim Dounin <mdounin at mdounin.ru> wrote:

> Hello!
>
> On Sat, Sep 03, 2011 at 05:46:39PM +0800, dr-dr xp wrote:
>
> > Hello everybody. I have recently had a problem with AIO support and need
> > some help or advice on it.
> >
> > With AIO on, everything was OK. But when I enabled open_file_cache, nginx
> > frequently closed the client connection without sending back any data.
> >
> >
> > My platform is Linux 2.6.30, CentOS 5.4.
> >
> > The nginx server that has this issue serves static files at the backend.
> > Another nginx serves as a proxy in front of it.
> > The backend nginx config is like this:
> > ...
> > aio            on;
> > directio       512;
> > output_buffers 1 128k;
> > ...
> >     location /files {
> >         internal;
> >         alias /;
> >         open_file_cache          max=50000 inactive=20s;
> >         open_file_cache_valid    30s;
> >         open_file_cache_min_uses 1;
> >         open_file_cache_errors   on;
> >     }
> >
> > With this config, on the frontend proxy server there are error logs like
> > this:
> >
> > 2011/09/03 17:01:19 [error] 12498#0: *78679759 *upstream prematurely closed connection while reading response header from upstream*, client: 221.193.212.77, server: , request: "GET .......... HTTP/1.1", upstream: "http://**/file/871586e763bf4417bf8d475858deee2b/260110/0b47fc7f4d84e4bcb01a6d6d1fcacc94-3581749", host.....
> >
> >
> > And I saw in the tcpdump output that the connection was closed very
> > quickly by the backend nginx:
> >
> > 22:02:22.892327 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: S 1788188296:1788188296(0) win 5840 <mss 1460,sackOK,timestamp 666482608 0,nop,wscale 7>
> > 22:02:22.895683 IP 172.16.228.63.irdmi > 172.16.113.32.58271: S 350810413:350810413(0) ack 1788188297 win 5792 <mss 1460,sackOK,timestamp 1948226601 66<
> > 22:02:22.895695 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: . ack 1 win 46 <nop,nop,timestamp 666482612 1948226601>
> > 22:02:22.895703 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: P 1:181(180) ack 1 win 46 <nop,nop,timestamp 666482612 1948226601>
> > 22:02:22.897680 IP 172.16.228.63.irdmi > 172.16.113.32.58271: . ack 181 win 54 <nop,nop,timestamp 1948226603 666482612>
> > 22:02:22.898348 IP 172.16.228.63.irdmi > 172.16.113.32.58271: F 1:1(0) ack 181 win 54 <nop,nop,timestamp 1948226603 666482612>
> > 22:02:22.898473 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: F 181:181(0) ack 2 win 46 <nop,nop,timestamp 666482615 1948226603>
> > 22:02:22.899804 IP 172.16.228.63.irdmi > 172.16.113.32.58271: . ack 182 win 54 <nop,nop,timestamp 1948226605 666482615>
> >
> >
> > On the backend nginx, strace shows that io_getevents got an error: *-22
> > (EINVAL)*.
> >
> >   [pid  5275] recvfrom(48, 0x7fffe4400ae7, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> >   [pid  5275] epoll_wait(6, {{EPOLLIN|EPOLLOUT, {u32=2445415233, u64=139850875540289}}}, 512, 5672) = 1
> >   [pid  5275] recvfrom(201, "\1\6\0\1\0\337\1\0Status: 301 Moved Perman"..., 4096, 0, NULL, NULL) = 256
> >   [pid  5275] close(201)                  = 0
> >   [pid  5275] stat("//data1/871586e763bf4417bf8d475858deee2b/g268315/252/995f20f76d0d3871fdb5d71bbc92956c-27769564", {st_mode=S_IFREG|0600, st_size=122507, ...<
> >   [pid  5275] io_submit(139850942226432, 1, {{0xf43718, 0, 0, 0, 40}}) = 1
> >   [pid  5275] epoll_wait(6, {{EPOLLIN, {u32=7173248, u64=7173248}}}, 512, 5671) = 1
> >   [pid  5275] read(7, "\1\0\0\0\0\0\0\0", 8) = 8
> >   [pid  5275] *io_getevents(139850942226432, 1, 64, {{0xf43718, 0xf436d8, -22, 0}}, {0, 0}) = 1*
> >   [pid  5275] write(24, "172.16.164.30 - - 172.16.228.63:"..., 338) = 338
> >   [pid  5275] close(48)
> >
> > And the access.log shows that the response length is ZERO:
> >
> > [03/Sep/2011:06:28:48 +0800] "GET /file/871586e763bf4417bf8d475858deee2b/260284/ed31933859d27e33b8398deaea1d2ade-3969549 HTTP/1.0" 206 0 0.001 "-" "-" "-" Range:bytes=27459105-
> >
> >
> > After I removed the *open_file_cache* directives, no such error occurred.
> >
> > And the shorter the *inactive* time is, the more often such errors occur.
> >
> >
> > Does anybody have an idea about avoiding this or fixing it?
>
> Could you please provide a debug log?  See
> http://wiki.nginx.org/Debugging for more information.
>
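
[For reference: the debug log Maxim asks for is enabled with the error_log
directive at the "debug" level, which only works if nginx was built with
--with-debug, e.g. (the path is just an example):

    error_log  /var/log/nginx/error.log  debug;
]
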
> Additionally, some more info about the requested file might be
> helpful: how big is it? Did it change on the backend during the
> requests? If yes, was it changed atomically?
>
> Maxim Dounin
>



-- 
After a few days of full meals, one forgets what the northwest wind tastes like.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: error.log
Type: application/octet-stream
Size: 18946 bytes
Desc: not available
URL: <http://mailman.nginx.org/pipermail/nginx/attachments/20110903/4b9cc5c0/attachment-0001.obj>

