Thank you very much Dounin. With the patch it works fine now.<div><br></div><div><br></div><div><br><br><div class="gmail_quote">On Mon, Sep 5, 2011 at 12:34 AM, Maxim Dounin <span dir="ltr"><<a href="mailto:mdounin@mdounin.ru">mdounin@mdounin.ru</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hello!<br>
<div class="im"><br>
On Sat, Sep 03, 2011 at 11:06:24PM +0800, dr-dr xp wrote:<br>
<br>
> The attachment is the error.log with only the failed request. And stat<br>
> output of this file:<br>
><br>
> stat<br>
> //data2/0d97e01d7fbc48e498897492f0863929/g260674/257/26fd3732cd9f92f2413d10206cd8b0c9-5157235<br>
> File:<br>
> `//data2/0d97e01d7fbc48e498897492f0863929/g260674/257/26fd3732cd9f92f2413d10206cd8b0c9-5157235'<br>
> Size: 51163499 Blocks: 100048 IO Block: 4096 regular file<br>
> Device: 821h/2081d Inode: 464404482 Links: 1<br>
> Access: (0664/-rw-rw-r--) Uid: ( 1003/ www) Gid: ( 1003/ www)<br>
> Access: 2011-09-03 22:31:22.000000000 +0800<br>
> Modify: 2011-07-30 01:45:56.000000000 +0800<br>
> Change: 2011-07-30 01:45:56.000000000 +0800<br>
><br>
><br>
> Such error occurs randomly on random file. The only relavent thing seems<br>
> like to be open_file_cache directive.<br>
> File sizes are almost 5M to 20M.<br>
> Files had not been changed. I checked the modify time of the files involved<br>
> in these errors. At first I had thought it was caused by a file change too.<br>
> But not.<br>
><br>
> In my cluster there are also some other similar nodes running with the same<br>
> open_file_cache parameters but without aio enabled(sendfile is used). They<br>
> have no such issue.<br>
<br>
</div>Please try the attached patch.<br>
<font color="#888888"><br>
Maxim Dounin<br>
</font><div><div></div><div class="h5"><br>
><br>
><br>
><br>
><br>
> On Sat, Sep 3, 2011 at 6:03 PM, Maxim Dounin <<a href="mailto:mdounin@mdounin.ru">mdounin@mdounin.ru</a>> wrote:<br>
><br>
> > Hello!<br>
> ><br>
> > On Sat, Sep 03, 2011 at 05:46:39PM +0800, dr-dr xp wrote:<br>
> ><br>
> > > Hello everybody. I recently have had some problem with AIO support and<br>
> > need<br>
> > > some help or advice on it.<br>
> > ><br>
> > > With AIO on, everything was OK. But when I enabled open_file_cache, nginx<br>
> > > frequently closed client connection directly without sending back any<br>
> > data.<br>
> > ><br>
> > ><br>
> > > My platform is linux 2.6.30. CentOS 5.4;<br>
> > ><br>
> > > The nginx server that has this issue serves static file at backend.<br>
> > Another<br>
> > > nginx serves as proxy in front of it.<br>
> > > Backend nginx config is like this:<br>
> > > ...<br>
> > > aio on;<br>
> > ><br>
> > ><br>
> > > directio 512;<br>
> > ><br>
> > ><br>
> > > output_buffers 1 128k;<br>
> > > ...<br>
> > > location /files {<br>
> > ><br>
> > ><br>
> > ><br>
> > > internal;<br>
> > ><br>
> > ><br>
> > > alias /;<br>
> > > open_file_cache max=50000 inactive=20s;<br>
> > ><br>
> > ><br>
> > > open_file_cache_valid 30s;<br>
> > ><br>
> > ><br>
> > > open_file_cache_min_uses 1;<br>
> > ><br>
> > ><br>
> > > open_file_cache_errors on;<br>
> > ><br>
> > ><br>
> > > With this config, on the frontend proxy server there are error logs like<br>
> > > this:<br>
> > ><br>
> > > 2011/09/03 17:01:19 [error] 12498#0: *78679759 *upstream prematurely<br>
> > closed<br>
> > > connection while reading response header from upstream*, client:<br>
> > > 221.193.212.77, server: , request: "GET .......... HTTP/1.1", upstream:<br>
> > > "http://<br>
> > **/file/871586e763bf4417bf8d475858deee2b/260110/0b47fc7f4d84e4bcb01a6d6d1fcacc94-3581749",<br>
> > > host.....<br>
> > ><br>
> > ><br>
> > > And I saw in tcpdump output the connection was closed very quickly by<br>
> > > backend nginx:<br>
> > ><br>
> > > 22:02:22.892327 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: S<br>
> > > 1788188296:1788188296(0) win 5840 <mss 1460,sackOK,timestamp 666482608<br>
> > > 0,nop,wscale 7><br>
> > > 22:02:22.895683 IP 172.16.228.63.irdmi > 172.16.113.32.58271: S<br>
> > > 350810413:350810413(0) ack 1788188297 win 5792 <mss 1460,sackOK,timestamp<br>
> > > 1948226601 66<<br>
> > > 22:02:22.895695 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: . ack 1 win<br>
> > 46<br>
> > > <nop,nop,timestamp 666482612 1948226601><br>
> > > 22:02:22.895703 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: P<br>
> > 1:181(180)<br>
> > > ack 1 win 46 <nop,nop,timestamp 666482612 1948226601><br>
> > > 22:02:22.897680 IP 172.16.228.63.irdmi > 172.16.113.32.58271: . ack 181<br>
> > win<br>
> > > 54 <nop,nop,timestamp 1948226603 666482612><br>
> > > 22:02:22.898348 IP 172.16.228.63.irdmi > 172.16.113.32.58271: F 1:1(0)<br>
> > ack<br>
> > > 181 win 54 <nop,nop,timestamp 1948226603 666482612><br>
> > > 22:02:22.898473 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: F<br>
> > 181:181(0)<br>
> > > ack 2 win 46 <nop,nop,timestamp 666482615 1948226603><br>
> > > 22:02:22.899804 IP 172.16.228.63.irdmi > 172.16.113.32.58271: . ack 182<br>
> > win<br>
> > > 54 <nop,nop,timestamp 1948226605 666482615><br>
> > ><br>
> > ><br>
> > > On the backend nginx, strace shows that io_getevents got an error *"-22"<br>
> > > EINVAL*.<br>
> > ><br>
> > > [pid 5275] recvfrom(48, 0x7fffe4400ae7, 1, 2, 0, 0) = -1 EAGAIN<br>
> > (Resource<br>
> > > temporarily unavailable)<br>
> > > [pid 5275] epoll_wait(6, {{EPOLLIN|EPOLLOUT, {u32=2445415233,<br>
> > > u64=139850875540289}}}, 512, 5672) = 1<br>
> > > [pid 5275] recvfrom(201, "\1\6\0\1\0\337\1\0Status: 301 Moved<br>
> > Perman"...,<br>
> > > 4096, 0, NULL, NULL) = 256<br>
> > > [pid 5275] close(201) = 0<br>
> > > [pid 5275]<br>
> > ><br>
> > stat("//data1/871586e763bf4417bf8d475858deee2b/g268315/252/995f20f76d0d3871fdb5d71bbc92956c-27769564",<br>
> > > {st_mode=S_IFREG|0600, st_size=122507, ...<<br>
> > > [pid 5275] io_submit(139850942226432, 1, {{0xf43718, 0, 0, 0, 40}}) =<br>
> > 1<br>
> > ><br>
> > ><br>
> > > [pid 5275] epoll_wait(6, {{EPOLLIN, {u32=7173248, u64=7173248}}}, 512,<br>
> > > 5671) = 1<br>
> > ><br>
> > > [pid 5275] read(7, "\1\0\0\0\0\0\0\0", 8) = 8<br>
> > > [pid 5275] *io_getevents(139850942226432, 1, 64, {{0xf43718, 0xf436d8,<br>
> > > -22, 0}}, {0, 0}) = 1*<br>
> > > [pid 5275] write(24, "172.16.164.30 - - <a href="http://172.16.228.63" target="_blank">172.16.228.63</a>:"..., 338) =<br>
> > 338<br>
> > > [pid 5275] close(48)<br>
> > ><br>
> > > And access.log shows that response length is ZERO:<br>
> > ><br>
> > > [03/Sep/2011:06:28:48 +0800] "GET<br>
> > ><br>
> > /file/871586e763bf4417bf8d475858deee2b/260284/ed31933859d27e33b8398deaea1d2ade-3969549<br>
> > > HTTP/1.0" 206 0 0.001 "-" "-" "-" Range:bytes=27459105-<br>
> > ><br>
> > ><br>
> > > After I removed those *open_file_cache* directives, No such error<br>
> > occurred.<br>
> > ><br>
> > > And the shorter *inactive* is, the more often I get such errors.<br>
> > ><br>
> > ><br>
> > > Anybody has idea about avoiding this or fixing it?<br>
> ><br>
> > Could you please provide debug log? See<br>
> > <a href="http://wiki.nginx.org/Debugging" target="_blank">http://wiki.nginx.org/Debugging</a> for more information.<br>
> ><br>
> > Additionally, some more info about file requested might be<br>
> > helpful: how big it is? whether it changed on backend during<br>
> > requests? if yes, is it changed atomically?<br>
> ><br>
> > Maxim Dounin<br>
> ><br>
> > _______________________________________________<br>
> > nginx mailing list<br>
> > <a href="mailto:nginx@nginx.org">nginx@nginx.org</a><br>
> > <a href="http://mailman.nginx.org/mailman/listinfo/nginx" target="_blank">http://mailman.nginx.org/mailman/listinfo/nginx</a><br>
> ><br>
><br>
><br>
><br>
> --<br>
> 要了几天饱饭就不记得西北风啥味了<br>
<br>
<br>
> _______________________________________________<br>
> nginx mailing list<br>
> <a href="mailto:nginx@nginx.org">nginx@nginx.org</a><br>
> <a href="http://mailman.nginx.org/mailman/listinfo/nginx" target="_blank">http://mailman.nginx.org/mailman/listinfo/nginx</a><br>
<br>
</div></div><br>_______________________________________________<br>
nginx mailing list<br>
<a href="mailto:nginx@nginx.org">nginx@nginx.org</a><br>
<a href="http://mailman.nginx.org/mailman/listinfo/nginx" target="_blank">http://mailman.nginx.org/mailman/listinfo/nginx</a><br></blockquote></div><br><br clear="all"><div><br></div>-- <br>要了几天饱饭就不记得西北风啥味了<br>
</div>