And this is the access log entry:<div>172.16.113.32 - - <a href="http://172.16.228.63:8000">172.16.228.63:8000</a> [03/Sep/2011:22:43:44 +0800] "GET /file/0d97e01d7fbc48e498897492f0863929/260674/26fd3732cd9f92f2413d10206cd8b0c9-5157235 HTTP/1.0" 206 0 0.002 "-" "-" "-" Range:bytes=1059980- usTime:0.001 us:unix:/tmp/storage_fcgi_sock RID:0000052a-1109-0322-4344-842b2b0c0467 ERR:- ReqLength:171 ReqBody:- COOKIE:"-"</div>
<div><br></div><div>0000052a is connection id in hex.</div><div><br><br><div class="gmail_quote">2011/9/3 dr-dr xp <span dir="ltr"><<a href="mailto:drdr.xp@gmail.com">drdr.xp@gmail.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
The attachment is the error.log with only the failed request. And stat output of this file:<div><br><div><div>stat //data2/0d97e01d7fbc48e498897492f0863929/g260674/257/26fd3732cd9f92f2413d10206cd8b0c9-5157235</div><div> File: `//data2/0d97e01d7fbc48e498897492f0863929/g260674/257/26fd3732cd9f92f2413d10206cd8b0c9-5157235'</div>
<div> Size: 51163499 Blocks: 100048 IO Block: 4096 regular file</div><div>Device: 821h/2081d Inode: 464404482 Links: 1</div><div>Access: (0664/-rw-rw-r--) Uid: ( 1003/ www) Gid: ( 1003/ www)</div>
<div>Access: 2011-09-03 22:31:22.000000000 +0800</div><div>Modify: 2011-07-30 01:45:56.000000000 +0800</div><div>Change: 2011-07-30 01:45:56.000000000 +0800</div><div><br></div><div><br></div><div>Such error occurs randomly on random file. The only relavent thing seems like to be open_file_cache directive.</div>
<div>File sizes are almost 5M to 20M.</div><div>Files had not been changed. I checked the modify time of the files involved in these errors. At first I had thought it was caused by a file change too. But not.</div><div><br>
</div><div>In my cluster there are also some other similar nodes running with the same open_file_cache parameters but without aio enabled(sendfile is used). They have no such issue.</div><div><br></div><div><br></div><div>
<br></div><div><div><div></div><div class="h5"><br><div class="gmail_quote">On Sat, Sep 3, 2011 at 6:03 PM, Maxim Dounin <span dir="ltr"><<a href="mailto:mdounin@mdounin.ru" target="_blank">mdounin@mdounin.ru</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hello!<br>
<div><div></div><div><br>
On Sat, Sep 03, 2011 at 05:46:39PM +0800, dr-dr xp wrote:<br>
<br>
> Hello everybody. I recently have had some problem with AIO support and need<br>
> some help or advice on it.<br>
><br>
> With AIO on, everything was OK. But when I enabled open_file_cache, nginx<br>
> frequently closed client connection directly without sending back any data.<br>
><br>
><br>
> My platform is linux 2.6.30. CentOS 5.4;<br>
><br>
> The nginx server that has this issue serves static file at backend. Another<br>
> nginx serves as proxy in front of it.<br>
> Backend nginx config is like this:<br>
> ...<br>
> aio on;<br>
><br>
><br>
> directio 512;<br>
><br>
><br>
> output_buffers 1 128k;<br>
> ...<br>
> location /files {<br>
><br>
><br>
><br>
> internal;<br>
><br>
><br>
> alias /;<br>
> open_file_cache max=50000 inactive=20s;<br>
><br>
><br>
> open_file_cache_valid 30s;<br>
><br>
><br>
> open_file_cache_min_uses 1;<br>
><br>
><br>
> open_file_cache_errors on;<br>
><br>
><br>
> With this config, on the frontend proxy server there are error logs like<br>
> this:<br>
><br>
> 2011/09/03 17:01:19 [error] 12498#0: *78679759 *upstream prematurely closed<br>
> connection while reading response header from upstream*, client:<br>
> 221.193.212.77, server: , request: "GET .......... HTTP/1.1", upstream:<br>
> "http://**/file/871586e763bf4417bf8d475858deee2b/260110/0b47fc7f4d84e4bcb01a6d6d1fcacc94-3581749",<br>
> host.....<br>
><br>
><br>
> And I saw in tcpdump output the connection was closed very quickly by<br>
> backend nginx:<br>
><br>
> 22:02:22.892327 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: S<br>
> 1788188296:1788188296(0) win 5840 <mss 1460,sackOK,timestamp 666482608<br>
> 0,nop,wscale 7><br>
> 22:02:22.895683 IP 172.16.228.63.irdmi > 172.16.113.32.58271: S<br>
> 350810413:350810413(0) ack 1788188297 win 5792 <mss 1460,sackOK,timestamp<br>
> 1948226601 66<<br>
> 22:02:22.895695 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: . ack 1 win 46<br>
> <nop,nop,timestamp 666482612 1948226601><br>
> 22:02:22.895703 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: P 1:181(180)<br>
> ack 1 win 46 <nop,nop,timestamp 666482612 1948226601><br>
> 22:02:22.897680 IP 172.16.228.63.irdmi > 172.16.113.32.58271: . ack 181 win<br>
> 54 <nop,nop,timestamp 1948226603 666482612><br>
> 22:02:22.898348 IP 172.16.228.63.irdmi > 172.16.113.32.58271: F 1:1(0) ack<br>
> 181 win 54 <nop,nop,timestamp 1948226603 666482612><br>
> 22:02:22.898473 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: F 181:181(0)<br>
> ack 2 win 46 <nop,nop,timestamp 666482615 1948226603><br>
> 22:02:22.899804 IP 172.16.228.63.irdmi > 172.16.113.32.58271: . ack 182 win<br>
> 54 <nop,nop,timestamp 1948226605 666482615><br>
><br>
><br>
> On the backend nginx, strace shows that io_getevents got an error *"-22"<br>
> EINVAL*.<br>
><br>
> [pid 5275] recvfrom(48, 0x7fffe4400ae7, 1, 2, 0, 0) = -1 EAGAIN (Resource<br>
> temporarily unavailable)<br>
> [pid 5275] epoll_wait(6, {{EPOLLIN|EPOLLOUT, {u32=2445415233,<br>
> u64=139850875540289}}}, 512, 5672) = 1<br>
> [pid 5275] recvfrom(201, "\1\6\0\1\0\337\1\0Status: 301 Moved Perman"...,<br>
> 4096, 0, NULL, NULL) = 256<br>
> [pid 5275] close(201) = 0<br>
> [pid 5275]<br>
> stat("//data1/871586e763bf4417bf8d475858deee2b/g268315/252/995f20f76d0d3871fdb5d71bbc92956c-27769564",<br>
> {st_mode=S_IFREG|0600, st_size=122507, ...<<br>
> [pid 5275] io_submit(139850942226432, 1, {{0xf43718, 0, 0, 0, 40}}) = 1<br>
><br>
><br>
> [pid 5275] epoll_wait(6, {{EPOLLIN, {u32=7173248, u64=7173248}}}, 512,<br>
> 5671) = 1<br>
><br>
> [pid 5275] read(7, "\1\0\0\0\0\0\0\0", 8) = 8<br>
> [pid 5275] *io_getevents(139850942226432, 1, 64, {{0xf43718, 0xf436d8,<br>
> -22, 0}}, {0, 0}) = 1*<br>
> [pid 5275] write(24, "172.16.164.30 - - <a href="http://172.16.228.63" target="_blank">172.16.228.63</a>:"..., 338) = 338<br>
> [pid 5275] close(48)<br>
><br>
> And access.log shows that response length is ZERO:<br>
><br>
> [03/Sep/2011:06:28:48 +0800] "GET<br>
> /file/871586e763bf4417bf8d475858deee2b/260284/ed31933859d27e33b8398deaea1d2ade-3969549<br>
> HTTP/1.0" 206 0 0.001 "-" "-" "-" Range:bytes=27459105-<br>
><br>
><br>
> After I removed those *open_file_cache* directives, No such error occurred.<br>
><br>
> And the shorter *inactive* is, the more often I get such errors.<br>
><br>
><br>
> Anybody has idea about avoiding this or fixing it?<br>
<br>
</div></div>Could you please provide debug log? See<br>
<a href="http://wiki.nginx.org/Debugging" target="_blank">http://wiki.nginx.org/Debugging</a> for more information.<br>
<br>
Additionally, some more info about file requested might be<br>
helpful: how big it is? whether it changed on backend during<br>
requests? if yes, is it changed atomically?<br>
<br>
Maxim Dounin<br>
<br>
_______________________________________________<br>
nginx mailing list<br>
<a href="mailto:nginx@nginx.org" target="_blank">nginx@nginx.org</a><br>
<a href="http://mailman.nginx.org/mailman/listinfo/nginx" target="_blank">http://mailman.nginx.org/mailman/listinfo/nginx</a><br>
</blockquote></div><br><br clear="all"><div><br></div></div></div><font color="#888888">-- <br>要了几天饱饭就不记得西北风啥味了<br>
</font></div></div></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>要了几天饱饭就不记得西北风啥味了<br>
</div>