nginx 1.0.2 AIO problem with open_file_cache
Maxim Dounin
mdounin at mdounin.ru
Sun Sep 4 16:34:13 UTC 2011
Hello!
On Sat, Sep 03, 2011 at 11:06:24PM +0800, dr-dr xp wrote:
> The attachment is the error.log with only the failed request, and here is the
> stat output of this file:
>
> stat //data2/0d97e01d7fbc48e498897492f0863929/g260674/257/26fd3732cd9f92f2413d10206cd8b0c9-5157235
>   File: `//data2/0d97e01d7fbc48e498897492f0863929/g260674/257/26fd3732cd9f92f2413d10206cd8b0c9-5157235'
> Size: 51163499 Blocks: 100048 IO Block: 4096 regular file
> Device: 821h/2081d Inode: 464404482 Links: 1
> Access: (0664/-rw-rw-r--) Uid: ( 1003/ www) Gid: ( 1003/ www)
> Access: 2011-09-03 22:31:22.000000000 +0800
> Modify: 2011-07-30 01:45:56.000000000 +0800
> Change: 2011-07-30 01:45:56.000000000 +0800
>
>
> Such an error occurs randomly, on random files. The only relevant factor
> seems to be the open_file_cache directive.
> File sizes range from about 5M to 20M.
> The files had not been changed. I checked the modify time of the files involved
> in these errors. At first I thought it was caused by a file change too,
> but it was not.
>
> In my cluster there are also some other similar nodes running with the same
> open_file_cache parameters but without aio enabled (sendfile is used instead).
> They have no such issue.
Please try the attached patch.
Maxim Dounin
>
>
>
>
> On Sat, Sep 3, 2011 at 6:03 PM, Maxim Dounin <mdounin at mdounin.ru> wrote:
>
> > Hello!
> >
> > On Sat, Sep 03, 2011 at 05:46:39PM +0800, dr-dr xp wrote:
> >
> > > Hello everybody. I recently ran into a problem with AIO support and need
> > > some help or advice on it.
> > >
> > > With AIO on, everything was OK. But when I enabled open_file_cache, nginx
> > > frequently closed client connections directly without sending back any data.
> > >
> > >
> > > My platform is Linux 2.6.30, CentOS 5.4.
> > >
> > > The nginx server that has this issue serves static files at the backend;
> > > another nginx serves as a proxy in front of it.
> > > The backend nginx config is like this:
> > >
> > >     ...
> > >     aio on;
> > >     directio 512;
> > >     output_buffers 1 128k;
> > >     ...
> > >     location /files {
> > >         internal;
> > >         alias /;
> > >
> > >         open_file_cache          max=50000 inactive=20s;
> > >         open_file_cache_valid    30s;
> > >         open_file_cache_min_uses 1;
> > >         open_file_cache_errors   on;
> > >     }
> > > With this config, on the frontend proxy server there are error logs like
> > > this:
> > >
> > > 2011/09/03 17:01:19 [error] 12498#0: *78679759 *upstream prematurely closed connection while reading response header from upstream*, client: 221.193.212.77, server: , request: "GET .......... HTTP/1.1", upstream: "http://**/file/871586e763bf4417bf8d475858deee2b/260110/0b47fc7f4d84e4bcb01a6d6d1fcacc94-3581749", host.....
> > >
> > >
> > > And I saw in the tcpdump output that the connection was closed very quickly
> > > by the backend nginx:
> > >
> > > 22:02:22.892327 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: S 1788188296:1788188296(0) win 5840 <mss 1460,sackOK,timestamp 666482608 0,nop,wscale 7>
> > > 22:02:22.895683 IP 172.16.228.63.irdmi > 172.16.113.32.58271: S 350810413:350810413(0) ack 1788188297 win 5792 <mss 1460,sackOK,timestamp 1948226601 66<
> > > 22:02:22.895695 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: . ack 1 win 46 <nop,nop,timestamp 666482612 1948226601>
> > > 22:02:22.895703 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: P 1:181(180) ack 1 win 46 <nop,nop,timestamp 666482612 1948226601>
> > > 22:02:22.897680 IP 172.16.228.63.irdmi > 172.16.113.32.58271: . ack 181 win 54 <nop,nop,timestamp 1948226603 666482612>
> > > 22:02:22.898348 IP 172.16.228.63.irdmi > 172.16.113.32.58271: F 1:1(0) ack 181 win 54 <nop,nop,timestamp 1948226603 666482612>
> > > 22:02:22.898473 IP 172.16.113.32.58271 > 172.16.228.63.irdmi: F 181:181(0) ack 2 win 46 <nop,nop,timestamp 666482615 1948226603>
> > > 22:02:22.899804 IP 172.16.228.63.irdmi > 172.16.113.32.58271: . ack 182 win 54 <nop,nop,timestamp 1948226605 666482615>
> > >
> > >
> > > On the backend nginx, strace shows that io_getevents got an error *"-22"
> > > EINVAL*:
> > >
> > > [pid 5275] recvfrom(48, 0x7fffe4400ae7, 1, 2, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> > > [pid 5275] epoll_wait(6, {{EPOLLIN|EPOLLOUT, {u32=2445415233, u64=139850875540289}}}, 512, 5672) = 1
> > > [pid 5275] recvfrom(201, "\1\6\0\1\0\337\1\0Status: 301 Moved Perman"..., 4096, 0, NULL, NULL) = 256
> > > [pid 5275] close(201) = 0
> > > [pid 5275] stat("//data1/871586e763bf4417bf8d475858deee2b/g268315/252/995f20f76d0d3871fdb5d71bbc92956c-27769564", {st_mode=S_IFREG|0600, st_size=122507, ...<
> > > [pid 5275] io_submit(139850942226432, 1, {{0xf43718, 0, 0, 0, 40}}) = 1
> > > [pid 5275] epoll_wait(6, {{EPOLLIN, {u32=7173248, u64=7173248}}}, 512, 5671) = 1
> > > [pid 5275] read(7, "\1\0\0\0\0\0\0\0", 8) = 8
> > > [pid 5275] *io_getevents(139850942226432, 1, 64, {{0xf43718, 0xf436d8, -22, 0}}, {0, 0}) = 1*
> > > [pid 5275] write(24, "172.16.164.30 - - 172.16.228.63:"..., 338) = 338
> > > [pid 5275] close(48)
> > >
> > > And access.log shows that the response length is ZERO:
> > >
> > > [03/Sep/2011:06:28:48 +0800] "GET /file/871586e763bf4417bf8d475858deee2b/260284/ed31933859d27e33b8398deaea1d2ade-3969549 HTTP/1.0" 206 0 0.001 "-" "-" "-" Range:bytes=27459105-
> > >
> > >
> > > After I removed those *open_file_cache* directives, no such error occurred.
> > >
> > > And the shorter *inactive* is, the more often I get such errors.
> > >
> > >
> > > Does anybody have an idea how to avoid or fix this?
> >
> > Could you please provide debug log? See
> > http://wiki.nginx.org/Debugging for more information.
> >
> > Additionally, some more info about the file requested might be helpful:
> > how big is it? Did it change on the backend during the requests?
> > If yes, was it changed atomically?
> >
> > Maxim Dounin
> >
> > _______________________________________________
> > nginx mailing list
> > nginx at nginx.org
> > http://mailman.nginx.org/mailman/listinfo/nginx
> >
>
>
>
> --
> A few days of square meals and you forget what the northwest wind tastes like.
-------------- next part --------------
# HG changeset patch
# User Maxim Dounin <mdounin at mdounin.ru>
# Date 1315153882 0
# Node ID 16eaef9fbf7335a8ab02ce308799647b26bd11f9
# Parent 5d94f8b3e01d74ec6bd5bdcae176a8d3b998237d
Bugfix: open_file_cache lost the is_directio flag.
On file retest, open_file_cache lost is_directio if the file wasn't changed.
This caused unaligned operations under Linux to fail with EINVAL. It
wasn't noticeable with AIO though, as errors weren't properly logged.
diff --git a/src/core/ngx_open_file_cache.c b/src/core/ngx_open_file_cache.c
--- a/src/core/ngx_open_file_cache.c
+++ b/src/core/ngx_open_file_cache.c
@@ -290,6 +290,8 @@ ngx_open_cached_file(ngx_open_file_cache
                 file->use_event = 1;
             }
 
+            of->is_directio = file->is_directio;
+
             goto renew;
         }
diff --git a/src/os/unix/ngx_file_aio_read.c b/src/os/unix/ngx_file_aio_read.c
--- a/src/os/unix/ngx_file_aio_read.c
+++ b/src/os/unix/ngx_file_aio_read.c
@@ -86,6 +86,9 @@ ngx_file_aio_read(ngx_file_t *file, u_ch
         return aio->nbytes;
     }
 
+    ngx_log_error(NGX_LOG_CRIT, file->log, ngx_errno,
+                  "aio read \"%s\" failed", file->name.data);
+
     return NGX_ERROR;
 }
diff --git a/src/os/unix/ngx_linux_aio_read.c b/src/os/unix/ngx_linux_aio_read.c
--- a/src/os/unix/ngx_linux_aio_read.c
+++ b/src/os/unix/ngx_linux_aio_read.c
@@ -74,6 +74,10 @@ ngx_file_aio_read(ngx_file_t *file, u_ch
     }
 
     ngx_set_errno(-aio->res);
+
+    ngx_log_error(NGX_LOG_CRIT, file->log, ngx_errno,
+                  "aio read \"%s\" failed", file->name.data);
+
     return NGX_ERROR;
 }