[PATCH 2 of 4] Simplified sendfile_max_chunk handling
Maxim Dounin
mdounin at mdounin.ru
Wed Oct 27 19:19:25 UTC 2021
Hello!
On Wed, Oct 27, 2021 at 05:19:19PM +0300, Sergey Kandaurov wrote:
> > On 11 Oct 2021, at 21:58, Maxim Dounin <mdounin at mdounin.ru> wrote:
> >
> > # HG changeset patch
> > # User Maxim Dounin <mdounin at mdounin.ru>
> > # Date 1633978587 -10800
> > # Mon Oct 11 21:56:27 2021 +0300
> > # Node ID 489323e194e4c3b1a7937c51bd4e1671c70f52f8
> > # Parent d175cd09ac9d2bab7f7226eac3bfce196a296cc0
> > Simplified sendfile_max_chunk handling.
> >
> > Previously, it was checked that sendfile_max_chunk was enabled and
> > almost whole sendfile_max_chunk was sent (see e67ef50c3176), to avoid
> > delaying connections where sendfile_max_chunk wasn't reached (for example,
> > when sending responses smaller than sendfile_max_chunk). Now we instead
> > check if there are unsent data, and the connection is still ready for writing.
> > Additionally we also check c->write->delayed to ignore connections already
> > delayed by limit_rate.
> >
> > This approach is believed to be more robust, and correctly handles
> > not only sendfile_max_chunk, but also internal limits of c->send_chain(),
> > such as sendfile() maximum supported length (ticket #1870).
> >
> > diff --git a/src/http/ngx_http_write_filter_module.c b/src/http/ngx_http_write_filter_module.c
> > --- a/src/http/ngx_http_write_filter_module.c
> > +++ b/src/http/ngx_http_write_filter_module.c
> > @@ -321,16 +321,12 @@ ngx_http_write_filter(ngx_http_request_t
> > delay = (ngx_msec_t) ((nsent - sent) * 1000 / r->limit_rate);
> >
> > if (delay > 0) {
> > - limit = 0;
> > c->write->delayed = 1;
> > ngx_add_timer(c->write, delay);
> > }
> > }
> >
> > - if (limit
> > - && c->write->ready
> > - && c->sent - sent >= limit - (off_t) (2 * ngx_pagesize))
> > - {
> > + if (chain && c->write->ready && !c->write->delayed) {
> > ngx_post_event(c->write, &ngx_posted_next_events);
> > }
> >
>
> Looks good.
>
> Not strictly related to this change, so FYI. I noticed a stray writev()
> after Linux sendfile(), when it writes more than its internal limits.
>
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 write old buf t:0 f:1 0000000000000000,
> pos 0000000000000000, size: 0 file: 416072437, size: 3878894859
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter: l:1 f:0 s:3878894859
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter limit 0
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: @416072437 2147482891
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: 2147479552 of 2147482891 @416072437
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 writev: 0 of 0
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter 0000561528695820
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 post event 00005615289C2310
>
> Here sendfile() partially sent 2147479552, which is above its internal
> limit NGX_SENDFILE_MAXSIZE - ngx_pagesize. On the second iteration,
> due to this, it falls back to writev() with zero-size headers.
> Then, with the patch applied, it posts the next write event, as designed
> (previously, it would seemingly stuck instead, such as in ticket #1870).
Interesting.
Overall it looks harmless, but we may want to look further why
sendfile() only sent 2147479552 instead of 2147482891. It seems
that 2147479552 is in pages (524287 x 4096) despite the fact that
the initial offset is not page-aligned. We expect sendfile() to
send page-aligned ranges instead (416072437 + 2147482891 == 625868 x 4096).
Looking into Linux sendfile() manpage suggests that 2,147,479,552
is a documented limit:
sendfile() will transfer at most 0x7ffff000 (2,147,479,552)
bytes, returning the number of bytes actually transferred.
(This is true on both 32-bit and 64-bit systems.)
This seems to be mostly arbitrary limitation appeared in Linux
kernel 2.6.16
(https://github.com/torvalds/linux/commit/e28cc71572da38a5a12c1cfe4d7032017adccf69).
Interesting enough, the actual limitation is not 0x7ffff000 as
documented, but instead MAX_RW_COUNT, which is defined as
(INT_MAX & PAGE_MASK). This suggests that the behaviour will be
actually different on platforms with larger pages.
Something as simple as:
diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
--- a/src/os/unix/ngx_linux_sendfile_chain.c
+++ b/src/os/unix/ngx_linux_sendfile_chain.c
@@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_
*/
send = prev_send + sent;
- continue;
}
if (send >= limit || in == NULL) {
should be enough to resolve this additional 0-sized writev().
Untested though, as I don't have a test playground on hand where
2G sendfile() can be reached. It would be great if you'll test
it.
Full patch:
# HG changeset patch
# User Maxim Dounin <mdounin at mdounin.ru>
# Date 1635361800 -10800
# Wed Oct 27 22:10:00 2021 +0300
# Node ID 859447c7b7076b676a3421597514b324b708658d
# Parent 2a7155733855d1c2ea1c1ded8d1a4ba654b533cb
Fixed sendfile() limit handling on Linux.
On Linux starting with 2.6.16, sendfile() silently limits all operations
to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK). This incorrectly
triggered the interrupt check, and resulted in 0-sized writev() on the
next loop iteration.
Fix is to make sure the limit is always checked, so we will return from
the loop if the limit is already reached even if number of bytes sent is
not exactly equal to the number of bytes we've tried to send.
diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
--- a/src/os/unix/ngx_linux_sendfile_chain.c
+++ b/src/os/unix/ngx_linux_sendfile_chain.c
@@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_
*/
send = prev_send + sent;
- continue;
}
if (send >= limit || in == NULL) {
--
Maxim Dounin
http://mdounin.ru/
More information about the nginx-devel
mailing list