[PATCH 2 of 4] Simplified sendfile_max_chunk handling

Maxim Dounin mdounin at mdounin.ru
Wed Oct 27 19:19:25 UTC 2021


Hello!

On Wed, Oct 27, 2021 at 05:19:19PM +0300, Sergey Kandaurov wrote:

> > On 11 Oct 2021, at 21:58, Maxim Dounin <mdounin at mdounin.ru> wrote:
> > 
> > # HG changeset patch
> > # User Maxim Dounin <mdounin at mdounin.ru>
> > # Date 1633978587 -10800
> > #      Mon Oct 11 21:56:27 2021 +0300
> > # Node ID 489323e194e4c3b1a7937c51bd4e1671c70f52f8
> > # Parent  d175cd09ac9d2bab7f7226eac3bfce196a296cc0
> > Simplified sendfile_max_chunk handling.
> > 
> > Previously, it was checked that sendfile_max_chunk was enabled and
> > almost whole sendfile_max_chunk was sent (see e67ef50c3176), to avoid
> > delaying connections where sendfile_max_chunk wasn't reached (for example,
> > when sending responses smaller than sendfile_max_chunk).  Now we instead
> > check if there are unsent data, and the connection is still ready for writing.
> > Additionally we also check c->write->delayed to ignore connections already
> > delayed by limit_rate.
> > 
> > This approach is believed to be more robust, and correctly handles
> > not only sendfile_max_chunk, but also internal limits of c->send_chain(),
> > such as sendfile() maximum supported length (ticket #1870).
> > 
> > diff --git a/src/http/ngx_http_write_filter_module.c b/src/http/ngx_http_write_filter_module.c
> > --- a/src/http/ngx_http_write_filter_module.c
> > +++ b/src/http/ngx_http_write_filter_module.c
> > @@ -321,16 +321,12 @@ ngx_http_write_filter(ngx_http_request_t
> >         delay = (ngx_msec_t) ((nsent - sent) * 1000 / r->limit_rate);
> > 
> >         if (delay > 0) {
> > -            limit = 0;
> >             c->write->delayed = 1;
> >             ngx_add_timer(c->write, delay);
> >         }
> >     }
> > 
> > -    if (limit
> > -        && c->write->ready
> > -        && c->sent - sent >= limit - (off_t) (2 * ngx_pagesize))
> > -    {
> > +    if (chain && c->write->ready && !c->write->delayed) {
> >         ngx_post_event(c->write, &ngx_posted_next_events);
> >     }
> > 
> 
> Looks good.
> 
> Not strictly related to this change, so FYI.  I noticed a stray writev()
> after Linux sendfile(), when it writes more than its internal limits.
> 
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 write old buf t:0 f:1 0000000000000000,
>  pos 0000000000000000, size: 0 file: 416072437, size: 3878894859
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter: l:1 f:0 s:3878894859
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter limit 0
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: @416072437 2147482891
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: 2147479552 of 2147482891 @416072437
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 writev: 0 of 0
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter 0000561528695820
> 2021/10/27 12:44:34 [debug] 1462058#0: *1 post event 00005615289C2310
> 
> Here sendfile() partially sent 2147479552, which is above its internal
> limit NGX_SENDFILE_MAXSIZE - ngx_pagesize.  On the second iteration,
> due to this, it falls back to writev() with zero-size headers.
> Then, with the patch applied, it posts the next write event, as designed
> (previously, it would seemingly stuck instead, such as in ticket #1870).

Interesting.

Overall it looks harmless, but we may want to look further why 
sendfile() only sent 2147479552 instead of 2147482891.  It seems 
that 2147479552 is in pages (524287 x 4096) despite the fact that 
the initial offset is not page-aligned.  We expect sendfile() to 
send page-aligned ranges instead (416072437 + 2147482891 == 625868 x 4096).

Looking into Linux sendfile() manpage suggests that 2,147,479,552 
is a documented limit:

       sendfile() will transfer  at  most  0x7ffff000  (2,147,479,552)
       bytes,  returning  the  number  of  bytes actually transferred.
       (This is true on both 32-bit and 64-bit systems.)

This seems to be mostly arbitrary limitation appeared in Linux 
kernel 2.6.16
(https://github.com/torvalds/linux/commit/e28cc71572da38a5a12c1cfe4d7032017adccf69).

Interesting enough, the actual limitation is not 0x7ffff000 as 
documented, but instead MAX_RW_COUNT, which is defined as
(INT_MAX & PAGE_MASK).  This suggests that the behaviour will be 
actually different on platforms with larger pages.

Something as simple as:

diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
--- a/src/os/unix/ngx_linux_sendfile_chain.c
+++ b/src/os/unix/ngx_linux_sendfile_chain.c
@@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_
              */
 
             send = prev_send + sent;
-            continue;
         }
 
         if (send >= limit || in == NULL) {

should be enough to resolve this additional 0-sized writev().  
Untested though, as I don't have a test playground on hand where 
2G sendfile() can be reached.  It would be great if you'll test 
it.

Full patch:

# HG changeset patch
# User Maxim Dounin <mdounin at mdounin.ru>
# Date 1635361800 -10800
#      Wed Oct 27 22:10:00 2021 +0300
# Node ID 859447c7b7076b676a3421597514b324b708658d
# Parent  2a7155733855d1c2ea1c1ded8d1a4ba654b533cb
Fixed sendfile() limit handling on Linux.

On Linux starting with 2.6.16, sendfile() silently limits all operations
to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK).  This incorrectly
triggered the interrupt check, and resulted in 0-sized writev() on the
next loop iteration.

Fix is to make sure the limit is always checked, so we will return from
the loop if the limit is already reached even if number of bytes sent is
not exactly equal to the number of bytes we've tried to send.

diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
--- a/src/os/unix/ngx_linux_sendfile_chain.c
+++ b/src/os/unix/ngx_linux_sendfile_chain.c
@@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_
              */
 
             send = prev_send + sent;
-            continue;
         }
 
         if (send >= limit || in == NULL) {


-- 
Maxim Dounin
http://mdounin.ru/


More information about the nginx-devel mailing list