[PATCH 2 of 4] Simplified sendfile_max_chunk handling

Maxim Dounin mdounin at mdounin.ru
Wed Oct 27 22:56:25 UTC 2021


Hello!

On Thu, Oct 28, 2021 at 12:50:25AM +0300, Sergey Kandaurov wrote:

> > On 27 Oct 2021, at 22:19, Maxim Dounin <mdounin at mdounin.ru> wrote:
> > 
> > On Wed, Oct 27, 2021 at 05:19:19PM +0300, Sergey Kandaurov wrote:
> > 
> >>> On 11 Oct 2021, at 21:58, Maxim Dounin <mdounin at mdounin.ru> wrote:
> >>> 
> >>> # HG changeset patch
> >>> # User Maxim Dounin <mdounin at mdounin.ru>
> >>> # Date 1633978587 -10800
> >>> #      Mon Oct 11 21:56:27 2021 +0300
> >>> # Node ID 489323e194e4c3b1a7937c51bd4e1671c70f52f8
> >>> # Parent  d175cd09ac9d2bab7f7226eac3bfce196a296cc0
> >>> Simplified sendfile_max_chunk handling.
> >>> 
> >>> Previously, it was checked that sendfile_max_chunk was enabled and
> >>> almost whole sendfile_max_chunk was sent (see e67ef50c3176), to avoid
> >>> delaying connections where sendfile_max_chunk wasn't reached (for example,
> >>> when sending responses smaller than sendfile_max_chunk).  Now we instead
> >>> check if there are unsent data, and the connection is still ready for writing.
> >>> Additionally we also check c->write->delayed to ignore connections already
> >>> delayed by limit_rate.
> >>> 
> >>> This approach is believed to be more robust, and correctly handles
> >>> not only sendfile_max_chunk, but also internal limits of c->send_chain(),
> >>> such as sendfile() maximum supported length (ticket #1870).
> >>> 
> >>> diff --git a/src/http/ngx_http_write_filter_module.c b/src/http/ngx_http_write_filter_module.c
> >>> --- a/src/http/ngx_http_write_filter_module.c
> >>> +++ b/src/http/ngx_http_write_filter_module.c
> >>> @@ -321,16 +321,12 @@ ngx_http_write_filter(ngx_http_request_t
> >>>        delay = (ngx_msec_t) ((nsent - sent) * 1000 / r->limit_rate);
> >>> 
> >>>        if (delay > 0) {
> >>> -            limit = 0;
> >>>            c->write->delayed = 1;
> >>>            ngx_add_timer(c->write, delay);
> >>>        }
> >>>    }
> >>> 
> >>> -    if (limit
> >>> -        && c->write->ready
> >>> -        && c->sent - sent >= limit - (off_t) (2 * ngx_pagesize))
> >>> -    {
> >>> +    if (chain && c->write->ready && !c->write->delayed) {
> >>>        ngx_post_event(c->write, &ngx_posted_next_events);
> >>>    }
> >>> 
> >> 
> >> Looks good.
> >> 
> >> Not strictly related to this change, so FYI.  I noticed a stray writev()
> >> after Linux sendfile(), when it writes more than its internal limits.
> >> 
> >> 2021/10/27 12:44:34 [debug] 1462058#0: *1 write old buf t:0 f:1 0000000000000000,
> >> pos 0000000000000000, size: 0 file: 416072437, size: 3878894859
> >> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter: l:1 f:0 s:3878894859
> >> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter limit 0
> >> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: @416072437 2147482891
> >> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: 2147479552 of 2147482891 @416072437
> >> 2021/10/27 12:44:34 [debug] 1462058#0: *1 writev: 0 of 0
> >> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter 0000561528695820
> >> 2021/10/27 12:44:34 [debug] 1462058#0: *1 post event 00005615289C2310
> >> 
> >> Here sendfile() partially sent 2147479552, which is above its internal
> >> limit NGX_SENDFILE_MAXSIZE - ngx_pagesize.  On the second iteration,
> >> due to this, it falls back to writev() with zero-size headers.
> >> Then, with the patch applied, it posts the next write event, as designed
> >> (previously, it would seemingly stuck instead, such as in ticket #1870).
> > 
> > Interesting.
> > 
> > Overall it looks harmless, but we may want to look further why 
> > sendfile() only sent 2147479552 instead of 2147482891.  It seems 
> > that 2147479552 is in pages (524287 x 4096) despite the fact that 
> > the initial offset is not page-aligned.  We expect sendfile() to 
> > send page-aligned ranges instead (416072437 + 2147482891 == 625868 x 4096).
> > 
> > Looking into Linux sendfile() manpage suggests that 2,147,479,552 
> > is a documented limit:
> > 
> >       sendfile() will transfer  at  most  0x7ffff000  (2,147,479,552)
> >       bytes,  returning  the  number  of  bytes actually transferred.
> >       (This is true on both 32-bit and 64-bit systems.)
> > 
> > This seems to be mostly arbitrary limitation appeared in Linux 
> > kernel 2.6.16
> > (https://github.com/torvalds/linux/commit/e28cc71572da38a5a12c1cfe4d7032017adccf69).
> > 
> > Interesting enough, the actual limitation is not 0x7ffff000 as 
> > documented, but instead MAX_RW_COUNT, which is defined as
> > (INT_MAX & PAGE_MASK).  This suggests that the behaviour will be 
> > actually different on platforms with larger pages.
> > 
> > Something as simple as:
> > 
> > diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
> > --- a/src/os/unix/ngx_linux_sendfile_chain.c
> > +++ b/src/os/unix/ngx_linux_sendfile_chain.c
> > @@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_
> >              */
> > 
> >             send = prev_send + sent;
> > -            continue;
> >         }
> > 
> >         if (send >= limit || in == NULL) {
> > 
> > should be enough to resolve this additional 0-sized writev().  
> > Untested though, as I don't have a test playground on hand where 
> > 2G sendfile() can be reached.  It would be great if you'll test 
> > it.
> > 
> 
> That seems to help:
> 
> 2021/10/27 20:36:31 [debug] 1498568#0: *1 write old buf t:1 f:0 000055D8D328FDB0,
>  pos 000055D8D328FDB0, size: 252 file: 0, size: 0
> 2021/10/27 20:36:31 [debug] 1498568#0: *1 write new buf t:0 f:1 0000000000000000,
>  pos 0000000000000000, size: 0 file: 0, size: 4294967296
> 2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter: l:1 f:0 s:4294967548
> 2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter limit 0
> 2021/10/27 20:36:31 [debug] 1498568#0: *1 writev: 252 of 252
> [.. next ngx_linux_sendfile_chain() loop iteration ..]
> 2021/10/27 20:36:31 [debug] 1498568#0: *1 sendfile: @0 2147479552
> 2021/10/27 20:36:31 [debug] 1498568#0: *1 sendfile: 2147479552 of 2147479552 @0
> [.. return from ngx_linux_sendfile_chain() on exceeded limit ..]
> 2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter 000055D8D329C8D0
> 2021/10/27 20:36:31 [debug] 1498568#0: *1 post event 000055D8D35CC660

Thanks for testing.

> > Full patch:
> > 
> > # HG changeset patch
> > # User Maxim Dounin <mdounin at mdounin.ru>
> > # Date 1635361800 -10800
> > #      Wed Oct 27 22:10:00 2021 +0300
> > # Node ID 859447c7b7076b676a3421597514b324b708658d
> > # Parent  2a7155733855d1c2ea1c1ded8d1a4ba654b533cb
> > Fixed sendfile() limit handling on Linux.
> > 
> > On Linux starting with 2.6.16, sendfile() silently limits all operations
> > to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK).  This incorrectly
> > triggered the interrupt check, and resulted in 0-sized writev() on the
> > next loop iteration.
> > 
> > Fix is to make sure the limit is always checked, so we will return from
> > the loop if the limit is already reached even if number of bytes sent is
> > not exactly equal to the number of bytes we've tried to send.
> > 
> > diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
> > --- a/src/os/unix/ngx_linux_sendfile_chain.c
> > +++ b/src/os/unix/ngx_linux_sendfile_chain.c
> > @@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_
> >              */
> > 
> >             send = prev_send + sent;
> > -            continue;
> >         }
> > 
> >         if (send >= limit || in == NULL) {
> > 
> 
> The change looks good to me.
> 
> Btw, this should also stop exceeding the limit after several sendfile()
> calls each interrupted, on Linux 4.3+ (which is rather theoretical).

The limiting takes "send" into account, so I don't see how the 
limit can be exceeded.

> It probably deserves updating comments in this file about the count
> parameter constraints.

The exact behaviour does not seem to be relevant to the resulting 
code (in particular, the patch does not change the 
NGX_SENDFILE_MAXSIZE limit).  On the other hand, I agree that it 
might make sense to update the comment anyway, in particular, to 
make it clear that the 2G limit is still relevant to current 
kernels.  I've added the following to the patch:

@@ -38,6 +38,9 @@ static void ngx_linux_sendfile_thread_ha
  * On Linux up to 2.6.16 sendfile() does not allow to pass the count parameter
  * more than 2G-1 bytes even on 64-bit platforms: it returns EINVAL,
  * so we limit it to 2G-1 bytes.
+ *
+ * On Linux 2.6.16 and later, sendfile() silently limits the count parameter
+ * to 2G minus the page size, even on 64-bit platforms.
  */
 
 #define NGX_SENDFILE_MAXSIZE  2147483647L


Full patch:

# HG changeset patch
# User Maxim Dounin <mdounin at mdounin.ru>
# Date 1635374871 -10800
#      Thu Oct 28 01:47:51 2021 +0300
# Node ID 3c5679dfe561e3087a96acabe4cf73ef232acabb
# Parent  2a7155733855d1c2ea1c1ded8d1a4ba654b533cb
Fixed sendfile() limit handling on Linux.

On Linux starting with 2.6.16, sendfile() silently limits all operations
to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK).  This incorrectly
triggered the interrupt check, and resulted in 0-sized writev() on the
next loop iteration.

Fix is to make sure the limit is always checked, so we will return from
the loop if the limit is already reached even if number of bytes sent is
not exactly equal to the number of bytes we've tried to send.

diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
--- a/src/os/unix/ngx_linux_sendfile_chain.c
+++ b/src/os/unix/ngx_linux_sendfile_chain.c
@@ -38,6 +38,9 @@ static void ngx_linux_sendfile_thread_ha
  * On Linux up to 2.6.16 sendfile() does not allow to pass the count parameter
  * more than 2G-1 bytes even on 64-bit platforms: it returns EINVAL,
  * so we limit it to 2G-1 bytes.
+ *
+ * On Linux 2.6.16 and later, sendfile() silently limits the count parameter
+ * to 2G minus the page size, even on 64-bit platforms.
  */
 
 #define NGX_SENDFILE_MAXSIZE  2147483647L
@@ -216,7 +219,6 @@ ngx_linux_sendfile_chain(ngx_connection_
              */
 
             send = prev_send + sent;
-            continue;
         }
 
         if (send >= limit || in == NULL) {


-- 
Maxim Dounin
http://mdounin.ru/


More information about the nginx-devel mailing list