[PATCH 2 of 4] Simplified sendfile_max_chunk handling

Sergey Kandaurov pluknet at nginx.com
Wed Oct 27 21:50:25 UTC 2021


> On 27 Oct 2021, at 22:19, Maxim Dounin <mdounin at mdounin.ru> wrote:
> 
> Hello!
> 
> On Wed, Oct 27, 2021 at 05:19:19PM +0300, Sergey Kandaurov wrote:
> 
>>> On 11 Oct 2021, at 21:58, Maxim Dounin <mdounin at mdounin.ru> wrote:
>>> 
>>> # HG changeset patch
>>> # User Maxim Dounin <mdounin at mdounin.ru>
>>> # Date 1633978587 -10800
>>> #      Mon Oct 11 21:56:27 2021 +0300
>>> # Node ID 489323e194e4c3b1a7937c51bd4e1671c70f52f8
>>> # Parent  d175cd09ac9d2bab7f7226eac3bfce196a296cc0
>>> Simplified sendfile_max_chunk handling.
>>> 
>>> Previously, it was checked that sendfile_max_chunk was enabled and
>>> almost whole sendfile_max_chunk was sent (see e67ef50c3176), to avoid
>>> delaying connections where sendfile_max_chunk wasn't reached (for example,
>>> when sending responses smaller than sendfile_max_chunk).  Now we instead
>>> check if there are unsent data, and the connection is still ready for writing.
>>> Additionally we also check c->write->delayed to ignore connections already
>>> delayed by limit_rate.
>>> 
>>> This approach is believed to be more robust, and correctly handles
>>> not only sendfile_max_chunk, but also internal limits of c->send_chain(),
>>> such as sendfile() maximum supported length (ticket #1870).
>>> 
>>> diff --git a/src/http/ngx_http_write_filter_module.c b/src/http/ngx_http_write_filter_module.c
>>> --- a/src/http/ngx_http_write_filter_module.c
>>> +++ b/src/http/ngx_http_write_filter_module.c
>>> @@ -321,16 +321,12 @@ ngx_http_write_filter(ngx_http_request_t
>>>        delay = (ngx_msec_t) ((nsent - sent) * 1000 / r->limit_rate);
>>> 
>>>        if (delay > 0) {
>>> -            limit = 0;
>>>            c->write->delayed = 1;
>>>            ngx_add_timer(c->write, delay);
>>>        }
>>>    }
>>> 
>>> -    if (limit
>>> -        && c->write->ready
>>> -        && c->sent - sent >= limit - (off_t) (2 * ngx_pagesize))
>>> -    {
>>> +    if (chain && c->write->ready && !c->write->delayed) {
>>>        ngx_post_event(c->write, &ngx_posted_next_events);
>>>    }
>>> 
>> 
>> Looks good.
>> 
>> Not strictly related to this change, so FYI.  I noticed a stray writev()
>> after Linux sendfile(), when it writes more than its internal limits.
>> 
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 write old buf t:0 f:1 0000000000000000,
>> pos 0000000000000000, size: 0 file: 416072437, size: 3878894859
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter: l:1 f:0 s:3878894859
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter limit 0
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: @416072437 2147482891
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: 2147479552 of 2147482891 @416072437
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 writev: 0 of 0
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter 0000561528695820
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 post event 00005615289C2310
>> 
>> Here sendfile() partially sent 2147479552, which is above its internal
>> limit NGX_SENDFILE_MAXSIZE - ngx_pagesize.  On the second iteration,
>> due to this, it falls back to writev() with zero-size headers.
>> Then, with the patch applied, it posts the next write event, as designed
>> (previously, it would seemingly stuck instead, such as in ticket #1870).
> 
> Interesting.
> 
> Overall it looks harmless, but we may want to look further why 
> sendfile() only sent 2147479552 instead of 2147482891.  It seems 
> that 2147479552 is in pages (524287 x 4096) despite the fact that 
> the initial offset is not page-aligned.  We expect sendfile() to 
> send page-aligned ranges instead (416072437 + 2147482891 == 625868 x 4096).
> 
> Looking into Linux sendfile() manpage suggests that 2,147,479,552 
> is a documented limit:
> 
>       sendfile() will transfer  at  most  0x7ffff000  (2,147,479,552)
>       bytes,  returning  the  number  of  bytes actually transferred.
>       (This is true on both 32-bit and 64-bit systems.)
> 
> This seems to be mostly arbitrary limitation appeared in Linux 
> kernel 2.6.16
> (https://github.com/torvalds/linux/commit/e28cc71572da38a5a12c1cfe4d7032017adccf69).
> 
> Interesting enough, the actual limitation is not 0x7ffff000 as 
> documented, but instead MAX_RW_COUNT, which is defined as
> (INT_MAX & PAGE_MASK).  This suggests that the behaviour will be 
> actually different on platforms with larger pages.
> 
> Something as simple as:
> 
> diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
> --- a/src/os/unix/ngx_linux_sendfile_chain.c
> +++ b/src/os/unix/ngx_linux_sendfile_chain.c
> @@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_
>              */
> 
>             send = prev_send + sent;
> -            continue;
>         }
> 
>         if (send >= limit || in == NULL) {
> 
> should be enough to resolve this additional 0-sized writev().  
> Untested though, as I don't have a test playground on hand where 
> 2G sendfile() can be reached.  It would be great if you'll test 
> it.
> 

That seems to help:

2021/10/27 20:36:31 [debug] 1498568#0: *1 write old buf t:1 f:0 000055D8D328FDB0,
 pos 000055D8D328FDB0, size: 252 file: 0, size: 0
2021/10/27 20:36:31 [debug] 1498568#0: *1 write new buf t:0 f:1 0000000000000000,
 pos 0000000000000000, size: 0 file: 0, size: 4294967296
2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter: l:1 f:0 s:4294967548
2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter limit 0
2021/10/27 20:36:31 [debug] 1498568#0: *1 writev: 252 of 252
[.. next ngx_linux_sendfile_chain() loop iteration ..]
2021/10/27 20:36:31 [debug] 1498568#0: *1 sendfile: @0 2147479552
2021/10/27 20:36:31 [debug] 1498568#0: *1 sendfile: 2147479552 of 2147479552 @0
[.. return from ngx_linux_sendfile_chain() on exceeded limit ..]
2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter 000055D8D329C8D0
2021/10/27 20:36:31 [debug] 1498568#0: *1 post event 000055D8D35CC660


> Full patch:
> 
> # HG changeset patch
> # User Maxim Dounin <mdounin at mdounin.ru>
> # Date 1635361800 -10800
> #      Wed Oct 27 22:10:00 2021 +0300
> # Node ID 859447c7b7076b676a3421597514b324b708658d
> # Parent  2a7155733855d1c2ea1c1ded8d1a4ba654b533cb
> Fixed sendfile() limit handling on Linux.
> 
> On Linux starting with 2.6.16, sendfile() silently limits all operations
> to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK).  This incorrectly
> triggered the interrupt check, and resulted in 0-sized writev() on the
> next loop iteration.
> 
> Fix is to make sure the limit is always checked, so we will return from
> the loop if the limit is already reached even if number of bytes sent is
> not exactly equal to the number of bytes we've tried to send.
> 
> diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
> --- a/src/os/unix/ngx_linux_sendfile_chain.c
> +++ b/src/os/unix/ngx_linux_sendfile_chain.c
> @@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_
>              */
> 
>             send = prev_send + sent;
> -            continue;
>         }
> 
>         if (send >= limit || in == NULL) {
> 

The change looks good to me.

Btw, this should also stop exceeding the limit after several sendfile()
calls each interrupted, on Linux 4.3+ (which is rather theoretical).
It probably deserves updating comments in this file about the count
parameter constraints.

-- 
Sergey Kandaurov



More information about the nginx-devel mailing list