[PATCH 2 of 4] Simplified sendfile_max_chunk handling

Sergey Kandaurov pluknet at nginx.com
Thu Oct 28 09:50:05 UTC 2021


> On 28 Oct 2021, at 01:56, Maxim Dounin <mdounin at mdounin.ru> wrote:
> 
> Hello!
> 
> On Thu, Oct 28, 2021 at 12:50:25AM +0300, Sergey Kandaurov wrote:
> 
>>> On 27 Oct 2021, at 22:19, Maxim Dounin <mdounin at mdounin.ru> wrote:
>>> 
>>> On Wed, Oct 27, 2021 at 05:19:19PM +0300, Sergey Kandaurov wrote:
>>> 
>>>>> On 11 Oct 2021, at 21:58, Maxim Dounin <mdounin at mdounin.ru> wrote:
>>>>> 
>>>>> # HG changeset patch
>>>>> # User Maxim Dounin <mdounin at mdounin.ru>
>>>>> # Date 1633978587 -10800
>>>>> #      Mon Oct 11 21:56:27 2021 +0300
>>>>> # Node ID 489323e194e4c3b1a7937c51bd4e1671c70f52f8
>>>>> # Parent  d175cd09ac9d2bab7f7226eac3bfce196a296cc0
>>>>> Simplified sendfile_max_chunk handling.
>>>>> 
>>>>> Previously, it was checked that sendfile_max_chunk was enabled and
>>>>> almost whole sendfile_max_chunk was sent (see e67ef50c3176), to avoid
>>>>> delaying connections where sendfile_max_chunk wasn't reached (for example,
>>>>> when sending responses smaller than sendfile_max_chunk).  Now we instead
>>>>> check if there are unsent data, and the connection is still ready for writing.
>>>>> Additionally we also check c->write->delayed to ignore connections already
>>>>> delayed by limit_rate.
>>>>> 
>>>>> This approach is believed to be more robust, and correctly handles
>>>>> not only sendfile_max_chunk, but also internal limits of c->send_chain(),
>>>>> such as sendfile() maximum supported length (ticket #1870).
>>>>> 
>>>>> diff --git a/src/http/ngx_http_write_filter_module.c b/src/http/ngx_http_write_filter_module.c
>>>>> --- a/src/http/ngx_http_write_filter_module.c
>>>>> +++ b/src/http/ngx_http_write_filter_module.c
>>>>> @@ -321,16 +321,12 @@ ngx_http_write_filter(ngx_http_request_t
>>>>>       delay = (ngx_msec_t) ((nsent - sent) * 1000 / r->limit_rate);
>>>>> 
>>>>>       if (delay > 0) {
>>>>> -            limit = 0;
>>>>>           c->write->delayed = 1;
>>>>>           ngx_add_timer(c->write, delay);
>>>>>       }
>>>>>   }
>>>>> 
>>>>> -    if (limit
>>>>> -        && c->write->ready
>>>>> -        && c->sent - sent >= limit - (off_t) (2 * ngx_pagesize))
>>>>> -    {
>>>>> +    if (chain && c->write->ready && !c->write->delayed) {
>>>>>       ngx_post_event(c->write, &ngx_posted_next_events);
>>>>>   }
>>>>> 
>>>> 
>>>> Looks good.
>>>> 
>>>> Not strictly related to this change, so FYI.  I noticed a stray writev()
>>>> after Linux sendfile(), when it writes more than its internal limits.
>>>> 
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 write old buf t:0 f:1 0000000000000000, pos 0000000000000000, size: 0 file: 416072437, size: 3878894859
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter: l:1 f:0 s:3878894859
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter limit 0
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: @416072437 2147482891
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: 2147479552 of 2147482891 @416072437
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 writev: 0 of 0
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter 0000561528695820
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 post event 00005615289C2310
>>>> 
>>>> Here sendfile() partially sent 2147479552 bytes, which is above its internal
>>>> limit NGX_SENDFILE_MAXSIZE - ngx_pagesize.  Because of this, on the second
>>>> iteration it falls back to writev() with zero-size headers.
>>>> Then, with the patch applied, it posts the next write event, as designed
>>>> (previously, it would seemingly get stuck instead, as in ticket #1870).
>>> 
>>> Interesting.
>>> 
>>> Overall it looks harmless, but we may want to look further into why 
>>> sendfile() only sent 2147479552 instead of 2147482891.  It seems 
>>> that 2147479552 is in pages (524287 x 4096) despite the fact that 
>>> the initial offset is not page-aligned.  We expect sendfile() to 
>>> send page-aligned ranges instead (416072437 + 2147482891 == 625868 x 4096).
>>> 
>>> Looking into Linux sendfile() manpage suggests that 2,147,479,552 
>>> is a documented limit:
>>> 
>>>      sendfile() will transfer  at  most  0x7ffff000  (2,147,479,552)
>>>      bytes,  returning  the  number  of  bytes actually transferred.
>>>      (This is true on both 32-bit and 64-bit systems.)
>>> 
>>> This seems to be a mostly arbitrary limitation that appeared in Linux 
>>> kernel 2.6.16
>>> (https://github.com/torvalds/linux/commit/e28cc71572da38a5a12c1cfe4d7032017adccf69).
>>> 
>>> Interestingly enough, the actual limitation is not 0x7ffff000 as 
>>> documented, but instead MAX_RW_COUNT, which is defined as
>>> (INT_MAX & PAGE_MASK).  This suggests that the behaviour will 
>>> actually be different on platforms with larger pages.
>>> 
>>> Something as simple as:
>>> 
>>> diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
>>> --- a/src/os/unix/ngx_linux_sendfile_chain.c
>>> +++ b/src/os/unix/ngx_linux_sendfile_chain.c
>>> @@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_
>>>             */
>>> 
>>>            send = prev_send + sent;
>>> -            continue;
>>>        }
>>> 
>>>        if (send >= limit || in == NULL) {
>>> 
>>> should be enough to resolve this additional 0-sized writev().  
>>> Untested though, as I don't have a test playground on hand where 
>>> 2G sendfile() can be reached.  It would be great if you could test 
>>> it.
>>> 
>> 
>> That seems to help:
>> 
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 write old buf t:1 f:0 000055D8D328FDB0, pos 000055D8D328FDB0, size: 252 file: 0, size: 0
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 write new buf t:0 f:1 0000000000000000, pos 0000000000000000, size: 0 file: 0, size: 4294967296
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter: l:1 f:0 s:4294967548
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter limit 0
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 writev: 252 of 252
>> [.. next ngx_linux_sendfile_chain() loop iteration ..]
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 sendfile: @0 2147479552
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 sendfile: 2147479552 of 2147479552 @0
>> [.. return from ngx_linux_sendfile_chain() on exceeded limit ..]
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter 000055D8D329C8D0
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 post event 000055D8D35CC660
> 
> Thanks for testing.
> 
>>> Full patch:
>>> 
>>> # HG changeset patch
>>> # User Maxim Dounin <mdounin at mdounin.ru>
>>> # Date 1635361800 -10800
>>> #      Wed Oct 27 22:10:00 2021 +0300
>>> # Node ID 859447c7b7076b676a3421597514b324b708658d
>>> # Parent  2a7155733855d1c2ea1c1ded8d1a4ba654b533cb
>>> Fixed sendfile() limit handling on Linux.
>>> 
>>> On Linux starting with 2.6.16, sendfile() silently limits all operations
>>> to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK).  This incorrectly
>>> triggered the interrupt check, and resulted in 0-sized writev() on the
>>> next loop iteration.
>>> 
>>> Fix is to make sure the limit is always checked, so we will return from
>>> the loop if the limit is already reached even if number of bytes sent is
>>> not exactly equal to the number of bytes we've tried to send.
>>> 
>>> diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
>>> --- a/src/os/unix/ngx_linux_sendfile_chain.c
>>> +++ b/src/os/unix/ngx_linux_sendfile_chain.c
>>> @@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_
>>>             */
>>> 
>>>            send = prev_send + sent;
>>> -            continue;
>>>        }
>>> 
>>>        if (send >= limit || in == NULL) {
>>> 
>> 
>> The change looks good to me.
>> 
>> Btw, this should also prevent exceeding the limit after several sendfile()
>> calls, each of them interrupted, on Linux 4.3+ (which is rather theoretical).
> 
> The limiting takes "send" into account, so I don't see how the 
> limit can be exceeded.
> 
>> The comments in this file about the count parameter constraints probably
>> deserve an update.
> 
> The exact behaviour does not seem to be relevant to the resulting 
> code (in particular, the patch does not change the 
> NGX_SENDFILE_MAXSIZE limit).  On the other hand, I agree that it 
> might make sense to update the comment anyway, in particular, to 
> make it clear that the 2G limit is still relevant to current 
> kernels.  I've added the following to the patch:
> 
> @@ -38,6 +38,9 @@ static void ngx_linux_sendfile_thread_ha
>  * On Linux up to 2.6.16 sendfile() does not allow to pass the count parameter
>  * more than 2G-1 bytes even on 64-bit platforms: it returns EINVAL,
>  * so we limit it to 2G-1 bytes.
> + *
> + * On Linux 2.6.16 and later, sendfile() silently limits the count parameter
> + * to 2G minus the page size, even on 64-bit platforms.
>  */
> 
> #define NGX_SENDFILE_MAXSIZE  2147483647L
> 
> 
> Full patch:
> 
> # HG changeset patch
> # User Maxim Dounin <mdounin at mdounin.ru>
> # Date 1635374871 -10800
> #      Thu Oct 28 01:47:51 2021 +0300
> # Node ID 3c5679dfe561e3087a96acabe4cf73ef232acabb
> # Parent  2a7155733855d1c2ea1c1ded8d1a4ba654b533cb
> Fixed sendfile() limit handling on Linux.
> 
> On Linux starting with 2.6.16, sendfile() silently limits all operations
> to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK).  This incorrectly
> triggered the interrupt check, and resulted in 0-sized writev() on the
> next loop iteration.
> 
> Fix is to make sure the limit is always checked, so we will return from
> the loop if the limit is already reached even if number of bytes sent is
> not exactly equal to the number of bytes we've tried to send.
> 
> diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
> --- a/src/os/unix/ngx_linux_sendfile_chain.c
> +++ b/src/os/unix/ngx_linux_sendfile_chain.c
> @@ -38,6 +38,9 @@ static void ngx_linux_sendfile_thread_ha
>  * On Linux up to 2.6.16 sendfile() does not allow to pass the count parameter
>  * more than 2G-1 bytes even on 64-bit platforms: it returns EINVAL,
>  * so we limit it to 2G-1 bytes.
> + *
> + * On Linux 2.6.16 and later, sendfile() silently limits the count parameter
> + * to 2G minus the page size, even on 64-bit platforms.
>  */
> 
> #define NGX_SENDFILE_MAXSIZE  2147483647L
> @@ -216,7 +219,6 @@ ngx_linux_sendfile_chain(ngx_connection_
>              */
> 
>             send = prev_send + sent;
> -            continue;
>         }
> 
>         if (send >= limit || in == NULL) {
> 

Looks fine.

-- 
Sergey Kandaurov


