[PATCH 2 of 4] Simplified sendfile_max_chunk handling
Sergey Kandaurov
pluknet at nginx.com
Thu Oct 28 09:50:05 UTC 2021
> On 28 Oct 2021, at 01:56, Maxim Dounin <mdounin at mdounin.ru> wrote:
>
> Hello!
>
> On Thu, Oct 28, 2021 at 12:50:25AM +0300, Sergey Kandaurov wrote:
>
>>> On 27 Oct 2021, at 22:19, Maxim Dounin <mdounin at mdounin.ru> wrote:
>>>
>>> On Wed, Oct 27, 2021 at 05:19:19PM +0300, Sergey Kandaurov wrote:
>>>
>>>>> On 11 Oct 2021, at 21:58, Maxim Dounin <mdounin at mdounin.ru> wrote:
>>>>>
>>>>> # HG changeset patch
>>>>> # User Maxim Dounin <mdounin at mdounin.ru>
>>>>> # Date 1633978587 -10800
>>>>> # Mon Oct 11 21:56:27 2021 +0300
>>>>> # Node ID 489323e194e4c3b1a7937c51bd4e1671c70f52f8
>>>>> # Parent d175cd09ac9d2bab7f7226eac3bfce196a296cc0
>>>>> Simplified sendfile_max_chunk handling.
>>>>>
>>>>> Previously, it was checked that sendfile_max_chunk was enabled and
>>>>> almost whole sendfile_max_chunk was sent (see e67ef50c3176), to avoid
>>>>> delaying connections where sendfile_max_chunk wasn't reached (for example,
>>>>> when sending responses smaller than sendfile_max_chunk). Now we instead
>>>>> check if there are unsent data, and the connection is still ready for writing.
>>>>> Additionally we also check c->write->delayed to ignore connections already
>>>>> delayed by limit_rate.
>>>>>
>>>>> This approach is believed to be more robust, and correctly handles
>>>>> not only sendfile_max_chunk, but also internal limits of c->send_chain(),
>>>>> such as sendfile() maximum supported length (ticket #1870).
>>>>>
>>>>> diff --git a/src/http/ngx_http_write_filter_module.c b/src/http/ngx_http_write_filter_module.c
>>>>> --- a/src/http/ngx_http_write_filter_module.c
>>>>> +++ b/src/http/ngx_http_write_filter_module.c
>>>>> @@ -321,16 +321,12 @@ ngx_http_write_filter(ngx_http_request_t
>>>>> delay = (ngx_msec_t) ((nsent - sent) * 1000 / r->limit_rate);
>>>>>
>>>>> if (delay > 0) {
>>>>> - limit = 0;
>>>>> c->write->delayed = 1;
>>>>> ngx_add_timer(c->write, delay);
>>>>> }
>>>>> }
>>>>>
>>>>> - if (limit
>>>>> - && c->write->ready
>>>>> - && c->sent - sent >= limit - (off_t) (2 * ngx_pagesize))
>>>>> - {
>>>>> + if (chain && c->write->ready && !c->write->delayed) {
>>>>> ngx_post_event(c->write, &ngx_posted_next_events);
>>>>> }
>>>>>
>>>>
>>>> Looks good.
>>>>
>>>> Not strictly related to this change, so FYI. I noticed a stray writev()
>>>> after Linux sendfile(), when it writes more than its internal limits.
>>>>
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 write old buf t:0 f:1 0000000000000000,
>>>> pos 0000000000000000, size: 0 file: 416072437, size: 3878894859
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter: l:1 f:0 s:3878894859
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter limit 0
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: @416072437 2147482891
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: 2147479552 of 2147482891 @416072437
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 writev: 0 of 0
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter 0000561528695820
>>>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 post event 00005615289C2310
>>>>
>>>> Here sendfile() partially sent 2147479552, which is above its internal
>>>> limit NGX_SENDFILE_MAXSIZE - ngx_pagesize. On the second iteration,
>>>> due to this, it falls back to writev() with zero-size headers.
>>>> Then, with the patch applied, it posts the next write event, as designed
>>>> (previously, it would seemingly stuck instead, such as in ticket #1870).
>>>
>>> Interesting.
>>>
>>> Overall it looks harmless, but we may want to look further why
>>> sendfile() only sent 2147479552 instead of 2147482891. It seems
>>> that 2147479552 is in pages (524287 x 4096) despite the fact that
>>> the initial offset is not page-aligned. We expect sendfile() to
>>> send page-aligned ranges instead (416072437 + 2147482891 == 625868 x 4096).
>>>
>>> Looking into Linux sendfile() manpage suggests that 2,147,479,552
>>> is a documented limit:
>>>
>>> sendfile() will transfer at most 0x7ffff000 (2,147,479,552)
>>> bytes, returning the number of bytes actually transferred.
>>> (This is true on both 32-bit and 64-bit systems.)
>>>
>>> This seems to be mostly arbitrary limitation appeared in Linux
>>> kernel 2.6.16
>>> (https://github.com/torvalds/linux/commit/e28cc71572da38a5a12c1cfe4d7032017adccf69).
>>>
>>> Interesting enough, the actual limitation is not 0x7ffff000 as
>>> documented, but instead MAX_RW_COUNT, which is defined as
>>> (INT_MAX & PAGE_MASK). This suggests that the behaviour will be
>>> actually different on platforms with larger pages.
>>>
>>> Something as simple as:
>>>
>>> diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
>>> --- a/src/os/unix/ngx_linux_sendfile_chain.c
>>> +++ b/src/os/unix/ngx_linux_sendfile_chain.c
>>> @@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_
>>> */
>>>
>>> send = prev_send + sent;
>>> - continue;
>>> }
>>>
>>> if (send >= limit || in == NULL) {
>>>
>>> should be enough to resolve this additional 0-sized writev().
>>> Untested though, as I don't have a test playground on hand where
>>> 2G sendfile() can be reached. It would be great if you'll test
>>> it.
>>>
>>
>> That seems to help:
>>
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 write old buf t:1 f:0 000055D8D328FDB0,
>> pos 000055D8D328FDB0, size: 252 file: 0, size: 0
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 write new buf t:0 f:1 0000000000000000,
>> pos 0000000000000000, size: 0 file: 0, size: 4294967296
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter: l:1 f:0 s:4294967548
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter limit 0
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 writev: 252 of 252
>> [.. next ngx_linux_sendfile_chain() loop iteration ..]
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 sendfile: @0 2147479552
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 sendfile: 2147479552 of 2147479552 @0
>> [.. return from ngx_linux_sendfile_chain() on exceeded limit ..]
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter 000055D8D329C8D0
>> 2021/10/27 20:36:31 [debug] 1498568#0: *1 post event 000055D8D35CC660
>
> Thanks for testing.
>
>>> Full patch:
>>>
>>> # HG changeset patch
>>> # User Maxim Dounin <mdounin at mdounin.ru>
>>> # Date 1635361800 -10800
>>> # Wed Oct 27 22:10:00 2021 +0300
>>> # Node ID 859447c7b7076b676a3421597514b324b708658d
>>> # Parent 2a7155733855d1c2ea1c1ded8d1a4ba654b533cb
>>> Fixed sendfile() limit handling on Linux.
>>>
>>> On Linux starting with 2.6.16, sendfile() silently limits all operations
>>> to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK). This incorrectly
>>> triggered the interrupt check, and resulted in 0-sized writev() on the
>>> next loop iteration.
>>>
>>> Fix is to make sure the limit is always checked, so we will return from
>>> the loop if the limit is already reached even if number of bytes sent is
>>> not exactly equal to the number of bytes we've tried to send.
>>>
>>> diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
>>> --- a/src/os/unix/ngx_linux_sendfile_chain.c
>>> +++ b/src/os/unix/ngx_linux_sendfile_chain.c
>>> @@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_
>>> */
>>>
>>> send = prev_send + sent;
>>> - continue;
>>> }
>>>
>>> if (send >= limit || in == NULL) {
>>>
>>
>> The change looks good to me.
>>
>> Btw, this should also stop exceeding the limit after several sendfile()
>> calls each interrupted, on Linux 4.3+ (which is rather theoretical).
>
> The limiting takes "send" into account, so I don't see how the
> limit can be exceeded.
>
>> It probably deserves updating comments in this file about the count
>> parameter constraints.
>
> The exact behaviour does not seem to be relevant to the resulting
> code (in particular, the patch does not change the
> NGX_SENDFILE_MAXSIZE limit). On the other hand, I agree that it
> might make sense to update the comment anyway, in particular, to
> make it clear that the 2G limit is still relevant to current
> kernels. I've added the following to the patch:
>
> @@ -38,6 +38,9 @@ static void ngx_linux_sendfile_thread_ha
> * On Linux up to 2.6.16 sendfile() does not allow to pass the count parameter
> * more than 2G-1 bytes even on 64-bit platforms: it returns EINVAL,
> * so we limit it to 2G-1 bytes.
> + *
> + * On Linux 2.6.16 and later, sendfile() silently limits the count parameter
> + * to 2G minus the page size, even on 64-bit platforms.
> */
>
> #define NGX_SENDFILE_MAXSIZE 2147483647L
>
>
> Full patch:
>
> # HG changeset patch
> # User Maxim Dounin <mdounin at mdounin.ru>
> # Date 1635374871 -10800
> # Thu Oct 28 01:47:51 2021 +0300
> # Node ID 3c5679dfe561e3087a96acabe4cf73ef232acabb
> # Parent 2a7155733855d1c2ea1c1ded8d1a4ba654b533cb
> Fixed sendfile() limit handling on Linux.
>
> On Linux starting with 2.6.16, sendfile() silently limits all operations
> to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK). This incorrectly
> triggered the interrupt check, and resulted in 0-sized writev() on the
> next loop iteration.
>
> Fix is to make sure the limit is always checked, so we will return from
> the loop if the limit is already reached even if number of bytes sent is
> not exactly equal to the number of bytes we've tried to send.
>
> diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
> --- a/src/os/unix/ngx_linux_sendfile_chain.c
> +++ b/src/os/unix/ngx_linux_sendfile_chain.c
> @@ -38,6 +38,9 @@ static void ngx_linux_sendfile_thread_ha
> * On Linux up to 2.6.16 sendfile() does not allow to pass the count parameter
> * more than 2G-1 bytes even on 64-bit platforms: it returns EINVAL,
> * so we limit it to 2G-1 bytes.
> + *
> + * On Linux 2.6.16 and later, sendfile() silently limits the count parameter
> + * to 2G minus the page size, even on 64-bit platforms.
> */
>
> #define NGX_SENDFILE_MAXSIZE 2147483647L
> @@ -216,7 +219,6 @@ ngx_linux_sendfile_chain(ngx_connection_
> */
>
> send = prev_send + sent;
> - continue;
> }
>
> if (send >= limit || in == NULL) {
>
Looks fine.
--
Sergey Kandaurov
More information about the nginx-devel
mailing list