[PATCH 2 of 4] Simplified sendfile_max_chunk handling
Sergey Kandaurov
pluknet at nginx.com
Wed Oct 27 21:50:25 UTC 2021
> On 27 Oct 2021, at 22:19, Maxim Dounin <mdounin at mdounin.ru> wrote:
>
> Hello!
>
> On Wed, Oct 27, 2021 at 05:19:19PM +0300, Sergey Kandaurov wrote:
>
>>> On 11 Oct 2021, at 21:58, Maxim Dounin <mdounin at mdounin.ru> wrote:
>>>
>>> # HG changeset patch
>>> # User Maxim Dounin <mdounin at mdounin.ru>
>>> # Date 1633978587 -10800
>>> # Mon Oct 11 21:56:27 2021 +0300
>>> # Node ID 489323e194e4c3b1a7937c51bd4e1671c70f52f8
>>> # Parent d175cd09ac9d2bab7f7226eac3bfce196a296cc0
>>> Simplified sendfile_max_chunk handling.
>>>
>>> Previously, it was checked that sendfile_max_chunk was enabled and
>>> almost whole sendfile_max_chunk was sent (see e67ef50c3176), to avoid
>>> delaying connections where sendfile_max_chunk wasn't reached (for example,
>>> when sending responses smaller than sendfile_max_chunk). Now we instead
>>> check if there are unsent data, and the connection is still ready for writing.
>>> Additionally we also check c->write->delayed to ignore connections already
>>> delayed by limit_rate.
>>>
>>> This approach is believed to be more robust, and correctly handles
>>> not only sendfile_max_chunk, but also internal limits of c->send_chain(),
>>> such as sendfile() maximum supported length (ticket #1870).
>>>
>>> diff --git a/src/http/ngx_http_write_filter_module.c b/src/http/ngx_http_write_filter_module.c
>>> --- a/src/http/ngx_http_write_filter_module.c
>>> +++ b/src/http/ngx_http_write_filter_module.c
>>> @@ -321,16 +321,12 @@ ngx_http_write_filter(ngx_http_request_t
>>> delay = (ngx_msec_t) ((nsent - sent) * 1000 / r->limit_rate);
>>>
>>> if (delay > 0) {
>>> - limit = 0;
>>> c->write->delayed = 1;
>>> ngx_add_timer(c->write, delay);
>>> }
>>> }
>>>
>>> - if (limit
>>> - && c->write->ready
>>> - && c->sent - sent >= limit - (off_t) (2 * ngx_pagesize))
>>> - {
>>> + if (chain && c->write->ready && !c->write->delayed) {
>>> ngx_post_event(c->write, &ngx_posted_next_events);
>>> }
>>>
>>
>> Looks good.
>>
>> Not strictly related to this change, so FYI. I noticed a stray writev()
>> after Linux sendfile(), when it writes more than its internal limits.
>>
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 write old buf t:0 f:1 0000000000000000,
>> pos 0000000000000000, size: 0 file: 416072437, size: 3878894859
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter: l:1 f:0 s:3878894859
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter limit 0
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: @416072437 2147482891
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 sendfile: 2147479552 of 2147482891 @416072437
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 writev: 0 of 0
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 http write filter 0000561528695820
>> 2021/10/27 12:44:34 [debug] 1462058#0: *1 post event 00005615289C2310
>>
>> Here sendfile() partially sent 2147479552, which is above its internal
>> limit NGX_SENDFILE_MAXSIZE - ngx_pagesize. On the second iteration,
>> due to this, it falls back to writev() with zero-size headers.
>> Then, with the patch applied, it posts the next write event, as designed
>> (previously, it would seemingly stuck instead, such as in ticket #1870).
>
> Interesting.
>
> Overall it looks harmless, but we may want to look further why
> sendfile() only sent 2147479552 instead of 2147482891. It seems
> that 2147479552 is in pages (524287 x 4096) despite the fact that
> the initial offset is not page-aligned. We expect sendfile() to
> send page-aligned ranges instead (416072437 + 2147482891 == 625868 x 4096).
>
> Looking into Linux sendfile() manpage suggests that 2,147,479,552
> is a documented limit:
>
> sendfile() will transfer at most 0x7ffff000 (2,147,479,552)
> bytes, returning the number of bytes actually transferred.
> (This is true on both 32-bit and 64-bit systems.)
>
> This seems to be mostly arbitrary limitation appeared in Linux
> kernel 2.6.16
> (https://github.com/torvalds/linux/commit/e28cc71572da38a5a12c1cfe4d7032017adccf69).
>
> Interesting enough, the actual limitation is not 0x7ffff000 as
> documented, but instead MAX_RW_COUNT, which is defined as
> (INT_MAX & PAGE_MASK). This suggests that the behaviour will be
> actually different on platforms with larger pages.
>
> Something as simple as:
>
> diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
> --- a/src/os/unix/ngx_linux_sendfile_chain.c
> +++ b/src/os/unix/ngx_linux_sendfile_chain.c
> @@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_
> */
>
> send = prev_send + sent;
> - continue;
> }
>
> if (send >= limit || in == NULL) {
>
> should be enough to resolve this additional 0-sized writev().
> Untested though, as I don't have a test playground on hand where
> 2G sendfile() can be reached. It would be great if you'll test
> it.
>
That seems to help:
2021/10/27 20:36:31 [debug] 1498568#0: *1 write old buf t:1 f:0 000055D8D328FDB0,
pos 000055D8D328FDB0, size: 252 file: 0, size: 0
2021/10/27 20:36:31 [debug] 1498568#0: *1 write new buf t:0 f:1 0000000000000000,
pos 0000000000000000, size: 0 file: 0, size: 4294967296
2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter: l:1 f:0 s:4294967548
2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter limit 0
2021/10/27 20:36:31 [debug] 1498568#0: *1 writev: 252 of 252
[.. next ngx_linux_sendfile_chain() loop iteration ..]
2021/10/27 20:36:31 [debug] 1498568#0: *1 sendfile: @0 2147479552
2021/10/27 20:36:31 [debug] 1498568#0: *1 sendfile: 2147479552 of 2147479552 @0
[.. return from ngx_linux_sendfile_chain() on exceeded limit ..]
2021/10/27 20:36:31 [debug] 1498568#0: *1 http write filter 000055D8D329C8D0
2021/10/27 20:36:31 [debug] 1498568#0: *1 post event 000055D8D35CC660
> Full patch:
>
> # HG changeset patch
> # User Maxim Dounin <mdounin at mdounin.ru>
> # Date 1635361800 -10800
> # Wed Oct 27 22:10:00 2021 +0300
> # Node ID 859447c7b7076b676a3421597514b324b708658d
> # Parent 2a7155733855d1c2ea1c1ded8d1a4ba654b533cb
> Fixed sendfile() limit handling on Linux.
>
> On Linux starting with 2.6.16, sendfile() silently limits all operations
> to MAX_RW_COUNT, defined as (INT_MAX & PAGE_MASK). This incorrectly
> triggered the interrupt check, and resulted in 0-sized writev() on the
> next loop iteration.
>
> Fix is to make sure the limit is always checked, so we will return from
> the loop if the limit is already reached even if number of bytes sent is
> not exactly equal to the number of bytes we've tried to send.
>
> diff --git a/src/os/unix/ngx_linux_sendfile_chain.c b/src/os/unix/ngx_linux_sendfile_chain.c
> --- a/src/os/unix/ngx_linux_sendfile_chain.c
> +++ b/src/os/unix/ngx_linux_sendfile_chain.c
> @@ -216,7 +216,6 @@ ngx_linux_sendfile_chain(ngx_connection_
> */
>
> send = prev_send + sent;
> - continue;
> }
>
> if (send >= limit || in == NULL) {
>
The change looks good to me.
Btw, this should also stop exceeding the limit after several sendfile()
calls each interrupted, on Linux 4.3+ (which is rather theoretical).
It probably deserves updating comments in this file about the count
parameter constraints.
--
Sergey Kandaurov
More information about the nginx-devel
mailing list