[PATCH 0 of 1] Fix for Nginx hanging on systems without EPOLLRDHUP

Marcus Ball marcus.ball at live.com
Wed Mar 16 19:08:06 UTC 2022


Hi Maxim,

Thank you for taking the time to look further into this and confirm the 
issue. The new patch details look great, definitely a better description 
than I had. Is there anything else needed from me, or any other help I 
can offer, to try to get this patch merged?


On 3/11/22 14:59, Maxim Dounin wrote:
> Hello!
>
> On Mon, Mar 07, 2022 at 06:12:41PM -0500, Marcus Ball wrote:
>
>> On 3/5/22 23:05, Maxim Dounin wrote:
>>> Hello!
>>>
>>> On Thu, Mar 03, 2022 at 02:04:21PM -0500, Marcus Ball wrote:
>>>
>>>> I recently encountered an issue where Nginx would hang for a very long
>>>> time, if not indefinitely, on responses which exceeded the FastCGI
>>>> buffer size (> ~4000 bytes) from an upstream source which, in this case,
>>>> was PHP-FPM. This issue appeared to only be happening on DigitalOcean's
>>>> App Platform service; I couldn't reproduce it locally. After a lot of
>>>> testing and digging around, I eventually tracked it back to
>>>> DigitalOcean's system not supporting the `EPOLLRDHUP` event. After much
>>>> debugging and reading through Nginx's source code, I believe I found the
>>>> source to be two conditions which were missing a check for
>>>> `ngx_use_epoll_rdhup`. I made the changes, rebuilt nginx, and
>>>> everything appears to be working fine now.
>>>>
>>>> If anyone needs to reproduce the issue, I've published a basic example
>>>> at https://github.com/marcusball/nginx-epoll-bug. There are also
>>>> corresponding Docker Hub images which should demonstrate an example
>>>> project with the bug and with the fix when deployed to
>>>> App Platform: `marcusball/nginx-rdhup-bug:without-fix` and
>>>> `marcusball/nginx-rdhup-bug:with-fix` respectively.
>>> Thanks for the investigation.
>>>
>>> The rev->available shouldn't be 0 unless it was set to 0 due to
>>> reading more than (or equal to) the amount of data reported via
>>> ioctl(FIONREAD) during the particular event loop iteration.  And
>>> it will be again set to -1 as long as an additional event is
>>> reported on the socket.  That is, it shouldn't hang when epoll()
>>> is working properly and reports all data additionally received
>>> after all the data available at the time of the previous
>>> epoll_wait() return were read by nginx.
>>>
>>> I suspect this doesn't work due to issues with the epoll()
>>> emulation layer in DigitalOcean's App Platform / gVisor.  Most
>>> likely, it fails to report additional events once nginx reads the
>>> amount of data reported by ioctl(FIONREAD).  Or ioctl(FIONREAD)
>>> simply reports an incorrect amount of data (or just 0).
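>>>
>>> (If it helps to rule that out, a tiny standalone probe, independent
>>> of nginx, can show what ioctl(FIONREAD) reports on the platform;
>>> the socketpair here merely stands in for a real upstream
>>> connection:)
>>>
>>>     #include <stdio.h>
>>>     #include <unistd.h>
>>>     #include <sys/ioctl.h>
>>>     #include <sys/socket.h>
>>>
>>>     int main(void)
>>>     {
>>>         int                fds[2], avail;
>>>         static const char  data[] = "test";
>>>
>>>         /* the write end plays the role of the upstream (PHP-FPM) */
>>>         if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1) {
>>>             return 1;
>>>         }
>>>
>>>         (void) write(fds[1], data, sizeof(data) - 1);
>>>
>>>         /* a conforming FIONREAD should report 4 bytes here */
>>>         if (ioctl(fds[0], FIONREAD, &avail) == -1) {
>>>             return 1;
>>>         }
>>>
>>>         printf("FIONREAD reports %d bytes\n", avail);
>>>         return 0;
>>>     }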
>>>
>>> A debug log might be helpful to further investigate what goes on
>>> here.  It would be great if you could provide one for additional
>>> analysis.
>>>
>>> As far as I understand, a proper workaround for this would be to
>>> compile nginx with --with-cc-opt="-DNGX_HAVE_FIONREAD=0", that is,
>>> with ioctl(FIONREAD) explicitly disabled.  Please test whether it
>>> works for you.
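>>>
>>> For reference, a minimal build along those lines would be something
>>> like the following (your usual configure arguments, such as module
>>> and path options, are omitted here):
>>>
>>>     ./configure --with-cc-opt="-DNGX_HAVE_FIONREAD=0"
>>>     make
>>>     make install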
>>>
>> Hi,
>>
>> Thank you for the suggestion. I tried using the
>> --with-cc-opt="-DNGX_HAVE_FIONREAD=0" compilation option, and
>> that also appeared to fix the issue with the response hanging.
>> However, I'm not exactly sure that is the best solution.
>>
>> First, this would be less than ideal in this circumstance,
>> as needing to compile a custom build would make it much more
>> difficult to use Nginx within off-the-shelf components,
>> such as Heroku buildpacks (heroku/php), which is how I initially
>> encountered this problem. As a result, it would greatly reduce
>> the main selling point of a service like DigitalOcean's App Platform,
>> which is being able to simply deploy code without needing to worry
>> about the configuration or maintenance of the server.
>>
>> More importantly, while disabling FIONREAD appears to address the
>> issue of the request hanging, I *think* it may be more of a
>> workaround. I certainly may be mistaken, but I still believe
>> the patch I submitted addresses an actual bug which should be fixed.
>>
>> Basically, if I'm understanding everything properly, I think
>> that `rev->available` is working correctly. I'm going to use
>> `ngx_readv_chain.c` with FIONREAD enabled as an example:
>>
>> As Nginx reads the last of the response from upstream (PHP-FPM),
>> `readv` receives the remaining body (line 121); `rev->available`
>> is >= 0, so `n` is subtracted from it, leaving `rev->available = 0`
>> (line 171). The problem is that once `ngx_readv_chain` is called
>> again, the condition at line 63 is encountered:
>>
>>     if (rev->available == 0 && !rev->pending_eof) {
>>         return NGX_AGAIN;
>>     }
>>
>> `rev->available` and `rev->pending_eof` are both `0`, so
>> `NGX_AGAIN` is returned. This is essentially where the "hang"
>> occurs: `rev->available` and `rev->pending_eof` always
>> remain zero, and Nginx keeps attempting to read again.
>>
>> The reason I'm fairly confident this is a bug is that,
>> with the exception of one line in `ngx_kqueue_module` which
>> I think is not applicable here, `rev->pending_eof` is
>> only set to `1` in exactly one place:
>>
>> ngx_epoll_module.c:887:
>>
>>
>> #if (NGX_HAVE_EPOLLRDHUP)
>>               if (revents & EPOLLRDHUP) {
>>                   rev->pending_eof = 1;
>>               }
>> #endif
>>
>> Obviously, this line will never be hit if EPOLLRDHUP is not
>> available, which is why `pending_eof` is always zero and Nginx
>> will keep trying to read from upstream without ever completing
>> the response. This is why I think adding the check for
>> `ngx_use_epoll_rdhup`, mirroring other occurrences of
>> `rev->pending_eof` (like `ngx_readv_chain:205`), makes sense.
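>>
>> With that change, as in the patch I sent, the early return would
>> only be taken when the runtime EPOLLRDHUP test succeeded; roughly:
>>
>>     #if (NGX_HAVE_EPOLLRDHUP)
>>
>>         if ((ngx_event_flags & NGX_USE_EPOLL_EVENT)
>>             && ngx_use_epoll_rdhup)
>>         {
>>             /* only trust available/pending_eof if EPOLLRDHUP works */
>>
>>             if (rev->available == 0 && !rev->pending_eof) {
>>                 return NGX_AGAIN;
>>             }
>>         }
>>
>>     #endif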
>>
>>
>> Here are a few different logs. Note that these are the full logs
>> from the containerized application, so some Nginx access logs and
>> PHP-FPM logs are mixed in, but the vast majority is the Nginx
>> debug log, which can be distinguished by its full timestamps.
>>
>> WITH THE ISSUE: https://pastebin.com/SKerUPX8
>> Also note, this contains two requests: the first succeeds,
>> then the second experiences the issue of the response never
>> completing; it only "completed" when I killed `curl` and
>> terminated the client connection.
> Thanks, after looking into the debug logs I think you are right.
> Indeed, rev->available can reach 0, stopping further processing,
> and if EOF happens to be already reported along with the last
> event, it is not reported again by epoll(), leading to a timeout.
>
> I was able to reproduce this on a recent Linux with a simple test
> patch which removes EPOLLRDHUP from events, thus emulating older
> kernel versions without EPOLLRDHUP support (and the emulation
> layer in DigitalOcean's App Platform / gVisor).  I see the
> following test failures:
>
> t/access_log.t                           (Wstat: 256 Tests: 21 Failed: 1)
>    Failed test:  19
>    Non-zero exit status: 1
> t/fastcgi.t                              (Wstat: 256 Tests: 10 Failed: 1)
>    Failed test:  5
>    Non-zero exit status: 1
> t/limit_rate.t                           (Wstat: 512 Tests: 9 Failed: 2)
>    Failed tests:  6-7
>    Non-zero exit status: 2
> t/ssi_delayed.t                          (Wstat: 256 Tests: 3 Failed: 1)
>    Failed test:  1
>    Non-zero exit status: 1
> t/proxy_unfinished.t                     (Wstat: 512 Tests: 17 Failed: 2)
>    Failed tests:  11, 13
>    Non-zero exit status: 2
> t/stream_tcp_nodelay.t                   (Wstat: 256 Tests: 4 Failed: 1)
>    Failed test:  3
>    Non-zero exit status: 1
>
> This seems to be an issue introduced in 7583:efd71d49bde0 along
> with ioctl(FIONREAD) support (nginx 1.17.5).  Before the change,
> rev->available was never set to 0 unless ngx_use_epoll_rdhup was
> also set (that is, runtime test for EPOLLRDHUP introduced in
> 6536:f7849bfb6d21 succeeded).
>
> With the patch, I still see the following test failures:
>
> t/access_log.t                           (Wstat: 256 Tests: 21 Failed: 1)
>    Failed test:  19
>    Non-zero exit status: 1
> t/limit_rate.t                           (Wstat: 1280 Tests: 9 Failed: 5)
>    Failed tests:  3-7
>    Non-zero exit status: 5
>
> These failures seem to be due to race conditions in the tests
> themselves, though, and are to be fixed in the tests.
>
> Below is the patch with the commit log updated to reflect the
> above details; please take a look.
>
> # HG changeset patch
> # User Marcus Ball <marcus.ball at live.com>
> # Date 1647028678 -10800
> #      Fri Mar 11 22:57:58 2022 +0300
> # Node ID 16762702cd6e949b448f43f3e209102ef60c7c2e
> # Parent  aec3b1f8ae0c46a032c1bfcbe2c1d89981064993
> Fixed runtime handling of systems without EPOLLRDHUP support.
>
> In 7583:efd71d49bde0 (nginx 1.17.5), along with the introduction of
> ioctl(FIONREAD) support, proper handling of systems without EPOLLRDHUP
> support in the kernel (but with EPOLLRDHUP in headers) was broken.
>
> Before the change, rev->available was never set to 0 unless
> ngx_use_epoll_rdhup was also set (that is, runtime test for EPOLLRDHUP
> introduced in 6536:f7849bfb6d21 succeeded).  After the change,
> rev->available might reach 0 on systems without runtime EPOLLRDHUP
> support, stopping further reading in ngx_readv_chain() and ngx_unix_recv().
> And, if EOF happened to be already reported along with the last event,
> it will not be reported again by epoll_wait(), leading to connection
> hangs and timeouts on such systems.
>
> This affects Linux kernels before 2.6.17 if nginx was compiled
> with newer headers, and, more importantly, emulation layers, such as
> DigitalOcean's App Platform's / gVisor's epoll emulation layer.
>
> Fix is to explicitly check ngx_use_epoll_rdhup before the corresponding
> rev->pending_eof tests in ngx_readv_chain() and ngx_unix_recv().
>
> diff --git a/src/os/unix/ngx_readv_chain.c b/src/os/unix/ngx_readv_chain.c
> --- a/src/os/unix/ngx_readv_chain.c
> +++ b/src/os/unix/ngx_readv_chain.c
> @@ -55,7 +55,9 @@ ngx_readv_chain(ngx_connection_t *c, ngx
>   
>   #if (NGX_HAVE_EPOLLRDHUP)
>   
> -    if (ngx_event_flags & NGX_USE_EPOLL_EVENT) {
> +    if ((ngx_event_flags & NGX_USE_EPOLL_EVENT)
> +        && ngx_use_epoll_rdhup)
> +    {
>           ngx_log_debug2(NGX_LOG_DEBUG_EVENT, c->log, 0,
>                          "readv: eof:%d, avail:%d",
>                          rev->pending_eof, rev->available);
> diff --git a/src/os/unix/ngx_recv.c b/src/os/unix/ngx_recv.c
> --- a/src/os/unix/ngx_recv.c
> +++ b/src/os/unix/ngx_recv.c
> @@ -52,7 +52,9 @@ ngx_unix_recv(ngx_connection_t *c, u_cha
>   
>   #if (NGX_HAVE_EPOLLRDHUP)
>   
> -    if (ngx_event_flags & NGX_USE_EPOLL_EVENT) {
> +    if ((ngx_event_flags & NGX_USE_EPOLL_EVENT)
> +        && ngx_use_epoll_rdhup)
> +    {
>           ngx_log_debug2(NGX_LOG_DEBUG_EVENT, c->log, 0,
>                          "recv: eof:%d, avail:%d",
>                          rev->pending_eof, rev->available);
>


