[PATCH 0 of 1] Fix for Nginx hanging on systems without EPOLLRDHUP

Maxim Dounin mdounin at mdounin.ru
Fri Mar 11 19:59:08 UTC 2022


Hello!

On Mon, Mar 07, 2022 at 06:12:41PM -0500, Marcus Ball wrote:

> On 3/5/22 23:05, Maxim Dounin wrote:
> > Hello!
> >
> > On Thu, Mar 03, 2022 at 02:04:21PM -0500, Marcus Ball wrote:
> >
> >> I recently encountered an issue where Nginx would hang for a very long
> >> time, if not indefinitely, on responses which exceeded the FastCGI
> >> buffer size (> ~4000 bytes) from an upstream source which, in this case,
> >> was PHP-FPM. This issue appeared to only be happening on DigitalOcean's
> >> was PHP-FPM. This issue appeared to only be happening on DigitalOcean's
> >> App Platform service; I couldn't reproduce it locally. After a lot of
> >> testing and digging around, I eventually tracked it back to
> >> DigitalOcean's system not supporting the `EPOLLRDHUP` event. After much
> >> debugging and reading through Nginx's source code, I believe I found the
> >> source to be two conditions which were missing a check for
> >> `ngx_use_epoll_rdhup`. I made the changes and rebuilt nginx and
> >> everything appears to be working fine now.
> >>
> >> If anyone needs to reproduce the issue, I've published a basic example
> >> at https://github.com/marcusball/nginx-epoll-bug. There are also
> >> corresponding Docker Hub images which should be able to demonstrate an
> >> example project with the bug and with the fix if they are deployed to
> >> App Platform: `marcusball/nginx-rdhup-bug:without-fix` and
> >> `marcusball/nginx-rdhup-bug:with-fix` respectively.
> > Thanks for the investigation.
> >
> > The rev->available shouldn't be 0 unless it was set to 0 due to
> > reading more than (or equal to) the amount of data reported via
> > ioctl(FIONREAD) during the particular event loop iteration.  And
> > it will be again set to -1 as long as an additional event is
> > reported on the socket.  That is, it shouldn't hang when epoll()
> > is working properly and reports all data additionally received
> > after all the data available at the time of the previous
> > epoll_wait() return were read by nginx.
> >
> > I suspect this doesn't work due to issues with DigitalOcean's App
> > Platform's / gVisor's epoll() emulation layer.  Most likely, it
> > fails to report additional events once nginx reads the amount of
> > data reported by ioctl(FIONREAD).  Or ioctl(FIONREAD) simply
> > reports incorrect amount of data (or just 0).
> >
> > Debug log might be helpful to further investigate what goes on
> > here.  It would be great if you could provide one for additional
> > analysis.
> >
> > As far as I understand, proper workaround for this would be to
> > compile nginx with --with-cc-opt="-DNGX_HAVE_FIONREAD=0", that is,
> > with ioctl(FIONREAD) explicitly disabled.  Please test if it works
> > for you.
> >
> 
> Hi,
> 
> Thank you for the suggestion. I tried using the
> --with-cc-opt="-DNGX_HAVE_FIONREAD=0" compilation option, and
> that also appeared to fix the issue with the response hanging.
> However, I'm not exactly sure that is the best solution.
> 
> First, this would be less than ideal for this circumstance,
> as needing to compile a custom build would make it a lot more
> difficult to use Nginx within off-the-shelf components,
> such as Heroku buildpacks (heroku/php), which is how I initially
> encountered this problem. As a result, it would greatly undermine
> the main selling point of a service like DigitalOcean's App Platform,
> which is being able to simply deploy code without needing to worry
> about the configuration or maintenance of the server.
> 
> More importantly, while disabling FIONREAD appears to address the
> issue of the request hanging, I *think* it may be more of a
> workaround. I certainly may be mistaken, but I still believe
> the patch I submitted addresses an actual bug which should be fixed.
> 
> Basically, if I'm understanding everything properly, I think
> that `rev->available` is working correctly. I'm going to use
> `ngx_readv_chain.c` with FIONREAD enabled as an example:
> 
> As Nginx reads the last of the response from upstream (PHP-FPM),
> `readv` receives the remaining body (line 121), `rev->available`
> is >= 0, so `n` is subtracted and `rev->available` becomes 0 (line 171).
> The problem is that once `ngx_readv_chain` is called again,
> the condition at line 63 is encountered:
> 
> if (rev->available == 0 && !rev->pending_eof) {
>      return NGX_AGAIN;
> }
> 
> `rev->available` and `rev->pending_eof` are both `0`, so
> `NGX_AGAIN` is returned. This is essentially where the "hang"
> occurs: `rev->available` and `rev->pending_eof` always
> remain zero, and Nginx keeps attempting to read again.
> 
> The reason I'm fairly confident this is a bug is that,
> with the exception of one line in `ngx_kqueue_module` which
> I think is not applicable here, `rev->pending_eof` is
> only set to `1` in exactly one place:
> 
> ngx_epoll_module.c:887:
> 
> 
> #if (NGX_HAVE_EPOLLRDHUP)
>              if (revents & EPOLLRDHUP) {
>                  rev->pending_eof = 1;
>              }
> #endif
> 
> Obviously, this line will never be hit if EPOLLRDHUP is not
> available. This is why `pending_eof` is always zero and Nginx
> will continue to try to read from upstream and never complete
> the response. That is also why adding the check for
> `ngx_use_epoll_rdhup`, mirroring the other uses of
> `rev->pending_eof` (like `ngx_readv_chain:205`), appears
> to make sense.
> 
> 
> Here are a few different logs. Note that these are the full logs
> from the containerized application, so they also include some
> Nginx access logs as well as PHP-FPM logs mixed in, but the
> vast majority is the Nginx debug log, which can be distinguished
> by the full timestamp.
> 
> WITH THE ISSUE: https://pastebin.com/SKerUPX8
> Also note, this log contains two requests: the first succeeds,
> then the second experiences the issue of the response never
> completing; it only "completed" when I killed `curl` and
> terminated the client connection.

Thanks, after looking into the debug logs I think you are right.
Indeed, rev->available can reach 0, stopping further processing,
and if EOF happens to be already reported along with the last
event, it is not reported again by epoll(), leading to a timeout.

I was able to reproduce this on a recent Linux with a simple test
patch which removes EPOLLRDHUP from events, thus emulating older
kernel versions without EPOLLRDHUP support (and the emulation
layer in DigitalOcean's App Platform / gVisor).  I see the
following test failures:

t/access_log.t                           (Wstat: 256 Tests: 21 Failed: 1)
  Failed test:  19
  Non-zero exit status: 1
t/fastcgi.t                              (Wstat: 256 Tests: 10 Failed: 1)
  Failed test:  5
  Non-zero exit status: 1
t/limit_rate.t                           (Wstat: 512 Tests: 9 Failed: 2)
  Failed tests:  6-7
  Non-zero exit status: 2
t/ssi_delayed.t                          (Wstat: 256 Tests: 3 Failed: 1)
  Failed test:  1
  Non-zero exit status: 1
t/proxy_unfinished.t                     (Wstat: 512 Tests: 17 Failed: 2)
  Failed tests:  11, 13
  Non-zero exit status: 2
t/stream_tcp_nodelay.t                   (Wstat: 256 Tests: 4 Failed: 1)
  Failed test:  3
  Non-zero exit status: 1

This seems to be an issue introduced in 7583:efd71d49bde0 along 
with ioctl(FIONREAD) support (nginx 1.17.5).  Before the change, 
rev->available was never set to 0 unless ngx_use_epoll_rdhup was 
also set (that is, runtime test for EPOLLRDHUP introduced in 
6536:f7849bfb6d21 succeeded).

With the patch, I still see the following test failures:

t/access_log.t                           (Wstat: 256 Tests: 21 Failed: 1)
  Failed test:  19
  Non-zero exit status: 1
t/limit_rate.t                           (Wstat: 1280 Tests: 9 Failed: 5)
  Failed tests:  3-7
  Non-zero exit status: 5

These failures seem to be due to race conditions in the tests, 
though, and will be fixed in the tests themselves.

Below is the patch with the commit log updated to reflect the 
above details, please take a look.

# HG changeset patch
# User Marcus Ball <marcus.ball at live.com>
# Date 1647028678 -10800
#      Fri Mar 11 22:57:58 2022 +0300
# Node ID 16762702cd6e949b448f43f3e209102ef60c7c2e
# Parent  aec3b1f8ae0c46a032c1bfcbe2c1d89981064993
Fixed runtime handling of systems without EPOLLRDHUP support.

In 7583:efd71d49bde0 (nginx 1.17.5), along with the introduction of
ioctl(FIONREAD) support, proper handling of systems without EPOLLRDHUP
support in the kernel (but with EPOLLRDHUP in the headers) was broken.

Before the change, rev->available was never set to 0 unless
ngx_use_epoll_rdhup was also set (that is, runtime test for EPOLLRDHUP
introduced in 6536:f7849bfb6d21 succeeded).  After the change,
rev->available might reach 0 on systems without runtime EPOLLRDHUP
support, stopping further reading in ngx_readv_chain() and ngx_unix_recv().
And, if EOF happened to be already reported along with the last event,
it was not reported again by epoll_wait(), leading to connection hangs
and timeouts on such systems.

This affects Linux kernels before 2.6.17 if nginx was compiled
with newer headers, and, more importantly, emulation layers, such as
DigitalOcean's App Platform's / gVisor's epoll emulation layer.

Fix is to explicitly check ngx_use_epoll_rdhup before the corresponding
rev->pending_eof tests in ngx_readv_chain() and ngx_unix_recv().

diff --git a/src/os/unix/ngx_readv_chain.c b/src/os/unix/ngx_readv_chain.c
--- a/src/os/unix/ngx_readv_chain.c
+++ b/src/os/unix/ngx_readv_chain.c
@@ -55,7 +55,9 @@ ngx_readv_chain(ngx_connection_t *c, ngx
 
 #if (NGX_HAVE_EPOLLRDHUP)
 
-    if (ngx_event_flags & NGX_USE_EPOLL_EVENT) {
+    if ((ngx_event_flags & NGX_USE_EPOLL_EVENT)
+        && ngx_use_epoll_rdhup)
+    {
         ngx_log_debug2(NGX_LOG_DEBUG_EVENT, c->log, 0,
                        "readv: eof:%d, avail:%d",
                        rev->pending_eof, rev->available);
diff --git a/src/os/unix/ngx_recv.c b/src/os/unix/ngx_recv.c
--- a/src/os/unix/ngx_recv.c
+++ b/src/os/unix/ngx_recv.c
@@ -52,7 +52,9 @@ ngx_unix_recv(ngx_connection_t *c, u_cha
 
 #if (NGX_HAVE_EPOLLRDHUP)
 
-    if (ngx_event_flags & NGX_USE_EPOLL_EVENT) {
+    if ((ngx_event_flags & NGX_USE_EPOLL_EVENT)
+        && ngx_use_epoll_rdhup)
+    {
         ngx_log_debug2(NGX_LOG_DEBUG_EVENT, c->log, 0,
                        "recv: eof:%d, avail:%d",
                        rev->pending_eof, rev->available);

-- 
Maxim Dounin
http://mdounin.ru/


