[PATCH 0 of 1] Fix for Nginx hanging on systems without EPOLLRDHUP

Maxim Dounin mdounin at mdounin.ru
Sun Mar 6 04:05:46 UTC 2022


Hello!

On Thu, Mar 03, 2022 at 02:04:21PM -0500, Marcus Ball wrote:

> I recently encountered an issue where Nginx would hang for a very long 
> time, if not indefinitely, on responses which exceeded the FastCGI 
> buffer size (> ~4000 bytes) from an upstream source which, in this case, 
> was PHP-FPM. This issue appeared to only be happening on DigitalOcean's 
> App Platform service; I couldn't reproduce it locally. I did a lot of 
> testing and digging around and eventually tracked it back to 
> DigitalOcean's system not supporting the `EPOLLRDHUP` event. After much 
> debugging and reading through Nginx's source code, I believe I found the 
> cause: two conditions which were missing a check for 
> `ngx_use_epoll_rdhup`. I made the changes, rebuilt nginx, and 
> everything appears to be working fine now.
> 
> If anyone needs to reproduce the issue, I've published a basic example 
> at https://github.com/marcusball/nginx-epoll-bug. There are also 
> corresponding Docker Hub images which should be able to demonstrate an 
> example project with the bug and with the fix if they are deployed to 
> App Platform: `marcusball/nginx-rdhup-bug:without-fix` and 
> `marcusball/nginx-rdhup-bug:with-fix` respectively.

Thanks for the investigation.

The rev->available field shouldn't be 0 unless it was set to 0 
because nginx read at least as much data as ioctl(FIONREAD) reported 
during that particular event loop iteration.  It is set back to -1 
as soon as an additional event is reported on the socket.  That is, 
nginx shouldn't hang as long as epoll() works properly: once nginx 
has read all the data available when epoll_wait() last returned, any 
data received afterwards should be reported by a new event.

I suspect this doesn't work due to issues with the epoll() emulation 
layer of DigitalOcean's App Platform / gVisor.  Most likely, it fails 
to report additional events once nginx reads the amount of data 
reported by ioctl(FIONREAD).  Alternatively, ioctl(FIONREAD) simply 
reports an incorrect amount of data (or just 0).

A debug log might help to investigate further what goes on here. 
It would be great if you could provide one for additional analysis.

As far as I understand, the proper workaround would be to compile 
nginx with --with-cc-opt="-DNGX_HAVE_FIONREAD=0", that is, with 
ioctl(FIONREAD) explicitly disabled.  Please test whether it works 
for you.

-- 
Maxim Dounin
http://mdounin.ru/


