Leaky NGINX Plugin Advice

Roman Arutyunyan arut at nginx.com
Thu Apr 25 14:41:07 UTC 2024


Hello,

As this is a development-related question, a better list for it is nginx-devel at nginx.org.

> On 23 Apr 2024, at 1:40 PM, Alex Hussein-Kershaw (HE/HIM) via nginx <nginx at nginx.org> wrote:
> 
> Hi Folks,
> 
> I've inherited an nginx plugin, written against 0.7.69, that has recently been moved to nginx 1.24.0 so that we no longer need to ship old versions of openssl.
> 
> I've found during performance testing that it's leaking file descriptors. After a few hours running and leaking I hit my configured limit of 100k worker_connections which gets written to logs, and nginx starts "reusing connections".
> 
> The leaked file descriptors don't show up in the output of "ss", they look like this in lsof:
> 
> $ /usr/bin/lsof -p 2875952  | grep protocol  | head -2
> nginx 2875952 user 8u     sock                0,8       0t0 2222824178 protocol: TCP
> nginx 2875952 user 19u     sock                0,8       0t0 2266802646 protocol: TCP
> 
> Googling suggests this may be a socket that has been created but never had a "bind" or "connect" call. I've combed through our plugin code, and am confident it's not responsible for making and leaking these sockets. 
> 
> I should flag two stinkers which may be responsible:
> 1. We have "lingering_timeout" set to an hour, a hack to allow long poll / COMET requests to not be torn down before responding. Stopping load and waiting for an hour does drop some of these leaked fds, but not all. After leaking 17k fds, I stopped my load test and saw it drop to 7k fds which appeared to remain indefinitely. Is this a terrible idea?
> 2. Within our plugin, we are incrementing the request count field for the same purpose. I'm not really sure why we need both of these, maybe I'm missing something but I can't get COMET polls to work without. I believe that was inspired by Nchan which does something similar. Should I be able to avoid requests getting torn down via this method without lingering_timeout?
> 
> What could be responsible for these leaked file descriptors and worker connections? I'm inexperienced with nginx so any pointers on where to look are greatly appreciated.

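Incrementing the request counter must be done carefully, as it can lead to socket leaks: a request whose counter never drops back to zero is never freed, so its connection stays open.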

To investigate the issue further, you can enable debug logging in nginx and find the leaked socket in the log by its "fd:" prefix.
Then track the leaked connection by its connection number (prefixed with '*' in the log).
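
The procedure above can be sketched as a small log scan. This is only an illustration, not an official tool. It assumes nginx was built with --with-debug and that error_log is set to the debug level. The two regular expressions encode my assumption about how the accept and close lines look in the debug log ("*N accept: ADDR, fd:FD" and "*N close http connection: FD"); check them against your own log before relying on the result. The function names find_unclosed and lines_for are hypothetical.

```python
import re
import sys

# ASSUMED debug-log line shapes (verify against your own error log):
#   ... [debug] 1234#0: *17 accept: 10.0.0.1:40000, fd:8
#   ... [debug] 1234#0: *17 close http connection: 8
ACCEPT_RE = re.compile(r"\*(\d+) accept: .*fd:(\d+)")
CLOSE_RE = re.compile(r"\*(\d+) close http connection: (\d+)")


def find_unclosed(lines):
    """Return {connection_number: fd} for connections accepted but never closed."""
    open_conns = {}
    for line in lines:
        m = ACCEPT_RE.search(line)
        if m:
            open_conns[int(m.group(1))] = int(m.group(2))
            continue
        m = CLOSE_RE.search(line)
        if m:
            # Match on the connection number, not the fd: fds are reused.
            open_conns.pop(int(m.group(1)), None)
    return open_conns


def lines_for(lines, conn):
    """Return every log line that belongs to connection number `conn`."""
    tag = f": *{conn} "
    return [line for line in lines if tag in line]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as f:
        log = f.read().splitlines()
    for conn, fd in sorted(find_unclosed(log).items()):
        print(f"connection *{conn} (fd:{fd}) was never closed")
        for line in lines_for(log, conn)[-5:]:  # last few events on it
            print("   ", line)
```

Run it with the path to the debug error log as the only argument. The last few lines printed for each unclosed connection show what the request was doing before it stalled, which is usually the place to start reading the module code.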

> 
> Many thanks,
> Alex

----
Roman Arutyunyan
arut at nginx.com



