Socket leak

Mon May 27 05:39:20 UTC 2013

Hi Maxim

I found the root cause: the problem was in my plugin. Your explanation of
posted requests helped a lot in debugging it. For reasons I cannot avoid,
my plugin holds a reference to the ngx_http_request_t and calls finalize
once it is done or sees an error. I didn't call
ngx_http_run_posted_requests() afterwards, as ngx_http_request_handler()
does. The actual call to writev happens *after* the request handler
returns, so the handler doesn't see c->error or the posted request and
hence doesn't clean it up.
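
In case it helps anyone else on the list, here is a small standalone C
model of what went wrong. It is only a sketch: the toy_* names and types
are made up for illustration and are not nginx's API or its real
implementation. They just mimic the pattern where finalizing a request on
a failed connection defers termination to a posted request, which is
closed only once something runs the posted queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT nginx types. */
typedef struct toy_request toy_request_t;

typedef struct {
    bool            error;   /* set when a write failed, like c->error */
    bool            closed;  /* true once the socket has been released */
    toy_request_t  *posted;  /* at most one deferred request in this toy */
} toy_connection_t;

struct toy_request {
    toy_connection_t  *connection;
    void             (*write_event_handler)(toy_request_t *r);
};

/* Deferred step: this is where the connection is actually released. */
static void
toy_terminate_handler(toy_request_t *r)
{
    r->connection->closed = true;
}

/*
 * On a connection error, only schedule termination, because callers
 * further up the stack may still touch r. Otherwise close right away.
 */
static void
toy_finalize_request(toy_request_t *r)
{
    assert(r != NULL && r->connection != NULL);

    if (r->connection->error) {
        r->write_event_handler = toy_terminate_handler;
        r->connection->posted = r;
        return;
    }

    r->connection->closed = true;
}

/* Run whatever was deferred; the event handler does this on return. */
static void
toy_run_posted_requests(toy_connection_t *c)
{
    while (c->posted != NULL) {
        toy_request_t *r = c->posted;

        c->posted = NULL;
        r->write_event_handler(r);
    }
}

/*
 * Out-of-band completion, as my plugin does it. With drain == false the
 * posted request is never run. Returns whether the connection was closed.
 */
static bool
toy_finish(toy_request_t *r, bool drain)
{
    toy_connection_t *c = r->connection;  /* save before finalizing */

    toy_finalize_request(r);

    if (drain) {
        toy_run_posted_requests(c);
    }

    return c->closed;
}
```

With drain set to false the connection stays open with the request still
posted, which corresponds to the leak I saw; with drain set to true the
deferred handler runs and the connection is closed.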

I will fix my plugin to follow the normal nginx flow soon, but until then
this workaround (calling ngx_http_run_posted_requests() after finalize)
solves my problem. I did look at the diff from 1.0.5 to 1.2.6 and couldn't
see what could have caused this.

Thanks again for the help and the really useful reply.

+Fasih

On Fri, May 24, 2013 at 10:19 PM, Fasih <faskiri.devel at gmail.com> wrote:

> Hello
>
> Thanks for the really quick reply. The ngx_http_run_posted_requests
> totally made sense and explained the bit that I was missing.
>
> I get the bug when a writev call made in the context of a request handler
> fails with an error. My repro was basically nginx running on a server and
> a client on my laptop over wireless at work. I am not at work now, and
> from my home connection I am unable to repro this. I will send you the
> backtrace as soon as I get it again.
>
>
> On Fri, May 24, 2013 at 8:24 PM, Maxim Dounin <mdounin at mdounin.ru> wrote:
>
>> Hello!
>>
>> On Fri, May 24, 2013 at 07:09:58PM +0530, Fasih wrote:
>>
>> > Hi all
>> >
>> > I have been seeing a slow but steady socket leak in nginx ever since I
>> > upgraded from 1.0.5 to 1.2.6. I have my custom module in nginx, which I
>> > was sure was the leak. This is how I went about investigating:
>> > 1. Configure nginx with one worker
>> > 2. strace on the worker process, tracing
>> > read/readv/write/writev/close/shutdown calls
>> > 3. Every now and then, for all the open fds (from ls -l /proc/<pid>/fd),
>> > check for a socket that does not show up in netstat -pane
>> > 4. What I saw was, the leaking socket always had the last operation as
>> > writev which returned an error.
>> > 5. Increased the nginx log level to info and verified that nginx was
>> > getting ECONNRESET or EPIPE on writev failure. Which was OK.
>> > 6. Traced back in the code to see how it is handled: the error translates
>> > to NGX_CHAIN_ERROR and eventually causes ngx_http_finalize_request to be
>> > called. This in turn calls ngx_http_terminate_request.
>> >
>> > However, in this function, the request is not terminated if
>> > r->write_event_handler is set. This seems to be set if the request
>> > handler is a user module. I think the rationale for the check is: if a
>> > module is handling the request, don't terminate yet; wait for a write
>> > event on the socket and then terminate it (which is why I thought it
>> > sets r->write_event_handler to ngx_http_terminate_handler).
>>
>> The rationale is to make sure there are no functions on the stack which
>> assume the request object still exists and will try to access it after
>> we free the request data.
>>
>> The r->write_event_handler (that is, ngx_http_terminate_handler())
>> is expected to be called by ngx_http_run_posted_requests(), which
>> in turn is called by low-level event handling functions (notably,
>> ngx_http_request_handler()).
>>
>> > I tried to repro this with empty_gif_handler; however, it sends the
>> > header and body in one call to writev, which I can't get to fail in my
>> > test environment. To reproduce the bug, if I replace the call to
>> > ngx_http_send_response with ngx_http_send_header and
>> > ngx_http_output_filter (as used by ngx_upstream or other modules which
>> > don't have the headers and body together), I can reproduce the leak. I
>> > have a client that sends a request and closes the socket immediately;
>> > nginx sees the error, prints the info log, and then doesn't close the
>> > socket.
>> >
>> > I have attached a small patch. The fix basically says that if there is
>> > a connection error, there is no point in setting write_event_handler,
>> > as there won't be any activity on the socket, so just terminate it
>> > immediately.
>> >
>> > I could be very wrong in my understanding of the code flow. My patch
>> > just fixes this, and I am not sure it is the right fix. Please let me
>> > know.
>> >
>> > I will try to add a test case to reproduce this in the nginx test
>> > framework.
>>
>> The patch looks wrong, see above.
>>
>> Could you please show a backtrace up to
>> ngx_http_terminate_request() with mr->write_event_handler and
>> c->error set (i.e. where you think leak happens)?
>>
>> You may also want to upgrade to a more recent version, e.g. 1.5.0,
>> to make sure the problem you are facing isn't already fixed.
>>
>> --
>> Maxim Dounin
>> http://nginx.org/en/donation.html
>>
>> _______________________________________________
>> nginx-devel mailing list
>> nginx-devel at nginx.org
>> http://mailman.nginx.org/mailman/listinfo/nginx-devel
>>
>
>