Crash (double free or corruption) on trying to proxy to localhost

Tue May 3 16:00:03 MSD 2011

I should be able to try this out tonight (>12 hours from now).  I'll let you
know how it works.  Thanks for looking into this for me.

On Tue, May 3, 2011 at 4:46 AM, Maxim Dounin <mdounin at mdounin.ru> wrote:

> Hello!
>
> On Mon, May 02, 2011 at 04:45:13PM -0700, Stephen Weeks wrote:
>
> > Sure!  I've added it to the post on github.  I've slightly edited the log
> > out of paranoia (replacing a customer ID and auth key).
> >
> > I can confirm now that I ran this proxy with the upstreamfair module
> > overnight and it didn't crash at all.  I can't get it to preferentially
> > serve to localhost, though, as it doesn't support 'backup' as a server
> > attribute, and doesn't seem to really use the weights, so that's
> > suboptimal.
> >
> > Anything else I can add to help troubleshooting this?  Anything you'd
> > like from the core dump?
>
> Ok, thank you, it looks like I see the problem.
>
> Allocation for "tried" flags doesn't take into account number of
> backup servers, and if there are more backup servers than normal
> ones (and backup servers are in fact used) - this may cause memory
> corruption.
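
To illustrate the sizing issue described above, here is a minimal sketch in
C. It is not nginx's actual code and not the attached patch: the names
words_for, tried_words_buggy, tried_words_fixed, and mark_tried are
hypothetical, and the one-bit-per-server bitmap layout is an assumption made
for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of "tried" bits that fit in one machine word. */
#define BITS_PER_WORD (8 * sizeof(uintptr_t))

/* Words needed to hold one "tried" bit per server (hypothetical helper). */
static size_t words_for(size_t servers)
{
    return (servers + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Problematic sizing: only the normal servers are counted, so the bitmap
 * is too small once the backup group is larger and actually gets used. */
static size_t tried_words_buggy(size_t normal, size_t backup)
{
    (void) backup; /* ignored on purpose: this is the mistake being shown */
    return words_for(normal);
}

/* Safer sizing: make the bitmap large enough for the bigger group. */
static size_t tried_words_fixed(size_t normal, size_t backup)
{
    size_t n = (normal > backup) ? normal : backup;
    return words_for(n);
}

/* Set the "tried" bit for server i; the assert catches the out-of-bounds
 * write that would otherwise silently corrupt the heap. */
static void mark_tried(uintptr_t *tried, size_t words, size_t i)
{
    assert(i / BITS_PER_WORD < words);
    tried[i / BITS_PER_WORD] |= (uintptr_t) 1 << (i % BITS_PER_WORD);
}
```

With one normal server and, say, 100 backups, tried_words_buggy returns a
single word, so marking backup number 99 would write past the end of the
allocation; tried_words_fixed returns enough words for all 100 bits.
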
>
> Please try the attached patch.
>
> Maxim Dounin
>
> >
> > On Mon, May 2, 2011 at 2:04 PM, Maxim Dounin <mdounin at mdounin.ru> wrote:
> >
> > > Hello!
> > >
> > > On Sun, May 01, 2011 at 05:54:17PM -0700, Stephen Weeks wrote:
> > >
> > > > I've currently got a pool of systems running nginx proxying to a pool of
> > > > systems running apache.  I'm trying to move to running nginx locally on
> > > > the apache hosts instead of separate systems to avoid the extra network
> > > > hop, make more-efficient use of resources, and enable some future
> > > > development (including migrating to running our application on nginx via
> > > > fastcgi instead of apache, ideally).  We've currently got some significant
> > > > architecture built up around apache, so converting right now is
> > > > uncomfortable.
> > > >
> > > > Ideally, I'd like nginx to just serve from localhost, but fail over to
> > > > the rest of the pool when localhost is unavailable, so in my upstream I
> > > > have every server except for localhost set as 'backup'.  I'm otherwise
> > > > running identical configurations of apache and nginx on a single system
> > > > together as used in the rest of the two pools.  This works exactly as
> > > > expected, except that I get a few crashes of nginx workers every minute.
> > > > This only happens when proxying to the local system.  If I proxy anywhere
> > > > else, it works fine.  Other proxies can serve from this system without
> > > > trouble.  I see this same behaviour on other hosts when I build them the
> > > > same way, so it's not an error with the host.  I see this crash on 0.7.65,
> > > > 0.8.54, and 1.0.0, running on Ubuntu 10.04 LTS.  I see this crash whether
> > > > I'm connecting to 127.0.0.1 or the host's local IP.  I see this crash
> > > > whether I'm listening on *:80 or <public ip>:80.  I see this crash whether
> > > > I'm connecting to :80 or running apache on a different port and connecting
> > > > to :81.  I see this crash whether I'm running Ubuntu's "nginx-light"
> > > > configuration, or their "nginx-full" configuration.  I see no errors
> > > > logged from apache.
> > > >
> > > > 1) I'd really love to make this work, so if there's anything else I can
> > > > try, any additional debugging information I can give, I'd appreciate it.
> > > > 2) Nginx has been very useful to me so far, so I thought you'd appreciate
> > > > a bug report.
> > > >
> > > > Posted on github, I have a problem description, section of a debug log,
> > > > my (slightly edited: flattened includes and stripped an IP) nginx.conf, a
> > > > gdb backtrace, and some additional information I was asked for when
> > > > looking for help on IRC.  This is everything I've been able to come up
> > > > with that sounds plausibly relevant.
> > > >
> > > > https://gist.github.com/1574dbaf3a3dcda920a2
> > > >
> > > > Any help?
> > >
> > > Could you please provide:
> > >
> > > 1. nginx -V output
> > >
> > > 2. Full debug log for '*60' connection (the one which triggered
> > > abort in glibc); the one you provided contains only the last part of
> > > the connection in question.  Running grep -F ' 27772#0: *60 ' on
> > > the original debug log should produce something usable.
> > >
> > > Maxim Dounin
> > >
> > > _______________________________________________
> > > nginx mailing list
> > > nginx at nginx.org
> > > http://nginx.org/mailman/listinfo/nginx
> > >
>