Crash (double free or corruption) on trying to proxy to localhost

Tue May 3 15:46:42 MSD 2011

Hello!

On Mon, May 02, 2011 at 04:45:13PM -0700, Stephen Weeks wrote:

> Sure!  I've added it to the post on github.  I've slightly edited the log
> out of paranoia (replacing a customer ID and auth key).
> 
> I can confirm now that I ran this proxy with the upstreamfair module
> overnight and it didn't crash at all.  I can't get it to preferentially
> serve to localhost, though, as it doesn't support 'backup' as a server
> attribute, and doesn't seem to really use the weights, so that's suboptimal.
> 
> Anything else I can add to help troubleshooting this?  Anything you'd like
> from the core dump?

Ok, thank you, it looks like I see the problem.

Allocation for the "tried" flags doesn't take into account the
number of backup servers. If there are more backup servers than
normal ones (and the backup servers are in fact used), this may
cause memory corruption.
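
As a rough illustration of the sizing problem (a standalone sketch,
not nginx code): the helper names tried_words_old, tried_words_fixed
and mark_in_bounds are made up for this example, and the only thing
taken from the round robin code is the rounding formula for a bitmap
that holds one bit per peer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of "tried" bits that fit in one machine word. */
#define BITS_PER_WORD (8 * sizeof(uintptr_t))

/* Words needed to hold one "tried" bit per peer, rounded up. */
static size_t
tried_words(size_t peers)
{
    return (peers + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Old sizing: only the primary (non-backup) group is counted. */
static size_t
tried_words_old(size_t primary, size_t backup)
{
    (void) backup;  /* ignored, which is the bug */
    return tried_words(primary);
}

/*
 * Fixed sizing: the same bitmap is reused when falling back to the
 * backup group, so it has to cover the larger of the two groups.
 */
static size_t
tried_words_fixed(size_t primary, size_t backup)
{
    size_t  n = primary > backup ? primary : backup;

    return tried_words(n);
}

/*
 * Returns 1 if setting the "tried" bit for peer index i stays inside
 * a bitmap of `words` words, 0 if it would write past the end.
 */
static int
mark_in_bounds(size_t words, size_t i)
{
    assert(words > 0);  /* even the inline case provides one word */
    return i / BITS_PER_WORD < words;
}
```

With one normal server and BITS_PER_WORD + 1 backup servers, the old
sizing yields a single word, so marking the last backup peer as tried
lands outside the bitmap; the fixed sizing yields two words and the
same write stays in bounds.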

Please try the attached patch.

Maxim Dounin

> 
> On Mon, May 2, 2011 at 2:04 PM, Maxim Dounin <mdounin at mdounin.ru> wrote:
> 
> > Hello!
> >
> > On Sun, May 01, 2011 at 05:54:17PM -0700, Stephen Weeks wrote:
> >
> > > I've currently got a pool of systems running nginx proxying to a pool
> > > of systems running apache.  I'm trying to move to running nginx
> > > locally on the apache hosts instead of separate systems to avoid the
> > > extra network hop, make more-efficient use of resources, and enable
> > > some future development (including migrating to running our
> > > application on nginx via fastcgi instead of apache, ideally).  We've
> > > currently got some significant architecture built up around apache,
> > > so converting right now is uncomfortable.
> > >
> > > Ideally, I'd like nginx to just serve from localhost, but fail over to
> > > the rest of the pool when localhost is unavailable, so in my upstream
> > > I have every server except for localhost set as 'backup'.  I'm
> > > otherwise running identical configurations of apache and nginx on a
> > > single system together as used in the rest of the two pools.  This
> > > works exactly as expected, except that I get a few crashes of nginx
> > > workers every minute.  This only happens when proxying to the local
> > > system.  If I proxy anywhere else, it works fine.  Other proxies can
> > > serve from this system without trouble.  I see this same behaviour on
> > > other hosts when I build them the same way, so it's not an error with
> > > the host.  I see this crash on 0.7.65, 0.8.54, and 1.0.0, running on
> > > Ubuntu 10.04 LTS.  I see this crash whether I'm connecting to
> > > 127.0.0.1 or the host's local IP.  I see this crash whether I'm
> > > listening on *:80 or <public ip>:80.  I see this crash whether I'm
> > > connecting to :80 or running apache on a different port and
> > > connecting to :81.  I see this crash whether I'm running ubuntu's
> > > "nginx-light" configuration, or their "nginx-full" configuration.  I
> > > see no errors logged from apache.
> > >
> > > 1) I'd really love to make this work, so if there's anything else I
> > > can try, any additional debugging information I can give, I'd
> > > appreciate it.
> > > 2) Nginx has been very useful to me so far, so I thought you'd
> > > appreciate a bug report.
> > >
> > > Posted on github, I have a problem description, section of a debug
> > > log, my (slightly edited: flattened includes and stripped an IP)
> > > nginx.conf, a gdb backtrace, and some additional information I was
> > > asked for when looking for help on IRC.  This is everything I've been
> > > able to come up with that sounds plausibly relevant.
> > >
> > > https://gist.github.com/1574dbaf3a3dcda920a2
> > >
> > > Any help?
> >
> > Could you please provide:
> >
> > 1. nginx -V output
> >
> > 2. Full debug log for '*60' connection (the one which triggered
> > abort in glibc), the one you provided contains only last part of
> > the connection in question.  Running grep -F ' 27772#0: *60 ' on
> > original debug log should produce something useable.
> >
> > Maxim Dounin
> >
> > _______________________________________________
> > nginx mailing list
> > nginx at nginx.org
> > http://nginx.org/mailman/listinfo/nginx
> >

-------------- next part --------------
# HG changeset patch
# User Maxim Dounin <mdounin at mdounin.ru>
# Date 1304422854 -14400
# Node ID b7826a837aaf484462db58d64ec0d060ba8b92e5
# Parent  00d13b6d4ebd225f94a2e2a3afa7dbd3ddfe4ed7
Upstream: properly allocate memory for tried flags.

Previous allocation only took into account number of non-backup servers, and
this caused memory corruption with many backup servers.

See report here:

http://nginx.org/pipermail/nginx/2011-May/026531.html

diff --git a/src/http/ngx_http_upstream_round_robin.c b/src/http/ngx_http_upstream_round_robin.c
--- a/src/http/ngx_http_upstream_round_robin.c
+++ b/src/http/ngx_http_upstream_round_robin.c
@@ -219,13 +219,18 @@ ngx_http_upstream_init_round_robin_peer(
     rrp->peers = us->peer.data;
     rrp->current = 0;
 
-    if (rrp->peers->number <= 8 * sizeof(uintptr_t)) {
+    n = rrp->peers->number;
+
+    if (rrp->peers->next && rrp->peers->next->number > n) {
+        n = rrp->peers->next->number;
+    }
+
+    if (n <= 8 * sizeof(uintptr_t)) {
         rrp->tried = &rrp->data;
         rrp->data = 0;
 
     } else {
-        n = (rrp->peers->number + (8 * sizeof(uintptr_t) - 1))
-                / (8 * sizeof(uintptr_t));
+        n = (n + (8 * sizeof(uintptr_t) - 1)) / (8 * sizeof(uintptr_t));
 
         rrp->tried = ngx_pcalloc(r->pool, n * sizeof(uintptr_t));
         if (rrp->tried == NULL) {