Crash in mail module during SMTP setup
Maxim Dounin
mdounin at mdounin.ru
Mon Jul 29 18:26:42 UTC 2019
Hello!
On Mon, Jul 29, 2019 at 03:03:09PM +1000, Rob N ★ wrote:
> I'm using the mail module for IMAP/POP/SMTP proxying (at
> Fastmail). Lately (last few weeks) we've heard reports of
> connections dropping (particularly IMAP IDLE) but it wasn't
> until this morning I understood the source of it: nginx is
> segfaulting.
>
> I haven't got a direct reproduction, but I find that if I just
> attach gdb to a running worker, most of the time it will
> eventually fail (sometimes they run to completion).
>
> This is only happening on our production frontends, never in
> development or test, which suggests its related to load. The
> machines aren't under any particular memory or CPU pressure, and
> peak around 150K active simultaneous connections, but these
> crashes don't appear to correlate with daily load variations.
> Probably, this has become more noticeable as the number of
> active connections has increased in the last few weeks (as we've
> taken on more customers).
>
> This was happening with 1.15.8, which we've had running since
> February. I upgraded to 1.17.2 this morning just in case it had
> already been fixed, but it still happens.
>
>
> It's slow gathering crashes, so I only have this one (that I've seen three times):
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000555dbe in ngx_mail_smtp_resolve_addr_handler (ctx=0x4c4a430) at src/mail/ngx_mail_smtp_handler.c:107
> 107 ngx_log_error(NGX_LOG_ERR, c->log, 0,
> (gdb) bt
> #0 0x0000000000555dbe in ngx_mail_smtp_resolve_addr_handler (ctx=0x4c4a430) at src/mail/ngx_mail_smtp_handler.c:107
> #1 0x000000000047f0c4 in ngx_resolver_timeout_handler (ev=0x6cecb30) at src/core/ngx_resolver.c:4047
> #2 0x00000000004879e7 in ngx_event_expire_timers () at src/event/ngx_event_timer.c:94
> #3 0x0000000000485307 in ngx_process_events_and_timers (cycle=0x1fec320) at src/event/ngx_event.c:256
> #4 0x00000000004949f7 in ngx_worker_process_cycle (cycle=0x1fec320, data=0x2) at src/os/unix/ngx_process_cycle.c:757
> #5 0x0000000000491121 in ngx_spawn_process (cycle=0x1fec320, proc=0x494909 <ngx_worker_process_cycle>, data=0x2,
> name=0x78ea93 "worker process", respawn=2) at src/os/unix/ngx_process.c:199
> #6 0x0000000000494499 in ngx_reap_children (cycle=0x1fec320) at src/os/unix/ngx_process_cycle.c:629
> #7 0x000000000049304f in ngx_master_process_cycle (cycle=0x1fec320) at src/os/unix/ngx_process_cycle.c:182
> #8 0x000000000044f962 in main (argc=3, argv=0x7ffe5cfba9b8) at src/core/nginx.c:382
> (gdb) f 107
> #0 0x0000000000000000 in ?? ()
> (gdb) f 0
> #0 0x0000000000555dbe in ngx_mail_smtp_resolve_addr_handler (ctx=0x4c4a430) at src/mail/ngx_mail_smtp_handler.c:107
> 107 ngx_log_error(NGX_LOG_ERR, c->log, 0,
> (gdb) l 107
> 102
> 103 s = ctx->data;
> 104 c = s->connection;
> 105
> 106 if (ctx->state) {
> 107 ngx_log_error(NGX_LOG_ERR, c->log, 0,
> 108 "%V could not be resolved (%i: %s)",
> 109 &c->addr_text, ctx->state,
> 110 ngx_resolver_strerror(ctx->state));
> 111
> (gdb) p ctx
> $1 = (ngx_resolver_ctx_t *) 0x4c4a430
> (gdb) p ctx->state
> $2 = 110
> (gdb) p *ctx
> $3 = {next = 0x0, resolver = 0x2020a60, node = 0x66efd90, ident = -1, state = 110, name = {len = 0, data = 0x0},
> service = {len = 0, data = 0x0}, valid = 0, naddrs = 0, addrs = 0x0, addr = {sockaddr = 0x5dc46f0, socklen = 16,
> name = {len = 0, data = 0x0}, priority = 0, weight = 0}, sin = {sin_family = 0, sin_port = 0, sin_addr = {
> s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}, count = 0, nsrvs = 0, srvs = 0x0,
> handler = 0x555d7e <ngx_mail_smtp_resolve_addr_handler>, data = 0x47aeaf0, timeout = 30000, quick = 0, async = 1,
> cancelable = 0, recursion = 0, event = 0x6cecb30}
Looking at "p *c" and "p *s" might be also interesting.
> An earlier one on 1.15.8 looks more like this (I don't have the complete gdb session unfortunately):
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x000000000055427b in ngx_event_add_timer (ev=0x321c980, timer=60000) at src/event/ngx_event_timer.h:80
> 80 src/event/ngx_event_timer.h: No such file or directory.
>
> (gdb) l 80
> 75 ngx_del_timer(ev);
> 76 }
> 77
> 78 ev->timer.key = key;
> 79
> 80 ngx_log_debug3(NGX_LOG_DEBUG_EVENT, ev->log, 0,
> 81 "event timer add: %d: %M:%M",
> 82 ngx_event_ident(ev->data), timer, ev->timer.key);
> 83
> 84 ngx_rbtree_insert(&ngx_event_timer_rbtree, &ev->timer);
>
> (gdb) p ev->data
> $3 = (void *) 0x100
>
>
> I've lost the full trace on that but again, it was during SMTP auth.
>
>
> I don't have a good intuition of how to debug this further. What
> other information could I give you to help figure it out?
Any changes to nginx code and/or additional modules?
Additionally, consider configuring debug logging. Given that it's
slow gathering cores, normal debug logging might not be an option,
though configuring large enough memory buffer might work, see
here:
http://nginx.org/en/docs/debugging_log.html#memory
Note that given the segfault looks related to DNS resolution
timeouts - at least the one from 1.17.2 - and your configuration
uses 30 seconds timeout - memory buffer needs to be big enough to
include at least 30 seconds of debug logs.
--
Maxim Dounin
http://mdounin.ru/
More information about the nginx
mailing list