Crash in mail module during SMTP setup

Rob N ★ robn at fastmailteam.com
Mon Jul 29 05:03:09 UTC 2019


I'm using the mail module for IMAP/POP/SMTP proxying (at Fastmail). Lately (last few weeks) we've heard reports of connections dropping (particularly IMAP IDLE) but it wasn't until this morning I understood the source of it: nginx is segfaulting.

I haven't got a direct reproduction, but I find that if I just attach gdb to a running worker, most of the time it will eventually fail (sometimes they run to completion).

This is only happening on our production frontends, never in development or test, which suggests its related to load. The machines aren't under any particular memory or CPU pressure, and peak around 150K active simultaneous connections, but these crashes don't appear to correlate with daily load variations. Probably, this has become more noticeable as the number of active connections has increased in the last few weeks (as we've taken on more customers).

This was happening with 1.15.8, which we've had running since February. I upgraded to 1.17.2 this morning just in case it had already been fixed, but it still happens.


It's slow gathering crashes, so I only have this one (that I've seen three times):

Program received signal SIGSEGV, Segmentation fault.
0x0000000000555dbe in ngx_mail_smtp_resolve_addr_handler (ctx=0x4c4a430) at src/mail/ngx_mail_smtp_handler.c:107
107 ngx_log_error(NGX_LOG_ERR, c->log, 0,
(gdb) bt
#0 0x0000000000555dbe in ngx_mail_smtp_resolve_addr_handler (ctx=0x4c4a430) at src/mail/ngx_mail_smtp_handler.c:107
#1 0x000000000047f0c4 in ngx_resolver_timeout_handler (ev=0x6cecb30) at src/core/ngx_resolver.c:4047
#2 0x00000000004879e7 in ngx_event_expire_timers () at src/event/ngx_event_timer.c:94
#3 0x0000000000485307 in ngx_process_events_and_timers (cycle=0x1fec320) at src/event/ngx_event.c:256
#4 0x00000000004949f7 in ngx_worker_process_cycle (cycle=0x1fec320, data=0x2) at src/os/unix/ngx_process_cycle.c:757
#5 0x0000000000491121 in ngx_spawn_process (cycle=0x1fec320, proc=0x494909 <ngx_worker_process_cycle>, data=0x2,
 name=0x78ea93 "worker process", respawn=2) at src/os/unix/ngx_process.c:199
#6 0x0000000000494499 in ngx_reap_children (cycle=0x1fec320) at src/os/unix/ngx_process_cycle.c:629
#7 0x000000000049304f in ngx_master_process_cycle (cycle=0x1fec320) at src/os/unix/ngx_process_cycle.c:182
#8 0x000000000044f962 in main (argc=3, argv=0x7ffe5cfba9b8) at src/core/nginx.c:382
(gdb) f 107
#0 0x0000000000000000 in ?? ()
(gdb) f 0
#0 0x0000000000555dbe in ngx_mail_smtp_resolve_addr_handler (ctx=0x4c4a430) at src/mail/ngx_mail_smtp_handler.c:107
107 ngx_log_error(NGX_LOG_ERR, c->log, 0,
(gdb) l 107
102
103 s = ctx->data;
104 c = s->connection;
105
106 if (ctx->state) {
107 ngx_log_error(NGX_LOG_ERR, c->log, 0,
108 "%V could not be resolved (%i: %s)",
109 &c->addr_text, ctx->state,
110 ngx_resolver_strerror(ctx->state));
111
(gdb) p ctx
$1 = (ngx_resolver_ctx_t *) 0x4c4a430
(gdb) p ctx->state
$2 = 110
(gdb) p *ctx
$3 = {next = 0x0, resolver = 0x2020a60, node = 0x66efd90, ident = -1, state = 110, name = {len = 0, data = 0x0},
 service = {len = 0, data = 0x0}, valid = 0, naddrs = 0, addrs = 0x0, addr = {sockaddr = 0x5dc46f0, socklen = 16,
 name = {len = 0, data = 0x0}, priority = 0, weight = 0}, sin = {sin_family = 0, sin_port = 0, sin_addr = {
 s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}, count = 0, nsrvs = 0, srvs = 0x0,
 handler = 0x555d7e <ngx_mail_smtp_resolve_addr_handler>, data = 0x47aeaf0, timeout = 30000, quick = 0, async = 1,
 cancelable = 0, recursion = 0, event = 0x6cecb30}


An earlier one on 1.15.8 looks more like this (I don't have the complete gdb session unfortunately):

Program received signal SIGSEGV, Segmentation fault.
0x000000000055427b in ngx_event_add_timer (ev=0x321c980, timer=60000) at src/event/ngx_event_timer.h:80
80 src/event/ngx_event_timer.h: No such file or directory.

(gdb) l 80
75 ngx_del_timer(ev);
76 }
77
78 ev->timer.key = key;
79
80 ngx_log_debug3(NGX_LOG_DEBUG_EVENT, ev->log, 0,
81 "event timer add: %d: %M:%M",
82 ngx_event_ident(ev->data), timer, ev->timer.key);
83
84 ngx_rbtree_insert(&ngx_event_timer_rbtree, &ev->timer);

(gdb) p ev->data
$3 = (void *) 0x100


I've lost the full trace on that but again, it was during SMTP auth.


I don't have a good intuition of how to debug this further. What other information could I give you to help figure it out?

Cheers,
Rob N
Fastmail.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.nginx.org/pipermail/nginx/attachments/20190729/a390cab2/attachment.html>


More information about the nginx mailing list