Segfault for loopback resolver (CONFIRMED)

Wed Sep 8 08:25:25 MSD 2010

Hi,

Not only loopback resolver address, but any addresses without UDP:53 
port listened can trigger this segfault. I've confirmed the problem on 
my Linux box. Here's the backtrace of nginx-0.8.49 when segfault happens:

#0  0x08059acd in ngx_log_error_core (level=4, log=0x810c16c, err=111, 
fmt=0x80e92cd "recv() failed") at src/core/ngx_log.c:93
#1  0x08062442 in ngx_connection_error (c=0x812c538, err=111, 
text=0x80e92cd "recv() failed") at src/core/ngx_connection.c:1015
#2  0x0806ef06 in ngx_udp_unix_recv (c=0x812c538, buf=0xbfffdcfc 
"m\220", size=4096) at src/os/unix/ngx_udp_recv.c:99
#3  0x080689b1 in ngx_resolver_read_response (rev=0x81424f8) at 
src/core/ngx_resolver.c:952
#4  0x08073035 in ngx_epoll_process_events (cycle=0x810d528, timer=5000, 
flags=<value optimized out>) at src/event/modules/ngx_epoll_module.c:642
#5  0x0806baa2 in ngx_process_events_and_timers (cycle=0x810d528) at 
src/event/ngx_event.c:245
#6  0x08070edc in ngx_single_process_cycle (cycle=0x810d528) at 
src/os/unix/ngx_process_cycle.c:306
#7  0x08059576 in main (argc=1, argv=0xbfffef74) at src/core/nginx.c:398

The bug exists in all versions later than nginx-0.8.36 but not in 
nginx-0.7.x versions.

This problem is due to the incorrectly copy of cycle->new_log when 
ngx_resolver_create() initializing udp_connection->log. Because 
cycle->new_log only gets initialized in ngx_init_cycle() after all 
configurations have been processed, the ngx_resolver_create() will be 
called BEFORE cycle->new_log has anything meaningful. And because it 
COPIED cycle->new_log, udp_connection->log will always be invalid even 
cycle->new_log does get initialized properly later. When resolver is 
used to resolve names and failed to connect the specified nameserver, 
nginx tries to log a timeout error using the invalid log structure 
stored in udp_connection, and then boom...

IMHO, by changing the definition of ngx_udp_connection_t to make log 
field a pointer instead of a ngx_log_t structure (also with a bunch of 
related reference modifications, of course), this bug will gone without 
pains.

The attached patch should fix the bug.

Maxim Dounin wrote:
> Hello!
>
> On Sun, Mar 21, 2010 at 04:21:38AM +0200, Marcus Clyne wrote:
>
>   
>> Hi,
>>
>> With a standard (i.e. no options) installation of 0.8.34 on my Linux
>> machine I get a segfault if the resolver named in the conf file is a
>> loopback address/IP, but the resolver does not exist.  Other
>> non-existing resolvers don't cause a problem (they just hang, and
>> probably will time out), only loopback ones.
>>
>> The debug log is :
>>
>>     
>
> [...]
>
>   
>> 2010/03/21 04:07:30 [debug] 4287#0: *3 http cleanup add: 0000000001A44E60
>> 2010/03/21 04:07:30 [debug] 4287#0: *3 http finalize request: -4, "/?" a:1, c:2
>>     
>
> Could you please provide full debug log (i.e. switched on at global level, not
> http/server/location)?  E.g. between the above lines should be some valuable
> resolving information which isn't logged in request context but in global one
> instead.
>
> It should look like this:
>
> ...
> 2010/03/25 04:53:49 [debug] 86639#0: *1 http cleanup add: 0811F8F4
> 2010/03/25 04:53:49 [debug] 86639#0: malloc: 08126580:68
> 2010/03/25 04:53:49 [debug] 86639#0: resolve: "google.com"
> 2010/03/25 04:53:49 [debug] 86639#0: malloc: 0810D580:60
> 2010/03/25 04:53:49 [debug] 86639#0: malloc: 08133460:10
> 2010/03/25 04:53:49 [debug] 86639#0: malloc: 0810AD00:28
> 2010/03/25 04:53:49 [debug] 86639#0: resolve: "google.com" 12007
> 2010/03/25 04:53:49 [debug] 86639#0: UDP socket 11
> 2010/03/25 04:53:49 [debug] 86639#0: connect to 127.0.0.1:53, fd:11 #2
> 2010/03/25 04:53:49 [debug] 86639#0: kevent set event: 11: ft:-1 fl:0025
> 2010/03/25 04:53:49 [debug] 86639#0: send: fd:11 28 of 28
> 2010/03/25 04:53:49 [debug] 86639#0: malloc: 0810D640:60
> 2010/03/25 04:53:49 [debug] 86639#0: event timer add: -1: 30000:2466707655
> 2010/03/25 04:53:49 [debug] 86639#0: event timer add: -1: 5000:2466682655
> 2010/03/25 04:53:49 [debug] 86639#0: *1 http finalize request: -4, "/?" a:1, c:2
> ...
>
>   
>> 2010/03/21 04:07:30 [debug] 4287#0: *3 http request count:2 blk:0
>> 2010/03/21 04:07:30 [debug] 4287#0: *3 http run request: "/?"
>> 2010/03/21 04:07:30 [debug] 4287#0: *3 http upstream check client,
>> write event:1, "/"
>> 2010/03/21 04:07:30 [debug] 4287#0: *3 http upstream recv(): -1 (11:
>> Resource temporarily unavailable)
>> 2010/03/21 04:07:30 [alert] 4275#0: worker process 4287 exited on signal 11
>>
>>
>> I've not got time to look into and write a patch for it right now,
>> but will do if no-one gets around to it at some point.
>>     
>
> Config and backtrace should be helpfull too.  Unfortunately I'm not able to
> reproduce the problem.
>
> Maxim Dounin
>
> _______________________________________________
> nginx mailing list
> nginx at nginx.org
> http://nginx.org/mailman/listinfo/nginx
>
>   

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix_resolver.diff
Type: text/x-diff
Size: 2034 bytes
Desc: not available
URL: <http://nginx.org/pipermail/nginx/attachments/20100908/fda8cd2c/attachment.diff>