v1.19.5 OOPS: "Main process exited, code=dumped, status=11/SEGV" ?
PGNet Dev
pgnet.dev at gmail.com
Sat Dec 5 22:22:59 UTC 2020
I'm running nginx/1.19.5 on a Fedora32 VM, w/
uname -rm
5.9.11-100.fc32.x86_64 x86_64
Its run for ages without issues. At least that I'd noticed ...
Today, I caught a SEGV/core-dump; the server stopped
systemctl status nginx
● nginx.service - The nginx HTTP and reverse proxy server
Loaded: loaded (/etc/systemd/system/nginx.service; enabled; vendor preset: disabled)
Active: failed (Result: core-dump) since Sat 2020-12-05 05:58:03 PST; 7h ago
Process: 993 ExecStartPre=/bin/chown -R wwwrun:www /usr/local/etc/nginx (code=exited, status=0/SUCCESS)
Process: 999 ExecStartPre=/usr/local/nginx/sbin/nginx -t -c /usr/local/etc/nginx/nginx.conf -g pid /run/nginx/nginx.pid; (code=exited, status=0/SUCCESS)
Process: 1063 ExecStart=/usr/local/nginx/sbin/nginx -c /usr/local/etc/nginx/nginx.conf -g pid /run/nginx/nginx.pid; (code=exited, status=0/SUCCESS)
Process: 1108 ExecStartPost=/bin/chown -R wwwrun:www /var/log/nginx (code=exited, status=0/SUCCESS)
Process: 25986 ExecReload=/bin/kill -s HUP $MAINPID (code=exited, status=0/SUCCESS)
Main PID: 1103 (code=dumped, signal=SEGV)
CPU: 14.607s
Checking logs (at current production loglevels) for this one, nothing out of the ordinary ...
EXCEPT The last log entry I see,:
/var/log/nginx/main.access.log:61.219.11.153 _ - [05/Dec/2020:05:08:14 -0800] \x01A\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 "400" 150 "-" "-" "-"
Given the proximity of the timestamp, I'g guess it's related?
I haven't yet figured out where/how to grab the core-dump ; working on that.
Checking history it's happened a few times
xzegrep SEGV /var/log/messages
2020-11-29T05:52:32.436235-08:00 vm0026 systemd[1]: nginx.service: Main process exited, code=dumped, status=11/SEGV
2020-12-01T05:39:03.218376-08:00 vm0026 systemd[1]: nginx.service: Main process exited, code=dumped, status=11/SEGV
2020-12-03T05:17:51.653637-08:00 vm0026 systemd[1]: nginx.service: Main process exited, code=dumped, status=11/SEGV
2020-12-05T05:58:03.611240-08:00 vm0026 systemd[1]: nginx.service: Main process exited, code=dumped, status=11/SEGV
where each instance in log looks like,
....
2020-12-05T05:55:00.854490-08:00 vm0026 systemd[25768]: Reached target Shutdown.
2020-12-05T05:55:00.854510-08:00 vm0026 systemd[25768]: systemd-exit.service: Succeeded.
2020-12-05T05:55:00.854531-08:00 vm0026 systemd[25768]: Finished Exit the Session.
2020-12-05T05:55:00.854550-08:00 vm0026 systemd[25768]: Reached target Exit the Session.
2020-12-05T05:55:00.858225-08:00 vm0026 systemd[1]: user at 0.service: Succeeded.
2020-12-05T05:55:00.858322-08:00 vm0026 systemd[1]: Stopped User Manager for UID 0.
2020-12-05T05:55:00.860232-08:00 vm0026 systemd[1]: Stopping User Runtime Directory /run/user/0...
2020-12-05T05:55:00.868288-08:00 vm0026 systemd[1]: run-user-0.mount: Succeeded.
2020-12-05T05:55:00.870265-08:00 vm0026 systemd[1]: user-runtime-dir at 0.service: Succeeded.
2020-12-05T05:55:00.870383-08:00 vm0026 systemd[1]: Stopped User Runtime Directory /run/user/0.
2020-12-05T05:55:00.871216-08:00 vm0026 systemd[1]: Removed slice User Slice of UID 0.
2020-12-05T05:58:03.418222-08:00 vm0026 systemd[1]: Reloading The nginx HTTP and reverse proxy server.
2020-12-05T05:58:03.420214-08:00 vm0026 systemd[1]: Reloaded The nginx HTTP and reverse proxy server.
2020-12-05T05:58:03.432221-08:00 vm0026 systemd[1]: nginx.service: Unit cannot be reloaded because it is inactive.
2020-12-05T05:58:03.432358-08:00 vm0026 systemctl[25987]: nginx.service is not active, cannot reload.
2020-12-05T05:58:03.468235-08:00 vm0026 kernel: nginx[1103]: segfault at 10 ip 00007f5c566d6283 sp 00007ffeebdca500 error 4 in libperl.so.5.30.3[7f5c56668000+16d000]
2020-12-05T05:58:03.468374-08:00 vm0026 kernel: Code: f9 ff 48 89 45 10 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 90 0f b6 7f 30 48 c1 e8 03 48 29 f8 48 89 c5 74 89 48 8b 02 <4c> 8b 68 10 4d 85 ed 0f 84 28 01 00 00 0f b6 40 30 49 c1 ed 03 49
2020-12-05T05:58:03.468407-08:00 vm0026 kernel: potentially unexpected fatal signal 11.
2020-12-05T05:58:03.468436-08:00 vm0026 kernel: CPU: 1 PID: 1103 Comm: nginx Not tainted 5.9.11-100.fc32.x86_64 #1
2020-12-05T05:58:03.468460-08:00 vm0026 kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
2020-12-05T05:58:03.468483-08:00 vm0026 kernel: RIP: 0033:0x7f5c566d6283
2020-12-05T05:58:03.468504-08:00 vm0026 kernel: Code: f9 ff 48 89 45 10 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 90 0f b6 7f 30 48 c1 e8 03 48 29 f8 48 89 c5 74 89 48 8b 02 <4c> 8b 68 10 4d 85 ed 0f 84 28 01 00 00 0f b6 40 30 49 c1 ed 03 49
2020-12-05T05:58:03.468549-08:00 vm0026 kernel: RSP: 002b:00007ffeebdca500 EFLAGS: 00010202
2020-12-05T05:58:03.468570-08:00 vm0026 kernel: RAX: 0000000000000000 RBX: 00007ffeebdca6c8 RCX: 0000000000000001
2020-12-05T05:58:03.468587-08:00 vm0026 kernel: RDX: 0000556ebd9e9750 RSI: 0000556ebd924b80 RDI: 0000000000000000
2020-12-05T05:58:03.468607-08:00 vm0026 kernel: RBP: 0000000000000597 R08: 00007ffeebdca6c8 R09: 0000000000000011
2020-12-05T05:58:03.468628-08:00 vm0026 kernel: R10: 0000556ebda8c808 R11: 00007ffeebdca570 R12: 0000556ebd99e7e0
2020-12-05T05:58:03.468646-08:00 vm0026 kernel: R13: 00007ffeebdca568 R14: 0000000000000001 R15: 0000000000000001
2020-12-05T05:58:03.468667-08:00 vm0026 kernel: FS: 00007f5c5837bb80 GS: 0000000000000000
!! 2020-12-05T05:58:03.611240-08:00 vm0026 systemd[1]: nginx.service: Main process exited, code=dumped, status=11/SEGV
2020-12-05T05:58:03.645225-08:00 vm0026 systemd[1]: nginx.service: Failed with result 'core-dump'.
2020-12-05T05:58:03.646343-08:00 vm0026 systemd[1]: nginx.service: Consumed 14.607s CPU time.
2020-12-05T06:00:43.703240-08:00 vm0026 systemd[1]: Starting system activity accounting tool...
2020-12-05T06:00:43.712400-08:00 vm0026 systemd[1]: sysstat-collect.service: Succeeded.
2020-12-05T06:00:43.712741-08:00 vm0026 systemd[1]: Finished system activity accounting tool.
2020-12-05T06:01:01.537222-08:00 vm0026 systemd[1]: Created slice User Slice of UID 982.
2020-12-05T06:01:01.538234-08:00 vm0026 systemd[1]: Starting User Runtime Directory /run/user/982...
2020-12-05T06:01:01.550219-08:00 vm0026 systemd[1]: Finished User Runtime Directory /run/user/982.
2020-12-05T06:01:01.552232-08:00 vm0026 systemd[1]: Starting User Manager for UID 982...
2020-12-05T06:01:01.705790-08:00 vm0026 systemd[26023]: Startup finished in 132ms.
....
IIUC, nginx error "400" is 'bad header'.
But I do note,
segfault at 10 ip 00007f5c566d6283 sp 00007ffeebdca500 error 4 in libperl.so.5.30.3[7f5c56668000+16d000]
Not clear what the cause of this infrequent/intermittent issue is.
Apart from blocking that IP/block/country, what's the right way to harden nginx against this^ ? Ideally, preventing the SEGV in the first place.
Or, hints as to what to look at more closely?
More information about the nginx
mailing list