Issue with AWS NLB and nginx
DreamWerx
dreamwerx at gmail.com
Mon Nov 20 11:31:59 UTC 2017
Hi all,
I was hoping someone might have an idea here. I have a number of nginx
instances doing load balancing behind AWS Network Load Balancers (TCP),
which seem to support only TCP health checks.
Recently a few have stopped working / frozen: they still accept a
TCP connection from the NLB, so the health check does not fail.
But they cannot process the request internally, and you cannot even ssh into
the machine. A reboot is required, and it takes longer than normal.
I think the failure is related to a disk issue, since the only errors in the
logs were regarding the disk (kernel log below).
Ideally, if nginx or the O/S fails, it would be better if the port just
closed. I've considered writing a small daemon that monitors nginx via HTTP
locally and keeps a port open only while everything is OK.
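A minimal sketch of such a daemon, in Python, might look like the following. The check URL, the health port (8081), the interval, and all function names are assumptions for illustration only; the NLB health check would have to be pointed at that port instead of the nginx port.

```python
import select
import socket
import time
import urllib.request

CHECK_URL = "http://127.0.0.1/"  # assumption: nginx answers HTTP locally
HEALTH_PORT = 8081               # assumption: port the NLB health check targets
INTERVAL = 5                     # seconds between local checks
TIMEOUT = 2                      # seconds before a local check counts as failed


def is_healthy(url, timeout=TIMEOUT):
    """Return True if the local HTTP check answers with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:  # refused, timed out, HTTP error, malformed reply
        return False


def open_port(port):
    """Start listening so that TCP health checks succeed."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("0.0.0.0", port))
    sock.listen(16)
    sock.setblocking(False)  # accept() must never block the check loop
    return sock


def step(sock, healthy, port):
    """Open the port when healthy, close it when not; return the new state."""
    if healthy and sock is None:
        return open_port(port)
    if not healthy and sock is not None:
        sock.close()
        return None
    return sock


def drain(sock, seconds):
    """Accept and close pending connections for roughly `seconds`,
    so the accept queue does not fill up with old health checks."""
    deadline = time.monotonic() + seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return
        readable, _, _ = select.select([sock], [], [], remaining)
        if readable:
            try:
                conn, _ = sock.accept()
                conn.close()
            except OSError:  # connection vanished before accept()
                pass


def main():
    sock = None
    while True:
        sock = step(sock, is_healthy(CHECK_URL), HEALTH_PORT)
        if sock is None:
            time.sleep(INTERVAL)
        else:
            drain(sock, INTERVAL)
```

Calling main() under a process supervisor would run the loop. One caveat: the kernel completes TCP handshakes on a listening socket even while the owning process is blocked, so this only helps as long as the daemon itself keeps running and can close the socket. If the whole machine hangs, the port may well stay open.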
These machines had been running for months without any issues until
now.
Anyone have an idea?
Thanks!
----
[4161960.544106] INFO: task jbd2/xvda1-8:271 blocked for more than 120 seconds.
[4161960.551035] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4161960.556118] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4161960.562846] INFO: task monit:13224 blocked for more than 120 seconds.
[4161960.567394] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4161960.571120] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162080.576076] INFO: task dhclient:696 blocked for more than 120 seconds.
[4162080.579596] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162080.582355] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162080.586470] INFO: task monit:13224 blocked for more than 120 seconds.
[4162080.589847] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162080.592654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162200.596100] INFO: task jbd2/xvda1-8:271 blocked for more than 120 seconds.
[4162200.599646] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162200.602422] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162200.606423] INFO: task dhclient:696 blocked for more than 120 seconds.
[4162200.610118] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162200.613093] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162200.617889] INFO: task monit:13224 blocked for more than 120 seconds.
[4162200.621641] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162200.624506] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162244.551431] systemd[1]: Failed to start Journal Service.
[4162320.628099] INFO: task jbd2/xvda1-8:271 blocked for more than 120 seconds.
[4162320.631942] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162320.635012] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162320.639647] INFO: task dhclient:696 blocked for more than 120 seconds.
[4162320.643241] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162320.646233] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162320.650712] INFO: task monit:13224 blocked for more than 120 seconds.
[4162320.654190] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162320.657183] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162334.801390] systemd[1]: Failed to start Journal Service.
[4162425.051503] systemd[1]: Failed to start Journal Service.
[4162515.301393] systemd[1]: Failed to start Journal Service.