Issue with AWS NLB and nginx

DreamWerx dreamwerx at gmail.com
Mon Nov 20 11:31:59 UTC 2017


Hi all,

I was hoping someone might have an idea here. I have a number of nginx
instances doing load balancing behind AWS's Network Load Balancers (TCP),
which seem to support only TCP health checks.

Recently a few have stopped working / frozen. They still seem to accept a
TCP connection from the NLB, so the health check does not fail, but they
cannot process requests internally and you cannot even ssh into the
machine. A reboot is required, and it takes longer than normal.

I think the failure is related to a disk issue, since the only errors in the
entire logs were regarding the disk (error logs below).

Ideally, if nginx or the O/S fails, it would be better if the port just
closed. I've considered writing a small daemon that monitors nginx via HTTP
locally and keeps a port open only while everything is OK.
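
To make the daemon idea concrete, here is a rough Python sketch, not a tested
production tool. It polls a local HTTP URL and keeps a separate TCP port
listening only while that check passes, so the NLB's TCP health check could be
pointed at that port instead. The names CHECK_URL, HEALTH_PORT, HealthPort,
backend_ok, step and run are all made up for illustration, and the URL, port
and intervals are assumptions to adjust.

```python
import http.client
import socket
import time
import urllib.request

CHECK_URL = "http://127.0.0.1:80/"  # assumption: local nginx URL to probe
HEALTH_PORT = 8081                  # assumption: port the NLB health check targets
INTERVAL = 5.0                      # seconds between HTTP probes
TIMEOUT = 2.0                       # seconds before a probe counts as failed


def backend_ok(url=CHECK_URL, timeout=TIMEOUT):
    """Return True if the local HTTP endpoint answers successfully in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP errors, bad responses.
        return False


class HealthPort:
    """A listening TCP socket that exists only while the backend is healthy."""

    def __init__(self, port, host="0.0.0.0"):
        self.addr = (host, port)
        self.sock = None

    def open(self):
        if self.sock is None:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind(self.addr)
            s.listen(16)
            s.setblocking(False)
            self.sock = s

    def close(self):
        if self.sock is not None:
            self.sock.close()
            self.sock = None

    def drain(self):
        """Accept and immediately close any pending health-check connections."""
        if self.sock is None:
            return
        while True:
            try:
                conn, _ = self.sock.accept()
            except BlockingIOError:
                return
            conn.close()


def step(hp, check=backend_ok):
    """Run one probe and open or close the health port accordingly."""
    if check():
        hp.open()
    else:
        hp.close()
    hp.drain()


def run(port=HEALTH_PORT, interval=INTERVAL):
    """Loop forever: probe, then drain pending connections until the next probe."""
    hp = HealthPort(port)
    while True:
        step(hp)
        deadline = time.monotonic() + interval
        while time.monotonic() < deadline:
            hp.drain()
            time.sleep(0.2)
```

Calling run() from a service unit would start the loop. One caveat: the kernel
completes TCP handshakes for a listening socket even when the owning process
is blocked, so this only helps while the daemon itself still gets scheduled.
If the whole machine hangs on disk I/O, as in the logs below, the port may
stay open anyway.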

These machines had been running for months without any issues until now.

Anyone have an idea?

Thanks!

----

[4161960.544106] INFO: task jbd2/xvda1-8:271 blocked for more than 120 seconds.
[4161960.551035]       Not tainted 4.4.0-1022-aws #31-Ubuntu
[4161960.556118] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4161960.562846] INFO: task monit:13224 blocked for more than 120 seconds.
[4161960.567394]       Not tainted 4.4.0-1022-aws #31-Ubuntu
[4161960.571120] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162080.576076] INFO: task dhclient:696 blocked for more than 120 seconds.
[4162080.579596]       Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162080.582355] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162080.586470] INFO: task monit:13224 blocked for more than 120 seconds.
[4162080.589847]       Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162080.592654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162200.596100] INFO: task jbd2/xvda1-8:271 blocked for more than 120 seconds.
[4162200.599646]       Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162200.602422] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162200.606423] INFO: task dhclient:696 blocked for more than 120 seconds.
[4162200.610118]       Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162200.613093] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162200.617889] INFO: task monit:13224 blocked for more than 120 seconds.
[4162200.621641]       Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162200.624506] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162244.551431] systemd[1]: Failed to start Journal Service.
[4162320.628099] INFO: task jbd2/xvda1-8:271 blocked for more than 120 seconds.
[4162320.631942]       Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162320.635012] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162320.639647] INFO: task dhclient:696 blocked for more than 120 seconds.
[4162320.643241]       Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162320.646233] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162320.650712] INFO: task monit:13224 blocked for more than 120 seconds.
[4162320.654190]       Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162320.657183] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162334.801390] systemd[1]: Failed to start Journal Service.
[4162425.051503] systemd[1]: Failed to start Journal Service.
[4162515.301393] systemd[1]: Failed to start Journal Service.