Errors in Unit logs and CPU 100%

Travis Warlick twarlick at maxmedia.com
Wed May 22 19:10:50 UTC 2019


I am seeing the same log entries and have experienced a very similar
consumption problem about once a month.  At the onset of the problem, there
is a short spike in CPU usage, then the CPU usage drops to nearly zero, but
the load average stays near 2x the number of vCPUs.  After about 20-30
minutes, the server becomes completely unresponsive, and I'm forced to
reboot via the AWS console.  Some painfully obtained observations indicated
that all available network and file resources were being consumed at that
point.

After a couple of occurrences, I noticed in the AWS CloudWatch metrics that
the EFS volume (Elastic File System, i.e. basically NFS) spiked to 100%
throughput at nearly the same time as the short CPU spike and stayed there.
My solution was to increase the volume's provisioned throughput to 10 Mbps,
and I have not experienced the issue since, although I am still seeing the
log entries you mention.  My conclusion is that NGINX Unit was consuming
massive amounts of network and file system resources and "choking" the
server.  This is obviously not a long-term solution, though, as there's no
reason for a WordPress application to need that much EFS throughput.
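
For anyone who wants to confirm they are hitting the same EFS throughput
ceiling before paying for provisioned mode, below is a minimal boto3 sketch
of the check and the mode change I described.  The file system ID and the
90% threshold are placeholders, and note that the API takes MiB/s, not the
"Mbps" I loosely quoted above:

    import boto3
    from datetime import datetime, timedelta, timezone

    FS_ID = "fs-0123456789abcdef0"  # placeholder -- use your own EFS ID

    cw = boto3.client("cloudwatch")
    efs = boto3.client("efs")
    now = datetime.now(timezone.utc)
    window = dict(
        Namespace="AWS/EFS",
        Dimensions=[{"Name": "FileSystemId", "Value": FS_ID}],
        StartTime=now - timedelta(minutes=10),
        EndTime=now,
        Period=60,
    )

    def latest(resp, stat):
        points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
        return points[-1][stat] if points else 0.0

    # Bytes metered per 60 s period vs. the permitted bytes per second.
    used = cw.get_metric_statistics(
        MetricName="MeteredIOBytes", Statistics=["Sum"], **window)
    allowed = cw.get_metric_statistics(
        MetricName="PermittedThroughput", Statistics=["Average"], **window)

    used_bps = latest(used, "Sum") / 60.0
    permitted_bps = latest(allowed, "Average") or 1.0
    print(f"EFS throughput utilization: {used_bps / permitted_bps:.0%}")

    # Sustained utilization near 100% is the pattern that preceded my
    # lockups.  Provisioned throughput is specified in MiB/s, and AWS
    # throttles how often the throughput mode can be changed.
    if used_bps / permitted_bps > 0.9:
        efs.update_file_system(
            FileSystemId=FS_ID,
            ThroughputMode="provisioned",
            ProvisionedThroughputInMibps=10.0,
        )

That utilization number spiking alongside the CPU is what tied the two
symptoms together for me.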
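As for the log entries themselves, my best guess is that they are mostly
harmless: they appear when a client drops its connection abruptly (a TCP
RST), after which shutdown() and close() on that socket fail.  Here is a
minimal Python sketch, assuming standard BSD socket semantics, that
provokes both errnos; on FreeBSD they are 54 (ECONNRESET) and 57
(ENOTCONN), matching your log lines, while Linux uses different numbers:

    import socket
    import struct
    import time

    # A local server and client over loopback.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)

    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect(srv.getsockname())
    conn, _ = srv.accept()

    # SO_LINGER with a zero timeout makes close() send an RST instead of
    # a FIN, imitating a client that disappears mid-request.
    cli.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                   struct.pack("ii", 1, 0))
    cli.close()
    time.sleep(0.1)  # give the RST time to arrive

    try:
        conn.recv(1)  # surfaces "Connection reset by peer"
    except ConnectionResetError as e:
        print("recv:", e)

    try:
        conn.shutdown(socket.SHUT_RDWR)  # socket is no longer connected
    except OSError as e:
        print("shutdown:", e)  # ENOTCONN: "Socket is not connected"

If that is what is happening, the entries are just noise from clients
going away, not something you can tune out on the server side.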


Travis Warlick
Manager, Development Operations

maxmedia
www.maxmedia.com




On Wed, May 22, 2019 at 11:01 AM Peter TKATCHENKO <peter at bimp.fr> wrote:

> Hello,
>
> I'm using Unit to publish a PHP application (Nextcloud). I am on FreeBSD
> 11.2 (jailed) and use PHP 7.2 from packages. There is an NGINX server as a
> front-end.
>
> When I check Unit logs I see many records like this:
>
> 2019/05/16 12:44:39 [info] 88085#101952 *73260 shutdown(177, 2) failed
> (57: Socket is not connected)
>
> And sometimes like this:
>
> 2019/05/16 12:53:39 [alert] 88085#101951 *74551 socket close(177) failed
> (54: Connection reset by peer)
>
> The application seems to work correctly, but I would like to understand
> the cause of these errors; perhaps I need to tune something?
>
> Another, more important problem: sometimes (about once a week) the 'unit:
> router' process begins to consume 100% of all 4 vCPUs (!!). The CPU usage
> grows over about 15 minutes and ends up blocking the application completely
> ('bad gateway' error on nginx). Restarting the unitd service solves the
> problem. I see many errors of the first type in the logs at that moment,
> but nothing more interesting.
>
> Best regards,
> Peter TKATCHENKO
> _______________________________________________
> unit mailing list
> unit at nginx.org
> https://mailman.nginx.org/mailman/listinfo/unit