Errors in Unit logs and CPU 100%

Peter TKATCHENKO peter at bimp.fr
Thu May 23 10:23:20 UTC 2019


Hello,

I had the CPU problem again this morning; it seems the situation is
degrading :(

I was able to attach a debugger and get this backtrace:

#0  0x0000000800dfe81a in _kevent () from /lib/libc.so.7
#1  0x0000000800a9bcc2 in ?? () from /lib/libthr.so.3
#2  0x00000000004238c1 in nxt_kqueue_poll (engine=0x801416500, timeout=<optimized out>) at src/nxt_kqueue_engine.c:692
#3  0x0000000000412bb0 in nxt_event_engine_start (engine=0x801416500) at src/nxt_event_engine.c:549
#4  0x0000000000406c47 in main (argc=<optimized out>, argv=<optimized out>) at src/nxt_main.c:35
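
For reference, the trace above was taken roughly along these lines (gdb
attached to the running process; the PID below is just a placeholder for
whichever Unit process is spinning):

  # find the PID of the process eating the CPU
  ps ax -o pid,%cpu,command | grep unit
  # attach, dump the stack, then detach without killing the process
  gdb -p 12345
  (gdb) bt
  (gdb) detach
  (gdb) quit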

If I kill the process, a new one is created and again takes 100% of the
CPU. If I restart the service, everything is OK.
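
For now my workaround looks like this (assuming the standard rc script
installed by the FreeBSD package, which registers the service as 'unitd'):

  # check which Unit process is spinning
  top -P
  ps ax -o pid,%cpu,command | grep unit
  # killing that single PID does not help, so restart the whole service
  service unitd restart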

Please help!

Peter TKATCHENKO | Technical Consultant

04 72 60 39 00 | 0 812 211 211

150 allée des Frènes - 69760 LIMONEST


Find us at www.bimp.fr <https://www.bimp.fr/> |
www.bimp-pro.fr <http://www.bimp-pro.fr/> |
www.bimp-education.fr <http://www.bimp-education.fr/>

This message and any attachments are intended exclusively for the use of
their addressee, and their content is strictly confidential. Any copying,
retransmission, distribution or other use, as well as any use by natural
or legal persons or entities other than the addressee, is strictly
prohibited. If you receive this message in error, please delete it and
notify the sender immediately.
As the Internet cannot guarantee the integrity of this message, the sender
declines any liability should it have been intercepted or modified by
anyone.

On 2019-05-22 9:10 p.m., Travis Warlick via unit wrote:
> I am seeing the same log entries and have experienced a very similar 
> resource-consumption problem about once a month.  At the onset of the 
> problem, there is a short spike in CPU usage, then the CPU usage drops 
> to nearly 0, but the load average stays near 2x the number of vCPUs.  
> After about 20-30 minutes, the server becomes completely unresponsive, 
> and I'm forced to reboot via the AWS console.  Some painfully obtained 
> observations indicated that all available network and file resources 
> were being consumed at that point.  After a couple of occurrences, I 
> noticed in the AWS CloudWatch metrics that the EFS volume (Elastic File 
> System, i.e. basically NFS) spiked to 100% throughput at nearly the 
> same time as the short spike in CPU usage and stayed there.  My 
> solution was to increase the volume's provisioned throughput to 10 
> Mbps, and I have not experienced the issue since, although I am still 
> seeing the log entries you mention.  My conclusion is that NGINX Unit 
> was consuming massive amounts of network and file system resources and 
> "choking" the server.  This is obviously not a long-term solution, 
> though, as there is no reason for such a massive amount of EFS volume 
> throughput for a WordPress application.
>
> Travis Warlick
> Manager, Development Operations
>
> maxmedia
> www.maxmedia.com <https://www.maxmedia.com/>
>
> Find us on: LinkedIn <https://www.linkedin.com/company/maxmedia/> | 
> Twitter <https://twitter.com/maxmedia_atl> | Facebook 
> <https://www.facebook.com/MaxMediaATL> | Instagram 
> <https://www.instagram.com/maxmedia_atl/> | Vimeo <https://vimeo.com/mxm>
>
>
>
>
> On Wed, May 22, 2019 at 11:01 AM Peter TKATCHENKO <peter at bimp.fr> wrote:
>
>     Hello,
>
>     I'm using Unit to publish a PHP application (Nextcloud). I am on
>     FreeBSD 11.2 (jailed) and use PHP 7.2 from packages. There is an
>     NGINX server as a front end.
>
>     When I check Unit logs I see many records like this:
>
>     2019/05/16 12:44:39 [info] 88085#101952 *73260 shutdown(177, 2) failed (57: Socket is not connected)
>
>     And sometimes like this:
>
>     2019/05/16 12:53:39 [alert] 88085#101951 *74551 socket close(177) failed (54: Connection reset by peer)
>
>     The application seems to work correctly, but I would like to
>     understand the cause of these errors; perhaps I need to tune
>     something?
>
>     Another problem is more important. Sometimes (about once a week)
>     the 'unit: router' process begins to consume 100% of all 4 vCPUs
>     (!!); the CPU usage takes about 15 minutes to grow, and it ends up
>     blocking the application completely ('bad gateway' error on nginx).
>     Restarting the unitd service solves the problem. I see many errors
>     of the first type in the logs at that moment, but nothing more
>     interesting.
>
>     Best regards,
>
>     Peter TKATCHENKO