From peter at bimp.fr  Wed May 22 15:00:53 2019
From: peter at bimp.fr (Peter TKATCHENKO)
Date: Wed, 22 May 2019 17:00:53 +0200
Subject: Errors in Unit logs and CPU 100%
Message-ID: <5c11341b-ed01-6055-8739-a23a9c4d053e@bimp.fr>

Hello,

I'm using Unit to publish a PHP application (NextCloud). I am on FreeBSD 11.2
(jailed) and use PHP 7.2 from packages. There is an NGINX server as a
front-end.

When I check the Unit logs I see many records like this:

2019/05/16 12:44:39 [info] 88085#101952 *73260 shutdown(177, 2) failed (57: Socket is not connected)

And sometimes like this:

2019/05/16 12:53:39 [alert] 88085#101951 *74551 socket close(177) failed (54: Connection reset by peer)

The application seems to work correctly, but I would like to understand the
cause of these errors; perhaps I need to tune something?

Another problem is more important. Sometimes (once a week) the 'unit: router'
process begins to consume 100% of all 4 VCPUs (!!). The CPU usage grows over
about 15 minutes and ends up blocking the application completely
('bad gateway' error on nginx). Restarting the unitd service solves the
problem. I see many errors of the first type in the logs at that moment, but
nothing more interesting.

Best regards,
Peter TKATCHENKO

From twarlick at maxmedia.com  Wed May 22 19:10:50 2019
From: twarlick at maxmedia.com (Travis Warlick)
Date: Wed, 22 May 2019 15:10:50 -0400
Subject: Errors in Unit logs and CPU 100%
In-Reply-To: <5c11341b-ed01-6055-8739-a23a9c4d053e@bimp.fr>
References: <5c11341b-ed01-6055-8739-a23a9c4d053e@bimp.fr>
Message-ID: 

I am seeing the same log entries and have experienced a very similar
consumption problem about once a month. At the onset of the problem, there is
a short spike in CPU usage, then the CPU usage drops to nearly 0, but the
load average stays near 2x the number of VCPUs. After about 20-30 minutes,
the server becomes completely unresponsive, and I'm forced to reboot via the
AWS console. Some painfully obtained observations indicated that all
available network and file resources were being consumed at this point.

After a couple of occurrences, I noticed in the AWS CloudWatch metrics that
the EFS volume (Elastic File System, i.e. basically NFS) spiked to 100%
throughput at nearly the same time as the short spike in CPU usage and stayed
there. My solution was to increase the volume's provisioned throughput to
10 Mbps, and I have not experienced the issue since, although I am still
seeing the log entries you mention. My conclusion is that nginx unit was
consuming massive amounts of network and file system resources and "choking"
the server. This is obviously not a long-term solution, though, as there's no
reason for such a massive amount of EFS volume throughput for a Wordpress
application.
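
For reference, that kind of provisioned-throughput change can be made with
the AWS CLI roughly as follows (the file-system ID is a placeholder, and note
that EFS provisioned throughput is specified in MiB/s):

    # switch the EFS volume to provisioned throughput mode
    # (fs-12345678 and the 10 MiB/s figure are placeholders)
    aws efs update-file-system \
        --file-system-id fs-12345678 \
        --throughput-mode provisioned \
        --provisioned-throughput-in-mibps 10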

Travis Warlick
Manager, Development Operations
maxmedia
www.maxmedia.com

On Wed, May 22, 2019 at 11:01 AM Peter TKATCHENKO wrote:

> Hello,
>
> I'm using Unit to publish a PHP application (NextCloud). I am on FreeBSD
> 11.2 (jailed) and use PHP 7.2 from packages. There is an NGINX server as a
> front-end.
> [..]
> Another problem is more important. Sometimes (once a week) the
> 'unit: router' process begins to consume 100% of all 4 VCPUs (!!).
> [..]

From peter at bimp.fr  Wed May 22 20:03:00 2019
From: peter at bimp.fr (Peter TKATCHENKO)
Date: Wed, 22 May 2019 22:03:00 +0200
Subject: Errors in Unit logs and CPU 100%
In-Reply-To: 
References: <5c11341b-ed01-6055-8739-a23a9c4d053e@bimp.fr>
Message-ID: <6bc1a8c0-fe69-c850-422d-9c4562119465@bimp.fr>

Hi,

Thanks for this information.

I had my monitoring active during the last incident. It shows that the CPU
usage grew over approximately 20 minutes (from 20% to 100%). Over the same
period I can see physical RAM usage growing (from 30% to 60%); disk I/O was
stable, far from critical levels; network traffic began to grow when the CPU
reached 100%. I'm in a private cloud, and we are not limited in RAM or in
disk I/O.

So I still don't understand how I can improve the situation.
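
For the next occurrence, it may also help to see which router threads are
actually busy; on FreeBSD something along these lines should do (the pgrep
pattern is only an example and may need adjusting):

    # find the router process and list its threads
    pid=$(pgrep -f 'unit: router')
    procstat -t "$pid"

    # then watch per-thread CPU usage interactively and look for
    # the 'unit: router' threads near the top
    top -H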

Best regards,
Peter

On 22/05/2019 21:10, Travis Warlick via unit wrote:
> I am seeing the same log entries and have experienced a very similar
> consumption problem about once a month.
> [..]
> My solution was to increase the volume's provisioned throughput to
> 10 Mbps, and I have not experienced the issue since, although I am still
> seeing the log entries you mention.
> [..]

From peter at bimp.fr  Thu May 23 10:23:20 2019
From: peter at bimp.fr (Peter TKATCHENKO)
Date: Thu, 23 May 2019 12:23:20 +0200
Subject: Errors in Unit logs and CPU 100%
In-Reply-To: 
References: <5c11341b-ed01-6055-8739-a23a9c4d053e@bimp.fr>
Message-ID: <23c47ef3-b80e-3950-ce93-8ed47a3534bd@bimp.fr>

Hello,

I had the CPU problem again this morning; it seems that the situation is
degrading :(

I was able to attach the debugger and get the backtrace:

#0  0x0000000800dfe81a in _kevent () from /lib/libc.so.7
#1  0x0000000800a9bcc2 in ?? () from /lib/libthr.so.3
#2  0x00000000004238c1 in nxt_kqueue_poll (engine=0x801416500,
    timeout=<optimized out>) at src/nxt_kqueue_engine.c:692
#3  0x0000000000412bb0 in nxt_event_engine_start (engine=0x801416500)
    at src/nxt_event_engine.c:549
#4  0x0000000000406c47 in main (argc=<optimized out>,
    argv=<optimized out>) at src/nxt_main.c:35

If I kill the process, a new one is created and again takes 100% of the CPU.
And if I restart the service, it's OK.
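
For what it's worth, since the router is multi-threaded, a backtrace of every
thread can be captured in one shot, roughly like this (assuming gdb from
ports is available inside the jail):

    # attach to the router once it starts spinning and dump all threads
    gdb -p $(pgrep -f 'unit: router') -batch -ex 'thread apply all bt'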

Please, help!

--
Peter TKATCHENKO | Consultant technique

On 2019-05-22 9:10 p.m., Travis Warlick via unit wrote:
> I am seeing the same log entries and have experienced a very similar
> consumption problem about once a month.
> [..]

From vbart at nginx.com  Thu May 23 11:54:05 2019
From: vbart at nginx.com (Valentin V. Bartenev)
Date: Thu, 23 May 2019 14:54:05 +0300
Subject: Errors in Unit logs and CPU 100%
In-Reply-To: <23c47ef3-b80e-3950-ce93-8ed47a3534bd@bimp.fr>
References: <5c11341b-ed01-6055-8739-a23a9c4d053e@bimp.fr> <23c47ef3-b80e-3950-ce93-8ed47a3534bd@bimp.fr>
Message-ID: <5530404.IHdkP49TLx@vbart-workstation>

On Thursday 23 May 2019 12:23:20 Peter TKATCHENKO wrote:
> Hello,
>
> I had the CPU problem again this morning; it seems that the situation is
> degrading :(
>
> I was able to attach the debugger and get the backtrace:
>
> #0  0x0000000800dfe81a in _kevent () from /lib/libc.so.7
> #1  0x0000000800a9bcc2 in ?? () from /lib/libthr.so.3
> #2  0x00000000004238c1 in nxt_kqueue_poll (engine=0x801416500,
>     timeout=<optimized out>) at src/nxt_kqueue_engine.c:692
> #3  0x0000000000412bb0 in nxt_event_engine_start (engine=0x801416500)
>     at src/nxt_event_engine.c:549
> #4  0x0000000000406c47 in main (argc=<optimized out>,
>     argv=<optimized out>) at src/nxt_main.c:35
>
> If I kill the process, a new one is created and again takes 100% of the
> CPU. And if I restart the service, it's OK.
>
> Please, help!
> [..]

Hi,

Unfortunately, this particular backtrace is useless, because it comes from a
different thread (not the one that eats CPU).

Could you reproduce the issue with the debug log enabled?  See for details:
http://unit.nginx.org/troubleshooting/#debug-log
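
Note that debug logging has to be compiled in. When building from source that
is roughly the following (on FreeBSD the port exposes this as its DEBUG
option); the debug records then go to the regular unit.log, which grows
quickly:

    # rebuild Unit with debug logging compiled in, then restart the service
    ./configure --debug
    make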

  wbr, Valentin V. Bartenev

From vbart at nginx.com  Fri May 24 15:53:18 2019
From: vbart at nginx.com (Valentin V. Bartenev)
Date: Fri, 24 May 2019 18:53:18 +0300
Subject: Errors in Unit logs and CPU 100%
In-Reply-To: <11046061-d96e-f440-7d04-4762ce010ec2@bimp.fr>
References: <5c11341b-ed01-6055-8739-a23a9c4d053e@bimp.fr> <5530404.IHdkP49TLx@vbart-workstation> <11046061-d96e-f440-7d04-4762ce010ec2@bimp.fr>
Message-ID: <1859024.fEBGazasv8@vbart-workstation>

On Thursday 23 May 2019 14:52:37 Peter TKATCHENKO wrote:
> Thanks for your answer.
>
> I've just rebuilt the ports with the 'DEBUG' option and restarted the
> service.
>
> When the problem comes, I'll send the logs here. I hope it comes soon, as
> the log is growing very fast...
> [..]

To keep it from growing too much, you can periodically truncate the log file:

  # truncate -s 0 /path/to/unit.log

  wbr, Valentin V. Bartenev

From vbart at nginx.com  Thu May 30 16:28:34 2019
From: vbart at nginx.com (Valentin V. Bartenev)
Date: Thu, 30 May 2019 19:28:34 +0300
Subject: Unit 1.9.0 release
Message-ID: <2299764.1mxMrBuCdO@vbart-workstation>

Hi,

I'm glad to announce a new release of NGINX Unit.

In this release, we continue improving routing capabilities for more advanced
and precise request matching. Besides that, the control API was extended with
POST operations to simplify array manipulation in configuration.
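
For illustration, the two together make it possible to append a route that
matches on request arguments and cookies with a single API call; roughly like
this (the application name and control-socket path are placeholders, and
"routes" is assumed to be configured as an array):

    curl -X POST --unix-socket /var/run/control.unit.sock \
         -d '{
               "match": {
                   "arguments": { "mode": "admin" },
                   "cookies": { "session": "*" }
               },
               "action": { "pass": "applications/admin" }
             }' \
         http://localhost/config/routes

Previously, appending an entry generally meant resubmitting the whole array.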

Please check the documentation about the new features:

 - Matching rules: https://unit.nginx.org/configuration/#condition-matching
 - API operations: https://unit.nginx.org/configuration/#configuration-management

If you prefer to perceive information visually, here's a recording of an
NGINX Meetup talk that gives a good overview of dynamic application routing,
although it doesn't discuss the new features from this release:

 - https://www.youtube.com/watch?v=5O4TjbbxTxw

Also, a number of annoying bugs were fixed; thanks to your feedback, the
Node.js module now works fine with more applications.


Changes with Unit 1.9.0                                          30 May 2019

    *) Feature: request routing by arguments, headers, and cookies.

    *) Feature: route matching patterns allow a wildcard in the middle.

    *) Feature: POST operation for appending elements to arrays in
       configuration.

    *) Feature: support for changing credentials using CAP_SETUID and
       CAP_SETGID capabilities on Linux without running main process as
       privileged user.

    *) Bugfix: memory leak in the router process might have happened when a
       client prematurely closed the connection.

    *) Bugfix: applying a large configuration might have failed.

    *) Bugfix: PUT and DELETE operations on array elements in configuration
       did not work.

    *) Bugfix: request schema in applications did not reflect TLS
       connections.

    *) Bugfix: restored compatibility with Node.js applications that use
       ServerResponse._implicitHeader() function; the bug had appeared
       in 1.7.

    *) Bugfix: various compatibility issues with Node.js applications.


With this release, packages for Ubuntu 19.04 "disco" are also available.
See the website for a full list of available repositories:

 - https://unit.nginx.org/installation/

Meanwhile, we continue working on WebSocket support. It's almost ready and
has a good chance of being included in the next release for the Node.js and
Java modules.

Work on proxying and static file serving is also in progress; this will take
a bit more time.

  wbr, Valentin V. Bartenev