Debugging Nginx Memory Spikes on Production Servers

Maxim Dounin mdounin at mdounin.ru
Wed Sep 20 19:07:16 UTC 2023


Hello!

On Wed, Sep 20, 2023 at 11:55:39AM -0500, Lance Dockins wrote:

> Are there any best practices or processes for debugging sudden memory
> spikes in Nginx on production servers?  We have a few very high-traffic
> servers that are encountering events where the Nginx process memory
> suddenly spikes from around 300mb to 12gb of memory before being shut down
> by an out-of-memory termination script.  We don't have Nginx compiled with
> debug mode and even if we did, I'm not sure that we could enable that
> without overly taxing the server due to the constant high traffic load that
> the server is under.  Since it's a server with public websites on it, I
> don't know that we could filter the debug log to a single IP either.
> 
> Access, error, and info logs all seem to be pretty normal.  Internal
> monitoring of the Nginx process doesn't suggest that there are major
> connection spikes either.  Theoretically, it is possible that there is just
> a very large sudden burst of traffic coming in that is hitting our rate
> limits very hard and bumping the memory that Nginx is using until the OOM
> termination process closes Nginx (which would prevent Nginx from logging
> the traffic).  We just don't have a good way to see where the memory in
> Nginx is being allocated when these sorts of spikes occur and are looking
> for any good insight into how to go about debugging that sort of thing on a
> production server.
> 
> Any insights into how to go about troubleshooting it?

In no particular order:

- Make sure you are monitoring connection and request numbers as 
  reported by the stub_status module as well as memory usage.

- Check any third-party modules you are using, and try disabling 
  them.

- If you are using subrequests, such as with SSI, make sure these 
  won't generate an enormous number of subrequests.

- Check your configuration for buffer sizes and connection limits, 
  and make sure that your server can handle the maximum memory 
  allocation without invoking the OOM Killer, that is: 
  worker_processes * worker_connections * (total size of the various 
  buffers allocated per connection).  If it cannot, consider reducing 
  some of the terms in this product.
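
As a rough illustration of the first and last points, here is a
minimal Python sketch.  It is not part of nginx: the function names,
the sample numbers, and the 64 KiB per-connection figure are
assumptions made up for the example.  The per-connection total has to
be added up by hand from the buffer directives in your own
configuration.  The parser assumes the usual plain-text output of the
stub_status page.

```python
import re


def parse_stub_status(text):
    """Parse the plain-text page produced by the stub_status module."""
    active = int(re.search(r"Active connections:\s*(\d+)", text).group(1))
    accepts, handled, requests = map(
        int, re.search(r"requests\s+(\d+)\s+(\d+)\s+(\d+)", text).groups()
    )
    reading, writing, waiting = map(
        int,
        re.search(
            r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
        ).groups(),
    )
    return {
        "active": active,
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


def worst_case_bytes(worker_processes, worker_connections, per_connection_bytes):
    """Upper bound: every connection slot in every worker holds all buffers."""
    return worker_processes * worker_connections * per_connection_bytes


if __name__ == "__main__":
    # Hypothetical values; substitute the ones from your own nginx.conf.
    estimate = worst_case_bytes(
        worker_processes=8,
        worker_connections=16384,
        per_connection_bytes=64 * 1024,  # assumed sum of per-connection buffers
    )
    print(f"worst case: {estimate / 1024**3:.1f} GiB")

    # Hypothetical stub_status output, as it would be fetched over HTTP.
    sample = (
        "Active connections: 291 \n"
        "server accepts handled requests\n"
        " 16630948 16630948 31070465 \n"
        "Reading: 6 Writing: 179 Waiting: 106 \n"
    )
    print(parse_stub_status(sample))
```

Logging the parsed counters next to the process memory at a short,
regular interval makes it easier to tell whether a spike coincides with
a jump in connections or requests.  If the worst-case figure exceeds
what the server can spare, that is the case where reducing the terms of
the product applies.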

Hope this helps.

-- 
Maxim Dounin
http://mdounin.ru/

