Debugging Nginx Memory Spikes on Production Servers
Maxim Dounin
mdounin at mdounin.ru
Wed Sep 20 19:07:16 UTC 2023
Hello!
On Wed, Sep 20, 2023 at 11:55:39AM -0500, Lance Dockins wrote:
> Are there any best practices or processes for debugging sudden memory
> spikes in Nginx on production servers? We have a few very high-traffic
> servers that are encountering events where the Nginx process memory
> suddenly spikes from around 300 MB to 12 GB of memory before being shut down
> by an out-of-memory termination script. We don't have Nginx compiled with
> debug mode and even if we did, I'm not sure that we could enable that
> without overly taxing the server due to the constant high traffic load that
> the server is under. Since it's a server with public websites on it, I
> don't know that we could filter the debug log to a single IP either.
>
> Access, error, and info logs all seem to be pretty normal. Internal
> monitoring of the Nginx process doesn't suggest that there are major
> connection spikes either. Theoretically, it is possible that there is just
> a very large sudden burst of traffic coming in that is hitting our rate
> limits very hard and bumping the memory that Nginx is using until the OOM
> termination process closes Nginx (which would prevent Nginx from logging
> the traffic). We just don't have a good way to see where the memory in
> Nginx is being allocated when these sorts of spikes occur and are looking
> for any good insight into how to go about debugging that sort of thing on a
> production server.
>
> Any insights into how to go about troubleshooting it?
In no particular order:
- Make sure you are monitoring connection and request numbers as
  reported by the stub_status module, as well as memory usage.

- Check any 3rd party modules you are using: try disabling them.

- If you are using subrequests, such as with SSI, make sure these
  won't generate an enormous number of subrequests.

- Check your configuration for buffer sizes and connection limits,
  and make sure that your server can handle the maximum memory
  allocation without invoking the OOM killer, that is:

  worker_processes * worker_connections * (total size of the various
  buffers allocated per connection)

  If not, consider reducing various parts of the equation.
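
To illustrate the first and last points, here is a minimal Python
sketch (not part of nginx). It parses the text page served by the
stub_status module and computes the worst-case figure from the
formula above. The function names, the worker settings and the
buffer sizes are placeholders chosen for illustration; substitute
the values from your own configuration. It counts only the buffers
you list, so treat the result as a rough bound, not an exact
accounting of what nginx allocates.

```python
import re


def parse_stub_status(text):
    """Parse the plain-text page produced by the stub_status module."""
    active = int(re.search(r"Active connections:\s*(\d+)", text).group(1))
    # The counters line holds three integers: accepts, handled, requests.
    accepts, handled, requests = map(
        int,
        re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.M).groups(),
    )
    reading, writing, waiting = map(
        int,
        re.search(
            r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
        ).groups(),
    )
    return {
        "active": active,
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


def worst_case_bytes(worker_processes, worker_connections, per_connection_buffers):
    """worker_processes * worker_connections * (sum of per-connection buffers)."""
    return worker_processes * worker_connections * sum(per_connection_buffers.values())


if __name__ == "__main__":
    # Assumed example values, not recommendations: replace with your own.
    buffers = {
        "client_header_buffer_size": 1 * 1024,
        "client_body_buffer_size": 16 * 1024,
        "proxy_buffer_size": 8 * 1024,
        "proxy_buffers (8 x 8k)": 8 * 8 * 1024,
    }
    total = worst_case_bytes(8, 4096, buffers)
    print(f"worst case: {total / 2**30:.2f} GiB")
```

With these example numbers the estimate comes to about 2.78 GiB.
If the figure for your configuration exceeds the memory available
to nginx, that is a sign to reduce one of the factors. Logging the
parsed stub_status counters next to process memory at a short
interval should show whether a spike coincides with a jump in
connections or requests.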
Hope this helps.
--
Maxim Dounin
http://mdounin.ru/