Debugging Nginx Memory Spikes on Production Servers
Lance Dockins
lance at wordkeeper.com
Tue Sep 26 15:30:21 UTC 2023
Dmitry,
I’ve been testing this more and I think there’s more going on here than I originally thought. I have some js_set code that I have to run to properly route and filter requests, and I noticed that it was consuming around 1mb per request (which would then carry over to the later header filters and other work NJS had to do). Given that it’s just a bunch of if/then statements, that seemed very odd to me. In very concurrent, high-traffic environments, that’s a lot of extra memory use, particularly where you have a mix of requests to slower upstreams or larger file downloads. Since it’s hard to get a totally accurate read on memory use, I tested both how much memory a single if statement used and how much 10 if statements used (and averaged out the reported memory increase).
From what I can see, it looks like basic statements like these:
if (r.variables.some_variable.match(/some regex or string/i)) {
// block content
}
consume about 17k to 20k per “if” or “match” statement at minimum (even if there is nothing in the block content). I’m fairly certain that it’s mostly the string.match statements that are causing this. Since that memory isn’t freed, it just accumulates.
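For what it’s worth, a stripped-down version of the kind of check I was measuring looks like this (the function name, variable, and pattern are just placeholders for my real routing code, and njs.memoryStats.size is the only gauge I have, so treat the numbers as approximate at best):

function measure(r) {
    // wired up in nginx.conf with something like: js_set $measured main.measure;
    var before = njs.memoryStats.size;

    // a representative if/match check like the ones in my routing code
    if (r.variables.request_uri.match(/some pattern/i)) {
        // intentionally empty for the test
    }

    var after = njs.memoryStats.size;
    r.log('arena growth for one if/match: ' + (after - before) + ' bytes');
    return '';
}

export default { measure };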
If you have a lot of if/then statements (particularly with string matches), it’s pretty easy to burn through 1mb of memory that then carries through the entire request even for very small files unless you do a forced internalRedirect to regenerate the VM.
Obviously variable declarations and assignments are going to consume memory. That’s to be expected. But for if statements and string matches to use (but not free) memory seems like a bit of a problem. If you have a complex routing table with a lot of if/thens, it’s going to create problems (particularly if you have to do more than one js_set, or add in something like js_content or header filters as well). To be clear, I’m not substituting NJS scripting for location blocks. The only way to replicate the logic I need in plain Nginx would be complex use of the Nginx if directive, and even that doesn’t fully cover it.
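To give a sense of the shape of the code, the js_set routine in question is essentially a chain of checks along these lines (branch names and patterns are simplified here; the real version has far more of them):

function choose_route(r) {
    var uri = r.variables.request_uri || '';
    var ua = r.variables.http_user_agent || '';

    if (ua.match(/some bad bot/i)) {
        return 'deny';
    }
    if (uri.match(/^\/api\//)) {
        return 'api';
    }
    if (uri.match(/\.(?:jpg|png|css|js)$/i)) {
        return 'static';
    }
    // ...many more checks like the above...
    return 'default';
}

export default { choose_route };

Each of those match() calls appears to add to the arena, so by the time a routine like this finishes, the per-request footprint is already substantial before js_content or any header filters run.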
Up until now, I had assumed that string.match types of statements were just transparently calling PCRE behind the scenes so that the associated memory from the PCRE call was being freed after use. Maybe that’s not even an accurate read on how Nginx is using PCRE but that’s how I envisioned it.
In any case, that led me to a few questions.
How does keepalive affect these VMs? I realize that keepalive applies to the connection rather than the request, but I still wanted to confirm that VMs are not sticking around occupying memory after the request completes due to keepalive settings. We have a mix of fast requests and long-running requests (e.g. 5-10s), so if enough long-running requests build up at the same time that a flood of very fast requests comes in, NJS could certainly burn through a lot of memory under the current conditions as I understand them. If there were a lot of large file downloads, those would also occupy memory for the entirety of the download, if I understand correctly.
Is it possible to adjust the NJS codebase to free memory for the actual if condition and for string matches after they’ve run as long as they didn’t include a variable assignment inside of the actual if condition? I’m not talking about whatever is in the block content - just the actual condition. Given how likely it is for people to use basic if conditions and string matches in NJS, that seems like it might be a stopgap measure that reduces the memory footprint without having to build full garbage collection.
Is there any workaround for this type of memory problem other than just to force the use of an internalRedirect to drop the existing VM and create a new one?
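For reference, the only mitigation I have found so far looks roughly like this (the location and function names are made up for the example):

function content(r) {
    // ...the heavy if/match routing and filtering work happens here...

    // Hand the request off to another location. As noted earlier in this
    // thread, the current VM does not survive internalRedirect(), so the
    // redirected request runs in a fresh VM, although the old VM's memory
    // is apparently not released until the whole request finishes.
    r.internalRedirect('/final_handler');
}

export default { content };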
Could it make sense to allow for a directive that would force the destruction and regeneration of the JS VM? That wouldn’t solve the memory leak that builds to 1mb per request, but it would shorten its lifetime (which could ease memory pressure in situations where long-running requests are holding that memory open).
Thanks again for all of your feedback.
—
Lance Dockins
> On Friday, Sep 22, 2023 at 8:34 AM, Me <lance at wordkeeper.com> wrote:
> I am checking the content type, yes. But in my case, I’m just switching between body-parsing methodologies depending on the body type. I do have a few scenarios where I have to evaluate body data in multipart submissions, but I only need the parts of the multipart form that do not include attachments (basic key/value string data). I can’t say for sure that this is the common use case for POST body parsing, but when I’ve read the Lua GitHub issues and various examples and discussions in the past, it has always seemed to me that the only common use cases were for URL-encoded POST bodies or for the plain key/value string data (not attachments) in multipart bodies. I can’t say that I’ve seen many discussions asking for access to the file attachment data in POST bodies - just for what it’s worth.
>
> I did notice that the boundary had an effect on memory. It seems like memory is sort of contained as long as we’re talking about an actual multipart boundary. When it’s a single char or something otherwise smaller, the memory use is extreme, but that’s partly inherent to even splitting data that way.
>
> For now, I think that I’m just going to have to use a workaround that limits when POST body parsing triggers. There’s just no way to do it at all under certain conditions right now.
>
> Thank you for all of your feedback and work on NJS, and for filing POST body parsing in NJS as a feature request.
>
> —
> Lance Dockins
>
> > On Thursday, Sep 21, 2023 at 7:47 PM, Dmitry Volyntsev <xeioex at nginx.com> wrote:
> >
> > On 9/21/23 4:41 PM, Lance Dockins wrote:
> > > That’s good info. Thank you.
> > >
> > > I have been doing some additional testing since my email last night
> > > and I have seen enough evidence to believe that file I/O in NJS is
> > > basically the source of the memory issues. I did some testing with
> > > very basic commands like readFileSync and Buffer + readSync and in all
> > > cases, the memory footprint when doing file handling in NJS is massive.
> > >
> > > Just doing this:
> > >
> > > let content = fs.readFileSync('/path/to/file', 'utf8');
> > > let parts = content.split(boundary);
> > >
> > > Resulted in memory use that was close to a minimum of 4-8x the size of
> > > the file during my testing. We do have an upper bound on files that
> > > can be uploaded and that does contain this somewhat but it’s not hard
> > > for a larger request that is 99% file attachment to use exorbitant
> > > amounts of memory.
> > >
> > >
> > >
> > Regarding the task at hand, do you check the Content-Type of the POST
> > body? That way you can exclude anything except, probably,
> > application/x-www-form-urlencoded. At least from what I see in Lua, the
> > handler is only looking for application/x-www-form-urlencoded and not
> > multipart/form-data.
> >
> > https://github.com/openresty/lua-nginx-module/blob/c89469e920713d17d703a5f3736c9335edac22bf/src/ngx_http_lua_args.c#L171
> >
> >
> > > I actually tried doing a Buffer + readSync variation on the same
> > > thing and the memory footprint was actually FAR FAR worse when I did that.
> > >
> > >
> > >
> > As of now, the resulting memory consumption will depend heavily on the
> > boundary.
> >
> > In the worst case, for a 1mb file that is split into a one-character
> > array, you will get ~16x memory consumed, because every 1-byte character
> > will be put into an njs_value_t structure.
> >
> > With larger chunks the situation will be less extreme. Right now we are
> > implementing a way to deduplicate identical strings, this may help in
> > some situations.
> >
> > > The 4-8x minimum memory commit seems like a problem to me just
> > > generally. But the fact that readSync doesn’t seem to be any better
> > > on memory (much worse actually) basically means that NJS is only safe
> > > to use for processing smaller files (or POST bodies) right now.
> > > There’s just no good way to keep data that you don’t care about in a
> > > file from occupying excessive amounts of memory that can’t be
> > > reclaimed. If there is no way to improve the memory footprint when
> > > handling files (or big strings), no memory conservative way to stream
> > > a file through some sort of buffer, and no first-party utility for
> > > providing parsed POST bodies right now,
> > > then it might be worth the time to put some notes in the NJS docs that
> > > the fs module may not be appropriate for larger files (e.g. files over
> > > 1mb).
> > >
> > > For what it’s worth, I’d also love to see some examples of how to
> > > properly use fs.readSync in the NJS examples docs. There really
> > > wasn’t much out there for that for NJS (or even in a lot of the Node
> > > docs) so I can’t say that my specific test implementation for that was
> > > ideal. But that’s just above and beyond the basic problems that I’m
> > > seeing with memory use with any form of file I/O at all (since the
> > > memory problems seem to be persistent whether doing reads or even log
> > > writes).
> > >
> > > —
> > > Lance Dockins
> > >
> > >
> > > On Thursday, Sep 21, 2023 at 5:01 PM, Dmitry Volyntsev
> > > <xeioex at nginx.com> wrote:
> > >
> > > On 9/21/23 6:50 AM, Lance Dockins wrote:
> > >
> > > Hi Lance,
> > >
> > > See my comments below.
> > >
> > > > Thank you, Dmitry.
> > > >
> > > > One question before I describe what we are doing with NJS. I did
> > > > read
> > > > about the VM handling process before switching from Lua to NJS
> > > > and it
> > > > sounded very practical, but my current understanding is that there
> > > > could be multiple VMs instantiated for a single request. A js_set,
> > > > js_content, and js_header_filter directive that applies to a single
> > > > request, for example, would instantiate 3 VMs, and were you to need
> > > > to set multiple variables with js_set, each one would keep adding to
> > > > that number of VMs.
> > > >
> > > >
> > > This is not correct. For js_set, js_content and js_header_filter
> > > there
> > > is only a single VM.
> > > The internalRedirect() is the exception, because a VM does not
> > > survive
> > > it, but the previous VMs will not be freed until current request is
> > > finished. BTW, a VM instance itself is pretty small in size (~2kb)
> > > so it
> > > should not be a problem if you have a reasonable number of redirects.
> > >
> > >
> > > >
> > > > My original understanding of that was that those VMs would be
> > > > destroyed once they exited so even if you had multiple VMs
> > > > instantiated per request, the memory impact would not be
> > > > cumulative in
> > > > a single request. Is that understanding correct? Or are you saying
> > > > that each VM accumulates more and more memory until the entire
> > > > request
> > > > completes?
> > > >
> > > > As far as how we’re using NJS, we’re mostly using it for header
> > > > filters, internal redirection, and access control. So there really
> > > > shouldn’t be a threat to memory in most instances unless we’re not
> > > > just dealing with a single request memory leak inside of a VM but
> > > > also
> > > > a memory leak that involves every VM that NJS instantiates just
> > > > accumulating memory until the request completes.
> > > >
> > > > Right now, my working theory about what is most likely to be
> > > > creating
> > > > the memory spikes has to do with POST body analysis. Unfortunately,
> > > > some of the requests that I have to deal with are POSTs that have to
> > > > either be denied access or routed differently depending on the
> > > > contents of the POST body. Unfortunately, these same routes can
> > > > vary
> > > > in the size of the POST body and I have no control over how any of
> > > > that works because the way it works is controlled by third parties.
> > > > One of those third parties has significant market share on the
> > > > internet so we can’t really avoid dealing with it.
> > > >
> > > > In any case, before we switched to NJS, we were using Lua to do the
> > > > same things and that gave us the advantage of doing both memory
> > > > cleanup if needed and also doing easy analysis of POST body args. I
> > > > was able to do this sort of thing with Lua before:
> > > > local post_args, post_err = ngx.req.get_post_args()
> > > > if post_args.arg_name == something then
> > > >     -- handle the request accordingly
> > > > end
> > > >
> > > > But in NJS, there’s no such POST body utility so I had to write my
> > > > own. The code that I use to parse out the POST body works for both
> > > > URL encoded POST bodies and multipart POST bodies, but it has to
> > > > read
> > > > the entire POST into a variable before I can use it. For small
> > > > POSTs,
> > > > that’s not a problem. For larger POSTs that contain a big
> > > > attachment,
> > > > it would be. Ultimately, I only care about the string key/value
> > > > pairs
> > > > for my purposes (not file attachments) so I was hoping to discard
> > > > attachment data while parsing the body.
> > > >
> > > >
> > > >
> > > Thank you for the feedback, I will add it as to a future feature
> > > list.
> > >
> > > > I think that that is actually how Lua’s version of this works too.
> > > > So my next thought was that I could use a Buffer and fs.readSync to
> > > > read the POST body in buffer frames to keep memory minimal so that I
> > > > could discard any file attachments from the POST body and
> > > > just evaluate the key/value data that uses simple strings. But from
> > > > what you’re saying, it sounds like there’s basically no difference
> > > > between fs.readSync w/ a Buffer and fs.readFileSync in terms of
> > > > actual
> > > > memory use. So either way, with a large POST body, you’d be
> > > > steamrolling the memory use in a single Nginx worker thread. When I
> > > > had to deal with stuff like this in Lua, I’d just run
> > > > collectgarbage()
> > > > to clean up memory and it seemed to work fine. But then I also
> > > > wasn’t
> > > > having to parse out the POST body myself in Lua either.
> > > >
> > > > It’s possible that something else is going on other than that.
> > > > qs.parse seems like it could get us into some trouble if the
> > > > query_string that was passed was unusually long too, from what you’re
> > > > saying about how memory is handled.
> > > >
> > > >
> > > for qs.parse() there is a limit for a number of arguments, which
> > > you can
> > > specify.
> > >
> > > >
> > > > None of the situations that I’m handling are for long running
> > > > requests. They’re all designed for very fast requests that come
> > > > into
> > > > the servers that I manage on a constant basis.
> > > >
> > > > If you can shed some light on the way that VM’s and their memory are
> > > > handled per my question above and any insights into what to do about
> > > > this type of situation, that would help a lot. I don’t know if
> > > > there
> > > > are any plans to offer a POST body parsing feature in NJS for those
> > > > that need to evaluate POST body data like how Lua did it, but if
> > > > there
> > > > was some way to be able to do that at the Nginx layer instead of at
> > > > the NJS layer, it seems like that could be a lot more sensitive to
> > > > memory use. Right now, if my understanding is correct, the only
> > > > option that I’d even have would be to just stop doing POST body
> > > > handling if the POST body is above a certain total size. I guess if
> > > > there was some way to forcibly free memory, that would help too.
> > > > But
> > > > I don’t think that that is as common of a problem as having to deal
> > > > with very large query strings that some third party appends to a URL
> > > > (probably maliciously) and/or a very large file upload attached to a
> > > > multipart POST. So the only concern that I’d have about memory in a
> > > > situation where I don’t have to worry about memory when parsing a
> > > > larger file would be if multiple js_sets and such would just keep
> > > > spawning VMs and accumulating memory during a single request.
> > > >
> > > > Any thoughts?
> > > >
> > > > —
> > > > Lance Dockins
> > > >
> > > >
> > > > On Thursday, Sep 21, 2023 at 1:45 AM, Dmitry Volyntsev
> > > > <xeioex at nginx.com> wrote:
> > > >
> > > > On 20.09.2023 20:37, Lance Dockins wrote:
> > > > > So I guess my question at the moment is whether endless memory use
> > > > > growth being reported by njs.memoryStats.size after file writes is
> > > > > some sort of false positive tied to quirks in how memory use is
> > > > > being
> > > > > reported or whether this is indicative of a memory leak? Any
> > > > > insight
> > > > would be appreciated.
> > > >
> > > > Hi Lance,
> > > > The reason njs.memoryStats.size keeps growing is that NJS uses an
> > > > arena memory allocator linked to the current request, and a new object
> > > > representing the memoryStats structure is returned every time
> > > > njs.memoryStats is accessed. Currently NJS does not free most of the
> > > > internal objects and structures until the current request is
> > > > destroyed
> > > > because it is not intended for a long running code.
> > > >
> > > > Regarding the sudden memory spikes, please share some details
> > > > about JS
> > > > code you are using.
> > > > One place to look is to analyze the amount of traffic that goes to
> > > > NJS
> > > > locations and what exactly those locations do.
> > > >