Debugging Nginx Memory Spikes on Production Servers

Fri Sep 22 00:47:06 UTC 2023

On 9/21/23 4:41 PM, Lance Dockins wrote:
> That’s good info.  Thank you.
>
> I have been doing some additional testing since my email last night 
> and I have seen enough evidence to believe that file I/O in NJS is 
> basically the source of the memory issues.  I did some testing with 
> very basic commands like readFileSync and Buffer + readSync and in all 
> cases, the memory footprint when doing file handling in NJS is massive.
>
> Just doing this:
>
> let content = fs.readFileSync(path/to//file);
> let parts = content.split(boundary);
>
> Resulted in memory use that was close to a minimum of 4-8x the size of 
> the file during my testing.  We do have an upper bound on files that 
> can be uploaded and that does contain this somwhat but it’s not hard 
> for a larger request that is 99% file attachment to use exhorbitant 
> amounts of memory.
>
>
>
Regarding the task at hand, do you check for Content-Type of the POST 
body? So you can exclude anything except probably 
application/x-www-form-urlencoded. At least what I see in lua: the 
handler is only looking for application/x-www-form-urlencoded and not 
multipart/form-data.

https://github.com/openresty/lua-nginx-module/blob/c89469e920713d17d703a5f3736c9335edac22bf/src/ngx_http_lua_args.c#L171

>  I actually tried doing a Buffer + readSync variation on the same 
> thing and the memory footprint was actually FAR FAR worse when I did that.
>
>
>
As of now, the resulting memory consumption will depend heavily on the 
boundary.

In worst case, for 1mb of memory file that is split into 1 character 
array, You will get ~16x memory consumed, because every 1 byte character 
will be put into a njs_value_t structure.

With larger chunks the situation will be less extreme. Right now we are 
implementing a way to deduplicate identical strings, this may help in 
some situations.

> The 4-8x minimum memory commit seems like a problem to me just 
> generally.  But the fact that readSync doesn’t seem to be any better 
> on memory (much worse actually) basically means that NJS is only safe 
> to use for processing smaller files (or POST bodies) right now. 
>  There’s just no good way to keep data that you don’t care about in a 
> file from occupying excessive amounts of memory that can’t be 
> reclaimed. If there is no way to improve the memory footprint when 
> handling files (or big strings), no memory conservative way to stream 
> a file through some sort of buffer, and no first-party utility for 
> providing parsed POST bodies right now,
> then it might be worth the time to put some notes in the NJS docs that 
> the fs module may not be appropriate for larger files (e.g. files over 
> 1mb).
>
> For what it’s worth, I’d also love to see some examples of how to 
> properly use fs.readSync in the NJS examples docs.  There really 
> wasn’t much out there for that for NJS (or even in a lot of the Node 
> docs) so I can’t say that my specific test implementation for that was 
> ideal.  But that’s just above and beyond the basic problems that I’m 
> seeing with memory use with any form of file I/O at all (since the 
> memory problems seem to be persistent whether doing reads or even log 
> writes).
>
> —
> Lance Dockins
>
>
>     On Thursday, Sep 21, 2023 at 5:01 PM, Dmitry Volyntsev
>     <xeioex at nginx.com> wrote:
>
>     On 9/21/23 6:50 AM, Lance Dockins wrote:
>
>     Hi Lance,
>
>     See my comments below.
>
>>     Thanky you, Dmitry.
>>
>>     One question before I describe what we are doing with NJS.  I did
>>     read
>>     about the VM handling process before switching from Lua to NJS
>>     and it
>>     sounded very practical but my current understanding is that there
>>     could be multiple VM’s instantiated for a single request.  A js_set,
>>     js_content, and js_header_filter directive that applies to a single
>>     request, for example, would instantiate 3 VMs.  And were you to need
>>     to set multiple variables with js_set, then keep adding to that #
>>     of VMs.
>>
>>
>     This is not correct. For js_set, js_content and js_header_filter
>     there
>     is only a single VM.
>     The internalRedirect() is the exception, because a VM does not
>     survive
>     it, but the previous VMs will not be freed until current request is
>     finished. BTW, a VM instance itself is pretty small in size (~2kb)
>     so it
>     should not be a problem if you have a reasonable number of redirects.
>
>
>>
>>     My original understanding of that was that those VMs would be
>>     destroyed once they exited so even if you had multiple VMs
>>     instantiated per request, the memory impact would not be
>>     cumulative in
>>     a single request.  Is that understanding correct?  Or are you saying
>>     that each VM accumulates more and more memory until the entire
>>     request
>>     completes?
>>
>>     As far as how we’re using NJS, we’re mostly using it for header
>>     filters, internal redirection, and access control.  So there really
>>     shouldn’t be a threat to memory in most instances unless we’re not
>>     just dealing with a single request memory leak inside of a VM but
>>     also
>>     a memory leak that involves every VM that NJS instantiates just
>>     accumulating memory until the request completes.
>>
>>     Right now, my working theory about what is most likely to be
>>     creating
>>     the memory spikes has to do with POST body analysis.  Unfortunately,
>>     some of the requests that I have to deal with are POSTs that have to
>>     either be denied access or routed differently depending on the
>>     contents of the POST body.  Unfortunately, these same routes can
>>     vary
>>     in the size of the POST body and I have no control over how any of
>>     that works because the way it works is controlled by third parties.
>>      One of those third parties has significant market share on the
>>     internet so we can’t really avoid dealing with it.
>>
>>     In any case, before we switched to NJS, we were using Lua to do the
>>     same things and that gave us the advantage of doing both memory
>>     cleanup if needed and also doing easy analysis of POST body args.  I
>>     was able to do this sort of thing with Lua before:
>>     local post_args, post_err = ngx.req.get_post_args()
>>     if post_args.arg_name = something then
>>
>>     But in NJS, there’s no such POST body utility so I had to write my
>>     own.  The code that I use to parse out the POST body works for both
>>     URL encoded POST bodies and multipart POST bodies, but it has to
>>     read
>>     the entire POST into a variable before I can use it.  For small
>>     POSTs,
>>     that’s not a problem.  For larger POSTs that contain a big
>>     attachment,
>>     it would be.  Ultimately, I only care about the string key/value
>>     pairs
>>     for my purposes (not file attachments) so I was hoping to discard
>>     attachment data while parsing the body.
>>
>>
>>
>     Thank you for the feedback, I will add it as to a future feature
>     list.
>
>>      I think that that is actually how Lua’s version of this works too.
>>      So my next thought was that I could use a Buffer and rs.readSync to
>>     read the POST body in buffer frames to keep memory minimal so that I
>>     could could discard the any file attachments from the POST body and
>>     just evaluate the key/value data that uses simple strings.  But from
>>     what you’re saying, it sounds like there’s basically no difference
>>     between fs.readSync w/ a Buffer and rs.readFileSync in terms of
>>     actual
>>     memory use. So either way, with a large POST body, you’d be
>>     steamrolling the memory use in a single Nginx worker thread. When I
>>     had to deal with stuff like this in Lua, I’d just run
>>     collectgarbage()
>>     to clean up memory and it seemed to work fine.  But then I also
>>     wasn’t
>>     having to parse out the POST body myself in Lua either.
>>
>>     It’s possible that something else is going on other than that.
>>      qs.parse seems like it could get us into some trouble if the
>>     query_string that was passed was unusuall long too from what you’re
>>     saying about how memory is handled.
>>
>>
>     for qs.parse() there is a limit for a number of arguments, which
>     you can
>     specify.
>
>>
>>     None of the situations that I’m handling are for long running
>>     requests.  They’re all designed for very fast requests that come
>>     into
>>     the servers that I manage on a constant basis.
>>
>>     If you can shed some light on the way that VM’s and their memory are
>>     handled per my question above and any insights into what to do about
>>     this type of situation, that would help a lot.  I don’t know if
>>     there
>>     are any plans to offer a POST body parsing feature in NJS for those
>>     that need to evalute POST body data like how Lua did it, but if
>>     there
>>     was some way to be able to do that at the Nginx layer instead of at
>>     the NJS layer, it seems like that could be a lot more sensitive to
>>     memory use.  Right now, if my understanding is correct, the only
>>     option that I’d even have would be to just stop doing POST body
>>     handling if the POST body is above a certain total size.  I guess if
>>     there was some way to forcibly free memory, that would help too.
>>      But
>>     I don’t think that that is as common of a problem as having to deal
>>     with very large query strings that some third party appends to a URL
>>     (probably maliciously) and/or a very large file upload attached to a
>>     multipart POST.  So the only concern that I’d have about memory in a
>>     situation where I don’t have to worry about memory when parsing a
>>     larger file woudl be if multiple js_sets and such would just keep
>>     spawning VMs and accumulating memory during a single request.
>>
>>     Any thoughts?
>>
>>     —
>>     Lance Dockins
>>
>>
>>     On Thursday, Sep 21, 2023 at 1:45 AM, Dmitry Volyntsev
>>     <xeioex at nginx.com> wrote:
>>
>>     On 20.09.2023 20:37, Lance Dockins wrote:
>>>     So I guess my question at the moment is whether endless memory use
>>>     growth being reported by njs.memoryStats.size after file writes is
>>>     some sort of false positive tied to quirks in how memory use is
>>>     being
>>>     reported or whether this is indicative of a memory leak?  Any
>>>     insight
>>>     would be appreicated.
>>
>>     Hi Lance,
>>     The reason njs.memoryStats.size keeps growing is because NJS uses
>>     arena
>>     memory allocator linked to a current request and a new object
>>     representing memoryStats structure is returned every time
>>     njs.memoryStats is accessed. Currently NJS does not free most of the
>>     internal objects and structures until the current request is
>>     destroyed
>>     because it is not intended for a long running code.
>>
>>     Regarding the sudden memory spikes, please share some details
>>     about JS
>>     code you are using.
>>     One place to look is to analyze the amount of traffic that goes to
>>     NJS
>>     locations and what exactly those location do.
>>