Debugging Nginx Memory Spikes on Production Servers
Dmitry Volyntsev
xeioex at nginx.com
Fri Sep 22 00:47:06 UTC 2023
On 9/21/23 4:41 PM, Lance Dockins wrote:
> That’s good info. Thank you.
>
> I have been doing some additional testing since my email last night
> and I have seen enough evidence to believe that file I/O in NJS is
> basically the source of the memory issues. I did some testing with
> very basic commands like readFileSync and Buffer + readSync and in all
> cases, the memory footprint when doing file handling in NJS is massive.
>
> Just doing this:
>
> let content = fs.readFileSync(path/to//file);
> let parts = content.split(boundary);
>
> Resulted in memory use that was close to a minimum of 4-8x the size of
> the file during my testing. We do have an upper bound on files that
> can be uploaded and that does contain this somwhat but it’s not hard
> for a larger request that is 99% file attachment to use exhorbitant
> amounts of memory.
>
>
>
Regarding the task at hand, do you check for Content-Type of the POST
body? So you can exclude anything except probably
application/x-www-form-urlencoded. At least what I see in lua: the
handler is only looking for application/x-www-form-urlencoded and not
multipart/form-data.
https://github.com/openresty/lua-nginx-module/blob/c89469e920713d17d703a5f3736c9335edac22bf/src/ngx_http_lua_args.c#L171
> I actually tried doing a Buffer + readSync variation on the same
> thing and the memory footprint was actually FAR FAR worse when I did that.
>
>
>
As of now, the resulting memory consumption will depend heavily on the
boundary.
In worst case, for 1mb of memory file that is split into 1 character
array, You will get ~16x memory consumed, because every 1 byte character
will be put into a njs_value_t structure.
With larger chunks the situation will be less extreme. Right now we are
implementing a way to deduplicate identical strings, this may help in
some situations.
> The 4-8x minimum memory commit seems like a problem to me just
> generally. But the fact that readSync doesn’t seem to be any better
> on memory (much worse actually) basically means that NJS is only safe
> to use for processing smaller files (or POST bodies) right now.
> There’s just no good way to keep data that you don’t care about in a
> file from occupying excessive amounts of memory that can’t be
> reclaimed. If there is no way to improve the memory footprint when
> handling files (or big strings), no memory conservative way to stream
> a file through some sort of buffer, and no first-party utility for
> providing parsed POST bodies right now,
> then it might be worth the time to put some notes in the NJS docs that
> the fs module may not be appropriate for larger files (e.g. files over
> 1mb).
>
> For what it’s worth, I’d also love to see some examples of how to
> properly use fs.readSync in the NJS examples docs. There really
> wasn’t much out there for that for NJS (or even in a lot of the Node
> docs) so I can’t say that my specific test implementation for that was
> ideal. But that’s just above and beyond the basic problems that I’m
> seeing with memory use with any form of file I/O at all (since the
> memory problems seem to be persistent whether doing reads or even log
> writes).
>
> —
> Lance Dockins
>
>
> On Thursday, Sep 21, 2023 at 5:01 PM, Dmitry Volyntsev
> <xeioex at nginx.com> wrote:
>
> On 9/21/23 6:50 AM, Lance Dockins wrote:
>
> Hi Lance,
>
> See my comments below.
>
>> Thanky you, Dmitry.
>>
>> One question before I describe what we are doing with NJS. I did
>> read
>> about the VM handling process before switching from Lua to NJS
>> and it
>> sounded very practical but my current understanding is that there
>> could be multiple VM’s instantiated for a single request. A js_set,
>> js_content, and js_header_filter directive that applies to a single
>> request, for example, would instantiate 3 VMs. And were you to need
>> to set multiple variables with js_set, then keep adding to that #
>> of VMs.
>>
>>
> This is not correct. For js_set, js_content and js_header_filter
> there
> is only a single VM.
> The internalRedirect() is the exception, because a VM does not
> survive
> it, but the previous VMs will not be freed until current request is
> finished. BTW, a VM instance itself is pretty small in size (~2kb)
> so it
> should not be a problem if you have a reasonable number of redirects.
>
>
>>
>> My original understanding of that was that those VMs would be
>> destroyed once they exited so even if you had multiple VMs
>> instantiated per request, the memory impact would not be
>> cumulative in
>> a single request. Is that understanding correct? Or are you saying
>> that each VM accumulates more and more memory until the entire
>> request
>> completes?
>>
>> As far as how we’re using NJS, we’re mostly using it for header
>> filters, internal redirection, and access control. So there really
>> shouldn’t be a threat to memory in most instances unless we’re not
>> just dealing with a single request memory leak inside of a VM but
>> also
>> a memory leak that involves every VM that NJS instantiates just
>> accumulating memory until the request completes.
>>
>> Right now, my working theory about what is most likely to be
>> creating
>> the memory spikes has to do with POST body analysis. Unfortunately,
>> some of the requests that I have to deal with are POSTs that have to
>> either be denied access or routed differently depending on the
>> contents of the POST body. Unfortunately, these same routes can
>> vary
>> in the size of the POST body and I have no control over how any of
>> that works because the way it works is controlled by third parties.
>> One of those third parties has significant market share on the
>> internet so we can’t really avoid dealing with it.
>>
>> In any case, before we switched to NJS, we were using Lua to do the
>> same things and that gave us the advantage of doing both memory
>> cleanup if needed and also doing easy analysis of POST body args. I
>> was able to do this sort of thing with Lua before:
>> local post_args, post_err = ngx.req.get_post_args()
>> if post_args.arg_name = something then
>>
>> But in NJS, there’s no such POST body utility so I had to write my
>> own. The code that I use to parse out the POST body works for both
>> URL encoded POST bodies and multipart POST bodies, but it has to
>> read
>> the entire POST into a variable before I can use it. For small
>> POSTs,
>> that’s not a problem. For larger POSTs that contain a big
>> attachment,
>> it would be. Ultimately, I only care about the string key/value
>> pairs
>> for my purposes (not file attachments) so I was hoping to discard
>> attachment data while parsing the body.
>>
>>
>>
> Thank you for the feedback, I will add it as to a future feature
> list.
>
>> I think that that is actually how Lua’s version of this works too.
>> So my next thought was that I could use a Buffer and rs.readSync to
>> read the POST body in buffer frames to keep memory minimal so that I
>> could could discard the any file attachments from the POST body and
>> just evaluate the key/value data that uses simple strings. But from
>> what you’re saying, it sounds like there’s basically no difference
>> between fs.readSync w/ a Buffer and rs.readFileSync in terms of
>> actual
>> memory use. So either way, with a large POST body, you’d be
>> steamrolling the memory use in a single Nginx worker thread. When I
>> had to deal with stuff like this in Lua, I’d just run
>> collectgarbage()
>> to clean up memory and it seemed to work fine. But then I also
>> wasn’t
>> having to parse out the POST body myself in Lua either.
>>
>> It’s possible that something else is going on other than that.
>> qs.parse seems like it could get us into some trouble if the
>> query_string that was passed was unusuall long too from what you’re
>> saying about how memory is handled.
>>
>>
> for qs.parse() there is a limit for a number of arguments, which
> you can
> specify.
>
>>
>> None of the situations that I’m handling are for long running
>> requests. They’re all designed for very fast requests that come
>> into
>> the servers that I manage on a constant basis.
>>
>> If you can shed some light on the way that VM’s and their memory are
>> handled per my question above and any insights into what to do about
>> this type of situation, that would help a lot. I don’t know if
>> there
>> are any plans to offer a POST body parsing feature in NJS for those
>> that need to evalute POST body data like how Lua did it, but if
>> there
>> was some way to be able to do that at the Nginx layer instead of at
>> the NJS layer, it seems like that could be a lot more sensitive to
>> memory use. Right now, if my understanding is correct, the only
>> option that I’d even have would be to just stop doing POST body
>> handling if the POST body is above a certain total size. I guess if
>> there was some way to forcibly free memory, that would help too.
>> But
>> I don’t think that that is as common of a problem as having to deal
>> with very large query strings that some third party appends to a URL
>> (probably maliciously) and/or a very large file upload attached to a
>> multipart POST. So the only concern that I’d have about memory in a
>> situation where I don’t have to worry about memory when parsing a
>> larger file woudl be if multiple js_sets and such would just keep
>> spawning VMs and accumulating memory during a single request.
>>
>> Any thoughts?
>>
>> —
>> Lance Dockins
>>
>>
>> On Thursday, Sep 21, 2023 at 1:45 AM, Dmitry Volyntsev
>> <xeioex at nginx.com> wrote:
>>
>> On 20.09.2023 20:37, Lance Dockins wrote:
>>> So I guess my question at the moment is whether endless memory use
>>> growth being reported by njs.memoryStats.size after file writes is
>>> some sort of false positive tied to quirks in how memory use is
>>> being
>>> reported or whether this is indicative of a memory leak? Any
>>> insight
>>> would be appreicated.
>>
>> Hi Lance,
>> The reason njs.memoryStats.size keeps growing is because NJS uses
>> arena
>> memory allocator linked to a current request and a new object
>> representing memoryStats structure is returned every time
>> njs.memoryStats is accessed. Currently NJS does not free most of the
>> internal objects and structures until the current request is
>> destroyed
>> because it is not intended for a long running code.
>>
>> Regarding the sudden memory spikes, please share some details
>> about JS
>> code you are using.
>> One place to look is to analyze the amount of traffic that goes to
>> NJS
>> locations and what exactly those location do.
>>
More information about the nginx
mailing list