Memory usage in nginx proxy setup and use of min_uses

Lucas Rolff lucas at lucasrolff.com
Wed May 19 18:44:45 UTC 2021


> If you nevertheless observe 500 being returned in practice, this might be the actual thing to focus on.

Even with sub-100 requests and 4 workers, I've experienced it multiple times: simply because the number of cache keys was exceeded, nginx threw 500 internal server errors for new uncached requests for hours on end (in that particular instance, I see about 300 expired keys per 5 minutes).

When it happens again, I'll obviously investigate further, given that it's apparently not supposed to happen.

> an attacker can easily request the same resource several times, moving it to the "normal" category

Correct; an attacker can almost always find a way to do things if they want to. I've just yet to see one be "smart" enough to request the same resources multiple times.
Even if it's not an attacker but a misconfigured application (one that isn't directly managed by whoever manages the nginx server): if an application passes unique identifiers through in the URI (imagine gclid or fbclid hashes), these IDs are generally unique per visitor. The query strings differ, but in 99% of such cases we only see each request once or twice. As a result, min_uses keeps the disk from filling up, but memory still fills, because the keys aren't cleared out before the inactive timer is reached.
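To make the scenario concrete, here's a minimal sketch of the kind of setup I have in mind (paths, zone name, upstream address and sizes are made up for illustration):

    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=static:100m
                     inactive=24h max_size=50g;

    upstream backend {
        server 127.0.0.1:8080;
    }

    server {
        listen 80;

        location / {
            proxy_cache static;
            # only write a response to disk once it has been
            # requested at least twice
            proxy_cache_min_uses 2;
            proxy_cache_valid 200 24h;
            proxy_pass http://backend;
        }
    }

Every unique /asset.js?gclid=... URI still allocates a node in the keys_zone on its first request, even though min_uses keeps it off disk, and that node is only freed once inactive=24h has passed. Per the nginx docs, one megabyte of keys_zone stores about 8 thousand keys, so keys_zone=100m tops out at around 800k unique URIs.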

So at least in use cases like that, we'd often be able to mitigate somewhat misconfigured applications; it's quite common within the CDN industry to see this issue anyway. While the ones running the CDN obviously still have to reach out to the customer and ask them to fix their application, it would be awesome to have a more proactive mechanism available that limits the urgency of that fix.
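To make the idea concrete, what I'm picturing is something along these lines; the min_uses_inactive parameter is hypothetical and does not exist in nginx today:

    # min_uses_inactive is a made-up parameter for illustration:
    # drop keys that never reach min_uses from the keys_zone after
    # 60 seconds, instead of waiting out the full inactive=24h
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=static:100m
                     inactive=24h min_uses_inactive=60s;

That way the one-hit keys rotate out of memory quickly, while properly cached items keep the long inactive lifetime.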

What I can hear is that you don't see the point of such a feature; that's fine.

I guess the alternative is to use Lua to hook into nginx's cache metadata/shm (probably needing a custom nginx module as well, since the shm isn't exposed to Lua); then one should be able to wipe out the useless keys that way.

Best Regards,
Lucas Rolff

On 18/05/2021, 03.27, "nginx on behalf of Maxim Dounin" <nginx-bounces at nginx.org on behalf of mdounin at mdounin.ru> wrote:

    Hello!

    On Mon, May 17, 2021 at 07:33:43PM +0000, Lucas Rolff wrote:

    > Hi Maxim!
    > 
    > > - The attack you are considering is not about "poisoning".  At 
    > > most, it can be used to make the cache less efficient.
    > 
    > Poisoning is probably the wrong word indeed, since nginx 
    > doesn't really handle reaching the keys_zone limit; it simply 
    > starts to return a 500 internal server error. So I don't think 
    > it makes the cache less efficient (other than you won't be 
    > able to cache that much); you end up breaking nginx, because 
    > once the keys_zone limit has been reached, nginx simply 
    > returns 500 internal server error for items that are not 
    > already in proxy_cache. If it did an LRU/LFU eviction on the 
    > keys, then yes, you could probably end up with a less 
    > efficient cache.

    While 500 is possible in some cases, especially in configurations 
    with many worker processes and high request concurrency, even in 
    the worst case it's expected to happen at most for half of the 
    requests, usually much less than that.  Further, cache manager 
    monitors the number of cache items in the keys_zone, cleaning 
    things in advance, making 500 almost impossible in practice.

    If you nevertheless observe 500 being returned in practice, this 
    might be the actual thing to focus on.

    [...]

    > Unless nginx very recently implemented purging of old cache 
    > entries when the keys_zone limit is reached - then no, it would 
    > still break nginx for non-cached requests (returning 500 
    > internal server error). If nginx has started to purge old 
    > entries when the limit is reached, then sure, the attacker 
    > would still be able to wipe out the cache.

    Clearing old cache items when it is not possible to allocate a 
    cache node dates back to initial cache support in nginx 0.7.44[1].  
    And cache manager monitoring of the keys_zone and clearing it in 
    advance dates back to nginx 1.9.13 released about five years 
    ago[2].  Not sure any of these counts as "very recently".

    > But let's say we have "inactive" set to 24+ hours (which is 
    > often used for static files) - in an attack where someone 
    > appends random query strings, those keys would only be removed 
    > after 24 hours (or more, depending on the setting). With a 
    > separate flag, one could set this counter to something like 60 
    > seconds (so delete the key from memory if it hasn't reached 
    > its min_uses within 60 seconds) - this way, you're still 
    > rotating those keys out *a lot* faster.

    While this may be preferable for some use cases (and sounds close 
    to the "Segmented LRU" cache policy[3]), it certainly doesn't 
    protect from the attack you've initially described.  As previously 
    suggested, an attacker can easily request the same resource 
    several times, moving it to the "normal" category, so it will stay 
    in the cache for the 24+ hours you've configured.  Instead, this 
    distinction might make things worse, making it harder for actually 
    requested resources to get into the cache.

    > > In particular, this can be done with limit_req
    > 
    > If we'd limit this to 20 req/s, this would allow a single IP to 
    > use up 20 × 86,400 ≈ 1.73 million keys in the keys_zone if 
    > "inactive" is 24 hours - do this with 10 IPs, and we're at 
    > 17.3 million.

    The basic idea of the burst-based limiting that the limit_req 
    module implements is that you don't need to set high rates for IP 
    addresses.  Rather, you configure something you expect to be seen 
    on average per hour (or even day), and allow large enough bursts.  
    So instead of limiting to 20 r/s you can limit to 1 r/m with 
    burst set to, say, 1000.
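
    For illustration, such a configuration could look roughly like 
    this (the zone name, zone size and upstream name are 
    placeholders):

        # track state per client IP in 10m of shared memory,
        # with a sustained rate of 1 request per minute
        limit_req_zone $binary_remote_addr zone=perip:10m rate=1r/m;

        server {
            location / {
                # allow bursts of up to 1000 requests above the
                # sustained rate before rejecting (503 by default)
                limit_req zone=perip burst=1000 nodelay;
                proxy_pass http://backend;
            }
        }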

    [...]

    [1] http://hg.nginx.org/nginx/rev/3a8a53c0c42f#l19.478
    [2] http://hg.nginx.org/nginx/rev/c9d680b00744
    [3] https://en.wikipedia.org/wiki/Cache_replacement_policies#Segmented_LRU_(SLRU)

    -- 
    Maxim Dounin
    http://mdounin.ru/



