Memory usage in nginx proxy setup and use of min_uses

Tue May 18 01:27:02 UTC 2021

Hello!

On Mon, May 17, 2021 at 07:33:43PM +0000, Lucas Rolff wrote:

> Hi Maxim!
> 
> > - The attack you are considering is not about "poisoning".  At 
> > most, it can be used to make the cache less efficient.
> 
> Poisoning is probably the wrong word indeed. But since nginx
> doesn't really handle reaching the keys_zone limit, it simply
> starts to return 500 Internal Server Error. So I don't think
> it's making the cache less efficient (other than that you won't
> be able to cache as much); you end up breaking nginx, because
> once the keys_zone limit has been reached, nginx returns 500
> Internal Server Error for items that are not already in
> proxy_cache. If it did LRU/LFU eviction on the keys, then yes,
> you would probably end up with a less efficient cache.

While 500 is possible in some cases, especially in configurations 
with many worker processes and high request concurrency, even in 
the worst case it's expected to happen at most for half of the 
requests, usually much less than that.  Further, cache manager 
monitors the number of cache items in the keys_zone, cleaning 
things in advance, making 500 almost impossible in practice.

If you nevertheless observe 500 being returned in practice, this 
might be the actual thing to focus on.
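
To illustrate why a full keys_zone normally degrades into evictions
rather than errors, here is a toy Python model.  It is a sketch
under stated assumptions, not nginx's code: the zone holds a fixed
number of keys, an allocation that finds the zone full first drops
a small batch of least recently used keys, and a periodic manager
pass trims the zone to a lower watermark in advance.  The class
name, the batch size of 20 and the 7/8 watermark are illustrative
choices only.  It does not model the concurrency effects (many
workers, high request rates) under which 500 remains possible.

```python
from collections import OrderedDict


class ToyKeysZone:
    """Toy model of a fixed-size cache keys zone (not nginx's real code)."""

    def __init__(self, capacity, forced_batch=20, watermark_ratio=7 / 8):
        self.capacity = capacity          # max number of keys the zone fits
        self.forced_batch = forced_batch  # keys dropped when the zone is full
        self.watermark = int(capacity * watermark_ratio)
        self.nodes = OrderedDict()        # key -> hit count, LRU first

    def _evict_lru(self, n):
        # Drop up to n least recently used keys from the front.
        for _ in range(min(n, len(self.nodes))):
            self.nodes.popitem(last=False)

    def lookup(self, key):
        """Return True on a hit; on a miss, allocate a node and return False."""
        if key in self.nodes:
            self.nodes.move_to_end(key)  # mark as most recently used
            self.nodes[key] += 1
            return True
        if len(self.nodes) >= self.capacity:
            # Zone full: expire old keys instead of failing the request.
            self._evict_lru(self.forced_batch)
        self.nodes[key] = 1
        return False

    def manager_tick(self):
        # Hypothetical periodic manager pass: trim to the watermark early.
        if len(self.nodes) > self.watermark:
            self._evict_lru(len(self.nodes) - self.watermark)


# Flood the zone with unique query strings after one "real" request.
zone = ToyKeysZone(capacity=1000)
zone.lookup("/popular")
for i in range(100_000):
    zone.lookup(f"/static.css?r={i}")
    if i % 100 == 0:
        zone.manager_tick()

print(len(zone.nodes), "/popular" in zone.nodes)
```

In this run the flood of unique query strings pushes "/popular" out
of the zone, so later requests for it miss: the cache becomes less
efficient, but no lookup fails.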

[...]

> Unless nginx very recently started purging old cache entries
> when the keys_zone limit is reached, no: it would still break
> nginx for non-cached requests (returning 500 Internal Server
> Error). If nginx has started to purge old entries when the
> limit is reached, then sure, the attacker would still be able
> to wipe out the cache.

Clearing old cache items when it is not possible to allocate a 
cache node dates back to the initial cache support in nginx 
0.7.44[1].  And cache manager monitoring of the keys_zone and 
clearing it in advance dates back to nginx 1.9.13, released about 
five years ago[2].  Not sure either of these counts as "very 
recently".

> But let's say we have "inactive" set to 24+ hours (which is
> often used for static files). With an attack where someone
> appends random query strings, those keys would only be removed
> after 24 hours (or later, depending on the setting). With a
> separate flag, one could set this timeout to something like 60
> seconds (so a key is deleted from memory if it hasn't reached
> its min_uses within 60 seconds); this way, you're still
> rotating those keys out *a lot* faster.

While this may be preferable for some use cases (and sounds close 
to the "Segmented LRU" cache policy[3]), it certainly doesn't 
protect against the attack you've initially described.  As 
previously suggested, an attacker can easily request the same 
resource several times, moving it to the "normal" category, so it 
will stay in the cache for the 24+ hours you've configured.  
Instead, this distinction might make things worse, making it 
harder for actually requested resources to get into the cache.
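
The proposed two-timeout scheme and the objection to it can be
checked with a small Python sketch.  Everything here is
hypothetical (TwoTimeoutZone, "probation" and the numbers used are
not nginx settings): keys below min_uses expire after a short
probation timeout, while keys that reached it use the long
"inactive" timeout.

```python
from dataclasses import dataclass


@dataclass
class Node:
    uses: int
    last_seen: float


class TwoTimeoutZone:
    """Hypothetical policy: short timeout below min_uses, long one above."""

    def __init__(self, min_uses, probation, inactive):
        self.min_uses = min_uses
        self.probation = probation  # seconds for keys below min_uses
        self.inactive = inactive    # seconds for keys that reached min_uses
        self.nodes = {}

    def request(self, key, now):
        node = self.nodes.get(key)
        if node is None:
            node = self.nodes[key] = Node(uses=0, last_seen=now)
        node.uses += 1
        node.last_seen = now

    def expire(self, now):
        # Drop keys idle for longer than the timeout of their class.
        for key, node in list(self.nodes.items()):
            ttl = self.probation if node.uses < self.min_uses else self.inactive
            if now - node.last_seen > ttl:
                del self.nodes[key]


MIN_USES = 3
zone = TwoTimeoutZone(min_uses=MIN_USES, probation=60, inactive=24 * 3600)

# Naive flood: one request per random URL; gone after the probation period.
for i in range(1000):
    zone.request(f"/a.css?naive={i}", now=0)
zone.expire(now=120)
naive_left = len(zone.nodes)

# Flood as described in the reply: each random URL requested MIN_USES times.
for i in range(1000):
    for _ in range(MIN_USES):
        zone.request(f"/a.css?repeat={i}", now=0)
zone.expire(now=120)
repeat_left = len(zone.nodes)

# A real file requested once every 120 s never reaches MIN_USES.
for t in range(120, 720, 120):
    zone.expire(now=t)
    zone.request("/real.js", now=t)
legit_uses = zone.nodes["/real.js"].uses

print(naive_left, repeat_left, legit_uses)
```

A naive one-request-per-URL flood is cleared within a couple of
minutes, but repeating each URL min_uses times promotes the whole
flood to the 24-hour class, while a genuinely requested file hit
every two minutes never accumulates enough uses to be cached.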

> > In particular, this can be done with limit_req
> 
> If we limited this to 20 req/s, this would allow a single IP to
> use up 1.73 million keys in the keys_zone if "inactive" is 24
> hours; do this with 10 IPs, and we're at 17.3 million.

The basic idea of the burst-based limiting which the limit_req 
module implements is that you don't need to set high rates for IP 
addresses.  Rather, you configure the rate you expect to see on 
average per hour (or even per day), and allow large enough 
bursts.  So instead of limiting to 20 r/s you can limit to 1 r/m 
with burst set to, say, 1000.
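
The numbers above can be compared with a simplified leaky-bucket
model of per-client limiting.  This is an assumption-laden sketch,
not limit_req's actual code (nginx uses integer arithmetic and,
without nodelay, delays excess requests instead of rejecting
them): each accepted request adds one unit to the bucket, the
bucket drains at the configured rate, and a request that would
push it above the burst is rejected.  It also assumes every
accepted request creates a new cache key.

```python
SECONDS_PER_DAY = 24 * 3600
ATTACK_RPS = 20  # one attacking IP, as in the example above


def accepted_requests(rate_per_sec, burst, request_times):
    """Count requests passing a simplified leaky bucket (nodelay-style)."""
    excess = None  # bucket level; None until the client's first request
    last = 0.0     # time of the last accepted request
    accepted = 0
    for t in request_times:
        if excess is None:
            excess, last = 0.0, t  # first request starts with an empty bucket
        else:
            new_excess = max(0.0, excess - rate_per_sec * (t - last)) + 1
            if new_excess > burst:
                continue  # rejected; bucket state is left unchanged
            excess, last = new_excess, t
        accepted += 1
    return accepted


# Effective limit of 20 r/s: every request may create a new cache key.
keys_at_20rps = ATTACK_RPS * SECONDS_PER_DAY

# 1 r/m with burst=1000, same attacker sending 20 r/s all day.
attack_times = (i / ATTACK_RPS for i in range(ATTACK_RPS * SECONDS_PER_DAY))
keys_at_1rpm = accepted_requests(1 / 60, 1000, attack_times)

print(keys_at_20rps, keys_at_1rpm)
```

With 1 r/m and a burst of 1000, an attacker sending 20 requests
per second all day gets roughly 2,400 requests through (about
1,000 from the initial burst plus one per minute), compared with
1,728,000 at an effective 20 r/s.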

[...]

[1] http://hg.nginx.org/nginx/rev/3a8a53c0c42f#l19.478
[2] http://hg.nginx.org/nginx/rev/c9d680b00744
[3] https://en.wikipedia.org/wiki/Cache_replacement_policies#Segmented_LRU_(SLRU)

-- 
Maxim Dounin
http://mdounin.ru/