[PATCH] Dynamic rate limiting for limit_req module

Mon Jan 2 23:47:15 UTC 2023

Hello!

On Mon, Jan 02, 2023 at 06:21:03AM +0000, J Carter wrote:

> > "The fundamental problem with dynamic rates in limit_req, which is 
> > a leaky bucket limiter, is that the rate is a property which 
> > applies to multiple requests as accumulated in the bucket (and 
> > this is basically why it is configured in the limit_req_zone, and 
> > not in limit_req), while the dynamically calculated rate is a 
> > property of the last request. 
> > 
> > This can easily result in counter-intuitive behaviour.
> > For example, consider 5 requests with 1 second interval, assuming 
> > burst 2, and rate being 1r/m or 100r/s depending on some request 
> > properties:
> > 
> > - 1st request, rate 1r/m, request is accepted
> > - 2st request, rate 1r/m, request is accepted
> > - 3rd request, rate 1r/m, request is rejected, since there is no 
> >   room in the bucket
> > - 4th request, rate 100r/s, request is accepted
> > - 5th request, rate 1r/m, request is accepted (unexpectedly)
> > 
> > Note that the 5th request is accepted, while it is mostly 
> > equivalent to the 3rd request, and one could expect it to be 
> > rejected.  But it happens to to be accepted because the 4th 
> > request "cleared" the bucket."
> 
> This is true - however I do wonder if the behavior (clearing the bucket) 
> is truly counter intuitive in that scenario.

Well, certainly it is.  And it is even more so, if you'll assume 
that 4th and 5th requests are processed essentially at the same 
time: it will be a race condition with different outcomes 
depending on the actual order of the requests during limit_req 
zone updates.  For additional fun, this order is not the same as 
order of the requests in logs.

> Should it not be expected that this user/key/state that has been 
> assigned 100r/s (even if just for one request) would have it's 
> outstanding excess requests recalculated (and in this case absolved) by 
> that rate increase?
> You have after all assigned it more capacity.
> 
> I'm not sure how to elegantly avoid this if it is an issue, since there 
> is no 'standard' rate to reference (it could interpolate over 
> time/requests, but that might be even more confusing).

I don't think there is a good solution, and that's one of the 
reasons why there is no such functionality in limit_req.  Instead, 
it provides a fixed rate, and an ability to provide different 
burst limits for different requests (which is checked on a 
per-request basis), as well as multiple different limits to cover 
cases when multiple rates are needed.

> > "Note well that the same limits now can be configured separately, 
> > in distinct shared memory zones.  This makes it possible to limit 
> > them requests consistently, without any unexpected deviations from 
> > the configured behaviour if requests with different rates 
> > interfere.
> > 
> > Just in case, the very same behaviour can be implemented with 
> > multiple limit_req_zones, with something like this:
> > 
> >     map $traffic_tier $limit_free {
> >         free          $binary_remote_addr;
> >     }
> > 
> >     map $traffic_tier $limit_basic {
> >         basic         $binary_remote_addr;
> >     }
> > 
> >     map $traffic_tier $limit_premium {
> >         premium       $binary_remote_addr;
> >     }
> > 
> >     limit_req_zone $limit_free zone=free:10m rate=1r/m;
> >     limit_req_zone $limit_basic zone=basic:10m rate=2r/m;
> >     limit_req_zone $limit_premium zone=premium:10m rate=1r/s;
> > 
> >     limit_req zone=free;
> >     limit_req zone=basic;
> >     limit_req zone=premium;
> > 
> > From the example it is not very clear how the suggested change is 
> > beneficial."
> 
> The issue with this approach (which is a similar set up to the one that 
> prompted this patch) is the scalability of the solution -  one zone is 
> required per unique rate, and the inability to assign arbitrary rates at 
> runtime.

Yet the solution is exactly equivalent to the example provided.

Further, given that the use case clearly do no match the intended 
usage of limit_req, which is designed to as a simple 
DoS-mitigation and bruteforce-mitigation solution, it looks pretty 
match trivial to implement.

> Perhaps a better example than the first one I gave to illustrate the 
> full utility would be with this scenario - involving a user sending a 
> JWT to determine the rate limit that is applied:
> 
> The limit_req_zone is assigned with $jwt_claim_sub as the key and 
> $jwt_claim_rate as the rate.
> 
> A user presents the following JWT with every request, his requests now 
> have a rate limit of 100r/s.
> {
>      ...
>      "sub": "user01",
>      "rate": "100r/s"
> }
> Mid-session, that user then acquires a new JWT to get increased speed -  
> the limit associated with this state/node is now 1000r/s with the same 
> key(sub). Note that these rate values do not need to be preset in the 
> nginx configuration, and it can all be done with a single limit_req_zone.
> {
>      ...
>      "sub": "user01",
>      "rate": "1000r/s"
> }
> Similar could also be done with key_val zone (ie. adding and modifying 
> rate strings at runtime within a key_val zone, and doing a lookup in the 
> key_val zone for that value based upon a user identifier or group as the 
> key).

This looks exactly equivalent to the configuration previously 
discussed.

Note well that actually having proper nginx configuration for all 
the allowed rate values ensure that no invalid and/or arbitrary 
rates can be supplied by users, providing an additional 
consistency check.

Also note that this is not how limit_req is expected to be used.  
As already mentioned above, limit_req is a DoS and bruteforce 
mitigation solution rather than a solution for billing 
enforcement.

[...]

> > The "lr->rate", which is saved in the shared memory, seems to be 
> > never used except during processing of the same request.  Any 
> > reasons to keep it in the shared memory, reducing the number of 
> > states which can be stored there?
> 
> A persistent record of the rate that was last assigned to a given 
> node/state is needed exclusively for expiring records (in the expire 
> function). I don't believe this can be avoided, as the rate of the 
> current request is unrelated to the 'to be removed' record's.

Ah, sorry, missed this.  So indeed, it needs to be explicitly 
stored in the shared zone then (if at all).

[...]

-- 
Maxim Dounin
http://mdounin.ru/