Regex match the middle of a URL and also the ending?

Maxim Dounin mdounin at mdounin.ru
Sun Jun 4 00:09:00 UTC 2023


Hello!

On Sun, Jun 04, 2023 at 07:30:40AM +1000, Jore wrote:

> Hi there,
> 
> Thanks for getting back.
> 
> On 4/6/23 3:16 am, Maxim Dounin wrote:
> 
> > Hello!
> 
> […]
> 
> > The "^~" location modifier is for prefix-match locations to prevent 
> > further checking of regular expressions, see 
> > http://nginx.org/r/location for details. If you want to use a regular 
> > expression, you have to use the "~" modifier instead.
> 
> Thank you for that. Apologies, I should’ve mentioned that I did review 
> that documentation on how nginx selects a location. Unfortunately I 
> didn’t find it particularly clear or helpful.
> 
> I especially thought this rule in question would match and take 
> precedence over the latter /browser rule, because of this line on that page:
> 
>     "If the longest matching prefix location has the “^~” modifier then
>     regular expressions are not checked."
> 
> i.e. because this rule in question comes first and it is longer than the 
> latter /browser rule, a match would occur here and not later (because 
> processing stops here)?

The most important part is in the following paragraph:

  A location can either be defined by a prefix string, or by a 
  regular expression. Regular expressions are specified with the 
  preceding “~*” modifier (for case-insensitive matching), or the 
  “~” modifier (for case-sensitive matching). To find location 
  matching a given request, nginx first checks locations defined 
  using the prefix strings (prefix locations). Among them, the 
  location with the longest matching prefix is selected and 
  remembered. Then regular expressions are checked, in the order of 
  their appearance in the configuration file. The search of regular 
  expressions terminates on the first match, and the corresponding 
  configuration is used. If no match with a regular expression is 
  found then the configuration of the prefix location remembered 
  earlier is used.

In other words:

- Regular expressions are the locations with the "~*" and "~" 
  modifiers.  Everything else is a prefix string.

- For prefix strings, the longest matching prefix is used (note that 
  the order of prefix locations is not important).

- If the longest prefix found does not disable regular expression 
  matching (with the "^~" modifier, as per the quote you've 
  provided), regular expressions are checked in order.

If a regular expression matches, nginx will use the 
corresponding location.  If no regular expression matches, nginx 
will use the longest matching prefix location.
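
As a rough illustration of the rules above, here is a minimal sketch.  
The "/images/" paths, the port and the response bodies are made up 
for the example and are not from your configuration; the block is 
assumed to sit inside the "http" context.

```nginx
server {
    listen 8080;  # hypothetical port, for illustration only

    # Prefix location without "^~": remembered if it is the longest
    # match, but regular expressions are still checked afterwards.
    location /images/ {
        return 200 "prefix /images/";
    }

    # Prefix location with "^~": if this is the longest matching
    # prefix, regular expressions are not checked at all.
    location ^~ /images/static/ {
        return 200 "prefix ^~ /images/static/";
    }

    # Regular expression location (case-insensitive).
    location ~* \.(gif|jpg|jpeg)$ {
        return 200 "regex";
    }
}

# /images/logo.gif        -> regex location (longest prefix is
#                            "/images/", regex is checked and matches)
# /images/readme.txt      -> "/images/" (no regex matches)
# /images/static/logo.gif -> "^~ /images/static/" (regexes skipped)
```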

The "location" directive description additionally provides some 
examples explaining how this all works.  Reading the 
https://nginx.org/en/docs/http/request_processing.html article 
might also be helpful.

> And because I couldn’t find much on how nginx handles regex, I ended up 
> checking this question/answer 
> <https://stackoverflow.com/questions/59846238> on Stackoverflow. It 
> cleared things up a little, but still made me wonder why my approach 
> didn’t work.
> 
> Nevertheless, your suggestions to remove the priority prefix |^~| for 
> the second rule fixed the problem, but I still wonder why my approach 
> didn’t work. ;)

In your configuration,

location ^~ "/browser/.*/welcome/welcome.html" { ... }

is a location defined by a prefix string.

It will work for requests with the given prefix, such as 
"/browser/.*/welcome/welcome.html" or 
"/browser/.*/welcome/welcome.html.foobar".  But since it is a 
prefix string, and not a regular expression, the ".*" characters 
do not have any special meaning and are matched literally.  That 
is, this location won't match requests to resources like 
"/browser/foo123/welcome/welcome.html", since these use a 
different prefix.
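
To spell this out, here is the same location with the outcome for 
each of the requests mentioned above.  The "return" line is only a 
placeholder body, assumed for the example.

```nginx
# Prefix location: the string is compared literally, so ".*" here
# means a dot followed by an asterisk, not "any characters".
location ^~ "/browser/.*/welcome/welcome.html" {
    return 200 "literal prefix";  # placeholder body
}

# /browser/.*/welcome/welcome.html        -> matches (same prefix)
# /browser/.*/welcome/welcome.html.foobar -> matches (same prefix)
# /browser/foo123/welcome/welcome.html    -> does not match
```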

To make it match requests to 
"/browser/foo123/welcome/welcome.html", you have to change the 
location to a location defined by a regular expression.  That is, you 
have to change the "^~" modifier to the "~" modifier (and it is also a 
good idea to change the regular expression to a slightly more 
explicit one, see my initial response).  But it is not enough, see 
below.

Similarly,

location ^~ /browser { ... }

is also a location defined by a prefix string.  Further, due to 
the "^~" modifier, it disables matching of regular expressions, so 
any request which starts with "/browser" won't be checked against 
regular expressions.  So you have to remove the "^~" modifier if 
you want nginx to check regular expressions, notably the one in 
the first location (assuming "^~" is changed to "~").
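
Putting both changes together, the result could look roughly like 
the sketch below.  The exact regular expression, the port and the 
"return" bodies are assumptions made for the example, so adjust 
them to your actual setup; the block is assumed to sit inside the 
"http" context.

```nginx
server {
    listen 8080;  # hypothetical port, for illustration only

    # Regular expression location: "~" instead of "^~".  The pattern
    # is an assumed example, anchored at both ends, with the dot in
    # ".html" escaped.
    location ~ ^/browser/.*/welcome/welcome\.html$ {
        return 200 "welcome";  # placeholder body
    }

    # Plain prefix location: without "^~", regular expressions are
    # still checked for requests starting with "/browser".
    location /browser {
        return 200 "browser";  # placeholder body
    }
}

# /browser/foo123/welcome/welcome.html -> regex location
# /browser/foo123/index.html           -> "/browser" prefix location
```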

Hope this helps.

-- 
Maxim Dounin
http://mdounin.ru/

