Regex match the middle of a URL and also the ending?
Maxim Dounin
mdounin at mdounin.ru
Sun Jun 4 00:09:00 UTC 2023
Hello!
On Sun, Jun 04, 2023 at 07:30:40AM +1000, Jore wrote:
> Hi there,
>
> Thanks for getting back.
>
> On 4/6/23 3:16 am, Maxim Dounin wrote:
>
> > Hello!
>
> […]
>
> > The "^~" location modifier is for prefix-match locations to prevent
> > further checking of regular expressions, see
> > http://nginx.org/r/location for details. If you want to use a regular
> > expression, you have to use the "~" modifier instead.
>
> Thank you for that. Apologies, I should’ve mentioned that I did review
> that documentation on how nginx selects a location. Unfortunately I
> didn’t find it particularly clear or helpful.
>
> I especially thought this rule in question would match and take
> precedence over the latter /browser rule, because of this line on that page:
>
> "If the longest matching prefix location has the “^~” modifier then
> regular expressions are not checked."
>
> i.e. because this rule in question comes first and it is longer than the
> latter /browser rule, a match would occur here and not later (because
> processing stops here)?
The most important part is in the following paragraph:
A location can either be defined by a prefix string, or by a
regular expression. Regular expressions are specified with the
preceding “~*” modifier (for case-insensitive matching), or the
“~” modifier (for case-sensitive matching). To find location
matching a given request, nginx first checks locations defined
using the prefix strings (prefix locations). Among them, the
location with the longest matching prefix is selected and
remembered. Then regular expressions are checked, in the order of
their appearance in the configuration file. The search of regular
expressions terminates on the first match, and the corresponding
configuration is used. If no match with a regular expression is
found then the configuration of the prefix location remembered
earlier is used.
In other words:
- Regular expressions are locations with the "~*" and "~" modifiers.
Everything else is a prefix string.
- For prefix strings, the longest matching prefix is used (note that
the order of prefix locations is not important).
- If the longest prefix found does not disable regular expression
matching (with the "^~" modifier, as per the quote you've
provided), regular expressions are checked in order.
As soon as a regular expression matches, nginx will use the
corresponding location. If no regular expression matches, nginx
will use the longest matching prefix location.
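To illustrate, here is a minimal test configuration for the "http" block. It is only a sketch: the port, the paths and the response strings are made up, and the "return" directives are there just to show which location was selected.

```nginx
server {
    listen 8080;

    # Plain prefix locations: their order in the file does not matter,
    # the longest matching prefix is remembered.
    location / {
        return 200 "root-prefix";
    }

    location /app/ {
        return 200 "app-prefix";
    }

    # Prefix location with "^~": if it is the longest matching prefix,
    # regular expressions are not checked at all.
    location ^~ /static/ {
        return 200 "static-prefix";
    }

    # Regular expression location: regexes are checked in the order
    # they appear, and the first match wins over the remembered prefix.
    location ~ \.php$ {
        return 200 "php-regex";
    }
}

# GET /static/a.php -> "static-prefix" (longest prefix has "^~", regex skipped)
# GET /app/a.php    -> "php-regex"     (longest prefix is /app/, regex matched)
# GET /app/a.html   -> "app-prefix"    (no regex matched, remembered prefix used)
# GET /a.html       -> "root-prefix"   (no regex matched, "/" is the only prefix)
```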
The "location" directive description additionally provides some
examples explaining how this all works. Reading the
https://nginx.org/en/docs/http/request_processing.html article
might also be helpful.
> And because I couldn’t find much on how nginx handles regex, I ended up
> checking this question/answer
> <https://stackoverflow.com/questions/59846238> on Stackoverflow. It
> cleared things up a little, but still made me wonder why my approach
> didn’t work.
>
> Nevertheless, your suggestions to remove the priority prefix |^~| for
> the second rule fixed the problem, but I still wonder why my approach
> didn’t work. ;)
In your configuration,
location ^~ "/browser/.*/welcome/welcome.html" { ... }
is a location defined by a prefix string.
It will work for requests with the given prefix, such as
"/browser/.*/welcome/welcome.html" or
"/browser/.*/welcome/welcome.html.foobar". But since it is a
prefix string, and not a regular expression, the ".*" characters
do not have any special meaning and are matched literally. That
is, this location won't match requests to resources like
"/browser/foo123/welcome/welcome.html", since these use a
different prefix.
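This is easy to check with a test configuration like the following (again just a sketch for the "http" block; the port and the response strings are arbitrary):

```nginx
server {
    listen 8080;

    # Prefix location: ".*" here is two literal characters, not a regex.
    location ^~ "/browser/.*/welcome/welcome.html" {
        return 200 "literal-prefix";
    }

    location / {
        return 200 "default";
    }
}

# GET /browser/.*/welcome/welcome.html         -> "literal-prefix"
# GET /browser/.*/welcome/welcome.html.foobar  -> "literal-prefix"
# GET /browser/foo123/welcome/welcome.html     -> "default"
```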
To make it match requests to
"/browser/foo123/welcome/welcome.html", you have to change the
location to a location defined by a regular expression. That is, you
have to change the "^~" modifier to "~" modifier (and it is also a
good idea to change the regular expression to a slightly more
explicit one, see my initial response). But it is not enough, see
below.
Similarly,
location ^~ /browser { ... }
is also a location defined by a prefix string. Further, due to
the "^~" modifier, it disables matching of regular expressions, so
any request which starts with "/browser" won't be checked against
regular expressions. So you have to remove the "^~" modifier if
you want nginx to check regular expressions, notably the one in
the first location (assuming "^~" is changed to "~").
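Putting both changes together, a configuration along these lines should work. This is a sketch for testing: the regular expression below is just one possible, more explicit variant and not necessarily the one from my initial response, and the port and response strings are made up.

```nginx
server {
    listen 8080;

    # Plain prefix location: without "^~", regular expressions are
    # still checked for requests starting with "/browser".
    location /browser {
        return 200 "browser-prefix";
    }

    # Regular expression location ("~", case-sensitive). The pattern is
    # an illustrative assumption: "[^/]+" matches one path component.
    location ~ ^/browser/[^/]+/welcome/welcome\.html$ {
        return 200 "welcome-regex";
    }
}

# GET /browser/foo123/welcome/welcome.html  -> "welcome-regex"
# GET /browser/foo123/other.html            -> "browser-prefix"
```

If more than one path component can appear between "/browser/" and "/welcome/", ".+" can be used instead of "[^/]+".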
Hope this helps.
--
Maxim Dounin
http://mdounin.ru/