location regular expression not filtering some characters

CM Fields cmfileds at gmail.com
Tue Jun 19 18:05:10 UTC 2012

For those interested, and to close this loop: the question mark (?) starts the
query string. I am not able to filter it with the location regex, but you can
use a rewrite rule to clear it in nginx.
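
A minimal sketch of such a rewrite, assuming the location proxies to a
back end. The proxy_pass address is a placeholder and not part of my real
config. It relies on the documented behavior that a trailing "?" on the
rewrite replacement stops nginx from appending the original arguments.

```nginx
location ~* ^/data/[\w.]+$ {
    # The trailing "?" on the replacement tells nginx not to append
    # the original query string to the rewritten URI.
    rewrite ^(.*)$ $1? break;

    # Hypothetical back end; substitute your own upstream.
    proxy_pass http://127.0.0.1:8080;
}
```

With this in place the back end should only see the path part of the
request, with whatever followed the question mark dropped.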

Another oddity is that you can put a bunch of illegal characters after the
question mark and nginx will happily pass them to your back end server even
though the regex is in place. So, if you do not expect a "$" or "%" or any
other special character in your back end, you may be surprised.

I am still using this location regex
   location ~* ^/data/[\w.]+$  {...}

If we take the valid URL from my original message,
  http://example.com/data/1234.txt
we can add a question mark and anything we want after it, and it will be
passed to your back end or script.
  http://example.com/data/1234.txt?some_text../../../%del table%
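
One way to see what nginx is working with is a throwaway test location
that echoes its own variables back. This is only a debugging sketch, not
something to leave in production. $uri and $args are standard nginx
variables; the response text format is just my choice.

```nginx
location ~* ^/data/[\w.]+$ {
    # Echo the normalized path and the raw query string separately.
    default_type text/plain;
    return 200 "uri=$uri args=$args\n";
}
```

Requesting /data/1234.txt?foo=bar against this should show the path in
$uri and everything after the question mark in $args.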

I am interested in whether this is the expected result. My concern is that
the regex I specified is being silently ignored. Should nginx respect the
user's regex and deny access to the URL with the question mark in it?

In most cases I imagine the question mark and following text would be fine,
as the link might contain helpful information. As far as I can tell, most
resources online say this is the expected behavior and it is up to the
script to validate the data.

I agree the script should check all input, but then why even bother with a
location regex to validate the url before it gets passed to a back end server?
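
If you would rather refuse such requests than strip the query string, a
sketch along these lines might work. It assumes a 403 is an acceptable
answer, and again the proxy_pass address is a placeholder. $request_uri
is the original request URI including any arguments, so the test also
catches a bare trailing "?".

```nginx
location ~* ^/data/[\w.]+$ {
    # Reject any request whose original URI contains a question mark.
    if ($request_uri ~ "\?") {
        return 403;
    }

    # Hypothetical back end; substitute your own upstream.
    proxy_pass http://127.0.0.1:8080;
}
```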

On Tue, Jun 19, 2012 at 1:07 PM, CM Fields <cmfileds at gmail.com> wrote:
> I believe this was a mistake on my side. While testing I noticed that
> the (#) and (?)
> were allowed through but the URL result was not what I was expecting.
> When the pound (#) is used nginx converts the URI from
>   http://example.com/data/12#34.txt
>          and cuts off the pound sign and anything after it to this....
>   http://example.com/data/12
> before my regular expression is ever used. The pound (#) is a location specific
> tag so this is expected and fine.
> The question mark (?) is still passed to my regular expression and
> allowed through.
>   http://example.com/data/12?34.txt
>       gets passed through the regular expression unchanged
>   http://example.com/data/12?34.txt
> Not sure why the question mark is special yet.
> On Tue, Jun 19, 2012 at 11:46 AM, CM Fields <cmfileds at gmail.com> wrote:
>> I am looking to filter all characters other than those specified in the
>> "location" regular expression. For example, [\w.]+$ should only allow one or
>> more letters, numbers, underscore and period just like [a-zA-Z0-9_.]
>> location ~* ^/data/[\w.]+$  {...}
>> When I test the url with wget I find the pound (#) and question mark (?) are
>> allowed through. For example...
>> This URL is valid and is allowed through
>>   wget "http://example.com/data/1234.txt"
>> This URL with the additional "#" should not be allowed through, but it is.
>>   wget "http://example.com/data/12#34.txt"
>> Adding a question mark also gets through when it is supposed to be blocked like
>> the pound "#" above.
>>   wget "http://example.com/data/12?34.txt"
>> Are pound (#) and question mark (?) matches being overridden in Nginx and thus
>> getting past my regular expression?
>> Does anyone know of a way to block the "#" or "?" that I am missing?
>> Just for clarity, I have no need for the "#" or "?" in my script and I can do
>> checks in the script to exclude these characters if necessary.
