location regular expression not filtering some characters

Maxim Dounin mdounin at mdounin.ru
Tue Jun 19 18:01:16 UTC 2012


Hello!

On Tue, Jun 19, 2012 at 11:46:32AM -0400, CM Fields wrote:

> I am looking to filter all characters other than those specified in the
> "location" regular expression. For example, [\w.]+$ should only allow one or
> more letters, numbers, underscores, and periods, just like [a-zA-Z0-9_.]
> 
> location ~* ^/data/[\w.]+$  {...}
> 
> When I test the URL with wget, I find the pound (#) and question mark (?) are
> allowed through. For example...
> 
> This URL is valid and is allowed through
>    wget "http://example.com/data/1234.txt"
> 
> This URL with the additional "#" should not be allowed through, but it is.
>    wget "http://example.com/data/12#34.txt"
> 
> Adding a question mark also gets through when it is supposed to be blocked like
> the pound "#" above.
>    wget "http://example.com/data/12?34.txt"
> 
> 
> Are pound (#) and question mark (?) matches being overridden in Nginx and thus
> getting past my regular expression?

They aren't part of the data matched by locations.  The "#" character 
denotes the fragment identifier (and is normally not sent to an HTTP server 
at all), and the "?" character denotes the start of the query string.
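To illustrate the split, here is a small Python sketch (a model, not nginx code) that uses the standard library's `urllib.parse.urlsplit` to separate each URL from the question into path, query, and fragment, and then applies the question's regex to the path alone. It assumes that nginx's PCRE `\w` behaves like Python's `\w` with `re.ASCII`; `location_matches` is a hypothetical helper. All three URLs match, because the string being tested is either `/data/1234.txt` or `/data/12`.

```python
import re
from urllib.parse import urlsplit

# The location regex from the question. re.ASCII restricts \w to
# [A-Za-z0-9_] (assumed to match PCRE's default \w); re.IGNORECASE mirrors "~*".
LOCATION_RE = re.compile(r"^/data/[\w.]+$", re.ASCII | re.IGNORECASE)


def location_matches(url: str) -> bool:
    """Hypothetical helper: test the regex against the path component only."""
    parts = urlsplit(url)
    # Everything after "?" lands in parts.query and everything after "#"
    # in parts.fragment; neither is part of the string being matched.
    return LOCATION_RE.search(parts.path) is not None


if __name__ == "__main__":
    for url in (
        "http://example.com/data/1234.txt",
        "http://example.com/data/12#34.txt",
        "http://example.com/data/12?34.txt",
    ):
        parts = urlsplit(url)
        print(url, "->", "path:", parts.path, "query:", parts.query,
              "fragment:", parts.fragment, "match:", location_matches(url))
```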

The "#" and "?" characters will only be seen by location matching 
if they are sent escaped (as %23 and %3F), i.e. as part of the URI 
path component, not as a syntax construct.
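A rough Python model of that rule, working on the raw request target a client would send (for example `/data/12%2334.txt`). The query is split off at the first literal "?" before percent-decoding, so an escaped `%3F` stays inside the path and the regex then rejects it. The helper names are hypothetical, and the decoding step is a simplification of nginx's own URI normalization.

```python
import re
from urllib.parse import unquote

# Same regex as in the question; re.ASCII approximates PCRE's default \w.
LOCATION_RE = re.compile(r"^/data/[\w.]+$", re.ASCII | re.IGNORECASE)


def path_from_request_target(target: str) -> str:
    """Hypothetical helper: split off the query, then decode %XX escapes.

    Simplification: this only percent-decodes; it does not resolve
    "." / ".." segments or merge slashes the way nginx normalizes URIs.
    """
    # Split at the first *literal* "?" before decoding, so "%3F" stays in the path.
    raw_path, _, _query = target.partition("?")
    return unquote(raw_path)


def location_matches(target: str) -> bool:
    """Apply the location regex to the decoded path component."""
    return LOCATION_RE.search(path_from_request_target(target)) is not None


if __name__ == "__main__":
    for target in ("/data/1234.txt", "/data/12%2334.txt", "/data/12%3F34.txt"):
        print(target, "->", path_from_request_target(target),
              "match:", location_matches(target))
```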

> Does anyone know of a way to block the "#" or "?" that I am missing?

It's not clear what you are trying to block.  If you want to 
reject all requests with fragments and query strings, you probably 
want to use the "if" directive instead.
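A sketch of the decision such a rule would make, again in Python rather than nginx configuration. `is_allowed` is a hypothetical helper that takes the request target as sent on the wire. Fragments cannot be checked at all, since the server normally never receives them; only the query string can be rejected. In nginx the query-string part would be an `if` test on a variable such as `$args` inside the location; this sketch only models the intended outcome and is not a translation of any particular config.

```python
import re
from urllib.parse import unquote

# Same regex as in the question; re.ASCII approximates PCRE's default \w.
LOCATION_RE = re.compile(r"^/data/[\w.]+$", re.ASCII | re.IGNORECASE)


def is_allowed(target: str) -> bool:
    """Hypothetical policy: reject any query string, then check the path."""
    raw_path, _, query = target.partition("?")
    if query:
        # Roughly what an "if" on the query-string variable would catch.
        return False
    # Escaped "#" or "?" (%23, %3F) decode into the path and fail the regex.
    return LOCATION_RE.search(unquote(raw_path)) is not None
```

Note that a fragment such as `#34.txt` never reaches this function in practice, because clients strip it before sending the request.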

Maxim Dounin


