nginx modules and multiple escape_uri / unescape_uri definitions

Markus Linnala maage at
Mon Nov 7 19:28:07 UTC 2011

I ran into some problems with URI encoding. The problem is that there 
are multiple, different implementations of escaping and unescaping 
URIs, and that different programming language libraries encode in 
different ways.


These seem to be the same as each other. They differ from the core 
one by unescaping '+' to ' '. I guess nginx conforms to RFC 3986, and 
the external modules try to be compatible with other programs such as 
PHP, .NET and Java.
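A minimal sketch of the '+' difference, written as standalone C rather 
than against nginx's ngx_unescape_uri (the name unescape_uri_sketch and 
its plus_as_space flag are hypothetical). Percent-escapes are decoded 
the same way in both modes; only the handling of a literal '+' changes.

```c
#include <ctype.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower((unsigned char) c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Hypothetical helper, not an nginx API.
 * Decodes src into dst (dst must hold at least strlen(src) + 1 bytes).
 * "%XY" becomes the byte 0xXY; a '%' not followed by two hex digits is
 * copied unchanged. With plus_as_space != 0, '+' is decoded to ' '
 * (application/x-www-form-urlencoded); otherwise '+' stays literal,
 * as in plain RFC 3986 percent-decoding.
 */
void unescape_uri_sketch(char *dst, const char *src, int plus_as_space)
{
    while (*src) {
        if (*src == '%') {
            int hi = hex_value(src[1]);
            int lo = hi >= 0 ? hex_value(src[2]) : -1;  /* never read past NUL */
            if (hi >= 0 && lo >= 0) {
                *dst++ = (char) (hi * 16 + lo);
                src += 3;
                continue;
            }
        } else if (*src == '+' && plus_as_space) {
            *dst++ = ' ';
            src++;
            continue;
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

With plus_as_space off, "a+b%20c" decodes to "a+b c"; with it on, to 
"a b c". A '+' that was sent as %2B decodes to '+' in both modes, which 
is why a form encoder has to escape '+' itself.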

PHP encodes ' ' as '+' with urlencode.

.NET Framework 4 encodes ' ' as '+' with HttpUtility.UrlEncode.

Java 5-7, at least, encodes ' ' as '+'.
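The encoding side of the same convention, again as a hypothetical 
standalone sketch (escape_uri_sketch is not an nginx function). It 
leaves only the RFC 3986 unreserved characters unescaped; the libraries 
above each have their own set of untouched characters, so this only 
shows the space handling, not any one library's exact output.

```c
#include <string.h>

/* RFC 3986 "unreserved" characters, never percent-encoded here. */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
        || (c >= '0' && c <= '9')
        || c == '-' || c == '.' || c == '_' || c == '~';
}

/*
 * Hypothetical helper, not an nginx API.
 * Encodes src into dst (dst must hold at least 3 * strlen(src) + 1 bytes).
 * With space_as_plus == 0, ' ' becomes "%20" (RFC 3986 style); with
 * space_as_plus != 0, ' ' becomes '+' (form-encoding style). A literal
 * '+' is always escaped so the two cannot be confused on decode.
 */
void escape_uri_sketch(char *dst, const char *src, int space_as_plus)
{
    static const char hex[] = "0123456789ABCDEF";

    for (; *src; src++) {
        unsigned char c = (unsigned char) *src;
        if (is_unreserved(c)) {
            *dst++ = (char) c;
        } else if (c == ' ' && space_as_plus) {
            *dst++ = '+';
        } else {
            *dst++ = '%';
            *dst++ = hex[c >> 4];
            *dst++ = hex[c & 0x0f];
        }
    }
    *dst = '\0';
}
```

For "a b+c", mode 0 gives "a%20b%2Bc" and mode 1 gives "a+b%2Bc".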

There is a way to consolidate unescape_uri: add a new type to the 
core, add version checks to the modules so that they use the core 
version with the proper type, and extend the modules to handle the 
different types. A patch for nginx is attached: 
0001-application-x-www-form-urlencoded-compatible-mode.patch
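The version check in a module could look roughly like this. 
nginx_version is the integer macro from nginx's nginx.h (1001008 for 
1.1.8); the threshold and the MODULE_* names are hypothetical 
placeholders, since no particular release carrying the new type is 
assumed. Built outside nginx, nginx_version is undefined and the 
bundled copy is selected.

```c
/*
 * Hypothetical threshold: the first nginx_version that would ship the
 * new unescape type. Placeholder value, not a real release.
 */
#define MODULE_CORE_FORM_UNESCAPE_MIN 9999999

#if defined(nginx_version) && nginx_version >= MODULE_CORE_FORM_UNESCAPE_MIN
#define MODULE_USE_CORE_UNESCAPE 1   /* call the core function with the new type */
#else
#define MODULE_USE_CORE_UNESCAPE 0   /* keep using the module's bundled copy */
#endif

/* Reports which implementation this build of the module selected. */
int module_uses_core_unescape(void)
{
    return MODULE_USE_CORE_UNESCAPE;
}
```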


I guess there was a need for different implementations, but it might 
be possible to consolidate the external modules after this:

These seem to be the same. They differ somewhat from core. The core 
version of uri_component is almost the same as uri in the modules 
(!$*(),@`). The args versions also differ slightly (;&).
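To make the character-set differences concrete, here is a sketch of a 
table-driven escaper in the style of the 256-bit maps that nginx's 
ngx_escape_uri uses (one bit per byte; a set bit means "escape"). The 
two maps are hypothetical: a strict one that escapes everything outside 
the RFC 3986 unreserved set, and a relaxed one that also passes 
!$*(),@` through. They do not reproduce the actual core or module 
tables; they only show that differences like these come down to a few 
bits per table.

```c
#include <stdint.h>
#include <string.h>

/* 256-bit map: bit c set means byte c must be percent-encoded. */
typedef struct { uint32_t bits[8]; } escape_map;

static void map_set(escape_map *m, unsigned char c, int escape)
{
    if (escape) m->bits[c >> 5] |= 1U << (c & 0x1f);
    else        m->bits[c >> 5] &= ~(1U << (c & 0x1f));
}

static int map_test(const escape_map *m, unsigned char c)
{
    return (m->bits[c >> 5] >> (c & 0x1f)) & 1U;
}

/* Escape everything except RFC 3986 unreserved characters. */
void map_init_strict(escape_map *m)
{
    memset(m, 0, sizeof *m);
    for (int c = 0; c < 256; c++) {
        int unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                      || (c >= '0' && c <= '9')
                      || c == '-' || c == '.' || c == '_' || c == '~';
        map_set(m, (unsigned char) c, !unreserved);
    }
}

/* Let every character in chars through unescaped. */
void map_allow(escape_map *m, const char *chars)
{
    for (; *chars; chars++) map_set(m, (unsigned char) *chars, 0);
}

/* dst must hold at least 3 * strlen(src) + 1 bytes. */
void escape_with_map(char *dst, const char *src, const escape_map *m)
{
    static const char hex[] = "0123456789ABCDEF";

    for (; *src; src++) {
        unsigned char c = (unsigned char) *src;
        if (map_test(m, c)) {
            *dst++ = '%';
            *dst++ = hex[c >> 4];
            *dst++ = hex[c & 0x0f];
        } else {
            *dst++ = (char) c;
        }
    }
    *dst = '\0';
}
```

With the strict map "f(x)@y" becomes "f%28x%29%40y"; after 
map_allow(&relaxed, "!$*(),@`") the relaxed map leaves it unchanged. 
The args difference (;&) would be one more map_allow or map_set call 
per character.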

Could the set-misc and lua modules use the nginx core versions of 
uri_component and args?

This is almost the same as the nginx core version of uri_component. 
There are a couple of differences ( *~), and the hex digits are 
uppercase. The commit message hints that the new encoding was needed 
for Java.

I guess this is for a special need and does not need to be considered further.

Markus Linnala, Chief Systems Architect
Cybercom Finland
Pakkahuoneenaukio 2 A; 33100 Tampere
Mobile +358 40 5919 735
Markus.Linnala at


More information about the nginx-devel mailing list