nginx modules and multiple escape_uri / unescape_uri definitions

Markus Linnala maage at hard.ware.fi
Mon Nov 7 19:28:07 UTC 2011


I ran into some problems with URI encoding. The problem is that 
there are multiple, different implementations of URI escaping and 
unescaping, and that libraries in different programming languages 
encode in different ways.



unescape_uri:

https://github.com/phusion/nginx/blob/master/src/core/ngx_string.c#L1336

The two module implementations linked below seem to be the same as 
each other. They differ from the core one (linked above) by 
unescaping '+' to ' '. I guess nginx core conforms to RFC 3986, 
while the external modules try to be compatible with other software 
such as PHP, .NET and Java.

https://github.com/agentzh/set-misc-nginx-module/blob/master/src/ngx_http_set_unescape_uri.c#L46

https://github.com/chaoslawful/lua-nginx-module/blob/master/src/ngx_http_lua_util.c#L1328

PHP encodes ' ' to '+' with urlencode
http://php.net/manual/en/function.urlencode.php

.NET Framework 4 encodes ' ' to '+' with HttpUtility.UrlEncode
http://msdn.microsoft.com/en-us/library/4fkewx0t.aspx

Java 5-7, at least, encodes ' ' to '+' with URLEncoder
http://download.oracle.com/javase/7/docs/api/java/net/URLEncoder.html
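
To illustrate the mismatch outside nginx, here is a minimal sketch 
using Python's standard urllib.parse (Python is used only for 
illustration; it is not part of nginx or of the modules above). 
quote() follows the RFC 3986 convention and quote_plus() the 
form-style convention that the libraries above use. The sample string 
is arbitrary.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

raw = "a b+c"

# RFC 3986 style: ' ' becomes %20 and a literal '+' becomes %2B.
rfc = quote(raw, safe="")
# Form style: ' ' becomes '+' and a literal '+' still becomes %2B.
form = quote_plus(raw)

# Decoding with the matching convention round-trips.
assert unquote(rfc) == raw
assert unquote_plus(form) == raw

# Decoding with the other convention silently changes the data.
mismatch_1 = unquote(form)        # '+' is kept, so the space is lost
mismatch_2 = unquote_plus("a+b")  # '+' is turned into a space
```

The last two lines are the failure mode: neither decoder reports an 
error, the value is simply different from what the sender meant.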

There is a way to consolidate the unescape_uri implementations: add 
a new type to the core function, then add version checks to the 
modules so that they use the core version with the proper type, and 
extend the modules to handle the different types. A patch for nginx 
is attached: 0001-application-x-www-form-urlencoded-compatible-mode.patch
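
The idea of one unescape function selected by a type can be sketched 
as follows. This is not the attached patch and not nginx code; it is 
an illustration in Python, and the constant names UNESCAPE_URI and 
UNESCAPE_FORM are hypothetical.

```python
import re

# Hypothetical type constants; the names are illustrative only.
UNESCAPE_URI = 0    # plain percent-decoding, '+' stays '+'
UNESCAPE_FORM = 1   # form-compatible mode, '+' becomes ' '

_PCT = re.compile(rb"%([0-9A-Fa-f]{2})")

def unescape_uri(data: bytes, kind: int = UNESCAPE_URI) -> bytes:
    """Percent-decode data; in form mode also map '+' to ' '."""
    if kind == UNESCAPE_FORM:
        # Replace '+' before percent-decoding, so that an encoded
        # %2B still decodes to a literal '+'.
        data = data.replace(b"+", b" ")
    # Malformed sequences (for example a trailing '%') are left as-is.
    return _PCT.sub(lambda m: bytes([int(m.group(1), 16)]), data)
```

With a single implementation, callers differ only in the type they 
pass: unescape_uri(b"a+b%2Bc") keeps both plus signs, while 
unescape_uri(b"a+b%2Bc", UNESCAPE_FORM) turns the first one into a 
space.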




escape_uri:

I guess there was a need for different implementations, but it 
might be possible to consolidate the external modules after this 
changeset:

http://trac.nginx.org/nginx/changeset/4193/nginx

https://github.com/phusion/nginx/blob/master/src/core/ngx_string.c#L1505

The two module implementations linked below seem to be the same as 
each other. They differ somewhat from core. The core version of 
uri_component is almost the same as uri in the modules (the 
difference is in !$*(),@`). The args types also differ slightly (;&).

https://github.com/agentzh/set-misc-nginx-module/blob/master/src/ngx_http_set_escape_uri.c#L57

https://github.com/chaoslawful/lua-nginx-module/blob/master/src/ngx_http_lua_util.c#L1179

Would it be possible for the set-misc and lua modules to use the 
nginx core versions of uri_component and args?
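
One way to find differences like these is to run two escape functions 
over every printable ASCII character and list the ones they disagree 
on. The sketch below does that in Python with urllib.parse.quote. The 
two safe sets are hypothetical and were picked only to make the 
comparison visible; they are not the actual nginx or module tables.

```python
from urllib.parse import quote

# Hypothetical safe sets, not the real nginx or module tables.
SAFE_A = "-_.~"
SAFE_B = "-_.~!*()"

def escape_a(s: str) -> str:
    return quote(s, safe=SAFE_A)

def escape_b(s: str) -> str:
    return quote(s, safe=SAFE_B)

def differing_chars(f, g) -> str:
    """Return the printable ASCII characters that f and g escape differently."""
    return "".join(
        chr(c) for c in range(0x20, 0x7F) if f(chr(c)) != g(chr(c))
    )

diff = differing_chars(escape_a, escape_b)
```

Here diff contains exactly the characters that are in one safe set but 
not in the other. The same loop could be pointed at the output of the 
real implementations to confirm which characters would change if a 
module switched to the core function.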


The memc module implementation linked below is almost the same as 
the nginx core version of uri_component. There are a couple of 
differences ( *~), and its hex is uppercase. The commit message 
hints that the new encoding was needed for Java.

https://github.com/yaoweibin/memc-nginx-module/blob/master/src/ngx_http_memc_request.c#L8

I guess this serves a special need and does not need to be 
considered further.
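
On the hex case: RFC 3986 treats uppercase and lowercase hex digits in 
a percent-encoding as equivalent, so the case matters only to a 
consumer that compares escaped strings byte for byte. A minimal Python 
sketch, with a hypothetical helper upper_hex and an arbitrary sample 
string:

```python
import re
from urllib.parse import unquote

def upper_hex(escaped: str) -> str:
    """Uppercase the hex digits of every %XX sequence, nothing else."""
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), escaped)

lower = "a%2fb%3ac"
upper = upper_hex(lower)

# Both spellings decode to the same string ...
assert unquote(lower) == unquote(upper)
# ... but they are not equal as escaped strings.
assert lower != upper
```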

-- 
Markus Linnala, Chief Systems Architect
Cybercom Finland
Pakkahuoneenaukio 2 A; 33100 Tampere
Mobile +358 40 5919 735
Markus.Linnala at cybercom.com

www.cybercom.fi | www.cybercom.com

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0001-application-x-www-form-urlencoded-compatible-mode.patch
URL: <http://mailman.nginx.org/pipermail/nginx-devel/attachments/20111107/a411f95b/attachment.ksh>

