nginx modules and multiple escape_uri / unescape_uri definitions
Markus Linnala
maage at hard.ware.fi
Mon Nov 7 19:28:07 UTC 2011
I ran into some problems with URI encoding. The problem is that there
are multiple, differing implementations of URI escaping and
unescaping, and that libraries in different programming languages
encode in different ways.
unescape_uri:
https://github.com/phusion/nginx/blob/master/src/core/ngx_string.c#L1336
The two module versions linked below seem to be identical to each
other. They differ from the core one (linked above) by unescaping '+'
to ' '. I guess nginx conforms to RFC 3986 and the external modules
try to be compatible with other programs like PHP, .NET, Java.
https://github.com/agentzh/set-misc-nginx-module/blob/master/src/ngx_http_set_unescape_uri.c#L46
https://github.com/chaoslawful/lua-nginx-module/blob/master/src/ngx_http_lua_util.c#L1328
PHP encodes ' ' to '+' with urlencode
http://php.net/manual/en/function.urlencode.php
.NET Framework 4 encodes ' ' to '+' with HttpUtility.UrlEncode
http://msdn.microsoft.com/en-us/library/4fkewx0t.aspx
Java 5-7 at least encodes ' ' to '+' with URLEncoder
http://download.oracle.com/javase/7/docs/api/java/net/URLEncoder.html
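
The two decoding behaviours can be seen side by side in a small sketch.
Python is used here only as a neutral illustration (it is not one of
the libraries listed above); its standard library happens to offer
both the RFC 3986 style and the form-urlencoded style:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

encoded = "a+b%20c%2Bd"

# RFC 3986 style: '+' is an ordinary character and is left alone.
strict = unquote(encoded)

# application/x-www-form-urlencoded style: '+' also decodes to a space.
form = unquote_plus(encoded)

# The same split exists on the encoding side.
space_as_percent = quote("a b")
space_as_plus = quote_plus("a b")

print(repr(strict), repr(form))
print(space_as_percent, space_as_plus)
```

A value encoded by a "plus" style encoder and decoded by a strict
decoder keeps its '+' characters, which is the kind of mismatch
described above.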
There is a way to consolidate unescape_uri: add a new type to the core
function, then add version checks to the modules so that they use the
core version with the proper type, and extend the modules to handle
the different types. A patch for nginx is attached:
0001-application-x-www-form-urlencoded-compatible-mode.patch
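
As a rough sketch of the idea only (this is not the attached patch, and
it is written in Python rather than C for brevity): one decoder takes
a type argument, and the form-urlencoded type is the only one that
turns '+' into a space. The names UnescapeType, URI and FORM are
hypothetical, and leaving malformed '%' sequences untouched is an
assumption of this sketch.

```python
from enum import Enum


class UnescapeType(Enum):
    """Hypothetical decoding modes, named for this sketch only."""
    URI = 0   # RFC 3986 style: '+' stays '+'
    FORM = 1  # application/x-www-form-urlencoded: '+' becomes ' '


_HEX_BYTES = b"0123456789abcdefABCDEF"


def unescape_uri(src: str, kind: UnescapeType = UnescapeType.URI) -> str:
    """Decode %XX sequences; optionally decode '+' as a space."""
    data = src.encode("utf-8")
    out = bytearray()
    i = 0
    while i < len(data):
        ch = data[i]
        # A valid escape is '%' followed by exactly two hex digits.
        if (ch == ord("%") and i + 2 < len(data)
                and all(c in _HEX_BYTES for c in data[i + 1:i + 3])):
            out.append(int(data[i + 1:i + 3].decode("ascii"), 16))
            i += 3
            continue
        if ch == ord("+") and kind is UnescapeType.FORM:
            out.append(ord(" "))
        else:
            # Ordinary byte, or a malformed '%' that is copied as-is.
            out.append(ch)
        i += 1
    return out.decode("utf-8", errors="replace")


print(repr(unescape_uri("a+b%20c")))
print(repr(unescape_uri("a+b%20c", UnescapeType.FORM)))
```

With a single function and a type flag, a module that wants the
PHP/.NET/Java compatible behaviour only has to pass a different type
instead of carrying its own copy of the decoder.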
escape_uri:
I guess there was a need for different implementations, but it might
be possible to consolidate the external modules after this:
http://trac.nginx.org/nginx/changeset/4193/nginx
https://github.com/phusion/nginx/blob/master/src/core/ngx_string.c#L1505
The two module versions linked below seem to be the same as each
other. They differ somewhat from core. The core version of
uri_component is almost the same as uri in the modules (the
differences are !$*(),@`). The args versions also differ slightly (;&).
https://github.com/agentzh/set-misc-nginx-module/blob/master/src/ngx_http_set_escape_uri.c#L57
https://github.com/chaoslawful/lua-nginx-module/blob/master/src/ngx_http_lua_util.c#L1179
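
To show why a few characters of difference in the escape set matter,
here is a small Python sketch. The two "safe" sets are hypothetical
and chosen only for illustration; they are not copied from nginx core
or from either module.

```python
from urllib.parse import quote, unquote

sample = "a b*(c);d&e"

# Hypothetical narrow set: escape everything except unreserved characters.
narrow = quote(sample, safe="")

# Hypothetical wider set: additionally leave '*', '(' and ')' unescaped.
wide = quote(sample, safe="*()")

print(narrow)
print(wide)

# Both forms decode back to the same string, and hex case does not
# matter to a decoder, but the encoded strings differ byte for byte.
print(unquote(narrow) == unquote(wide) == sample)
print(unquote(narrow.lower()) == sample)
```

Anything that compares or hashes the encoded form, such as a cache
key, sees these as different values even though they decode
identically.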
Could it be possible for the set-misc and lua modules to use the nginx
core versions of uri_component and args?
The version linked below is almost the same as the nginx core version
of uri_component. There are a couple of differences ( *~) and the hex
is uppercase. The commit message hints that the new encoding was
needed for Java.
https://github.com/yaoweibin/memc-nginx-module/blob/master/src/ngx_http_memc_request.c#L8
I guess this is for a special need and does not need to be considered
further.
--
Markus Linnala, Chief Systems Architect
Cybercom Finland
Pakkahuoneenaukio 2 A; 33100 Tampere
Mobile +358 40 5919 735
Markus.Linnala at cybercom.com
www.cybercom.fi | www.cybercom.com
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0001-application-x-www-form-urlencoded-compatible-mode.patch
URL: <http://mailman.nginx.org/pipermail/nginx-devel/attachments/20111107/a411f95b/attachment.ksh>