problem with PCRE matching, utf-8, Greek, rewrite
Weibin Yao
nbubingo at gmail.com
Fri Jul 2 06:12:13 MSD 2010
tmanolat at 2010-7-1 23:33 wrote:
> Dear all,
> I try to implement some rewrites using regular expressions and my URIs
> will contain Greek characters.
>
> Trials of the REs are going ok when tested with pcretest:
>
> [code]
> [root at localhost ~]# pcretest
> PCRE version 8.10 2010-06-25
>
> re> #^[\x{0386}-\x{03FF}]+$#8
> data> bv
> No match
> data> Τηλέ
> 0: \x{3a4}\x{3b7}\x{3bb}\x{3ad}
>
> [/code]
> Note the 8 modifier, which tells PCRE to do UTF-8 matching.
>
>
> With the same RE in nginx.conf, nginx complains:
> [code]
> [emerg]: pcre_compile() failed: character value in \x{...} sequence is too large in
> [/code]
> which I guess means that nginx calls PCRE without the PCRE_UTF8
> option flag.
>
> Am I right? How can I implement these Greek character URL rewrites?
>
I use raw bytes for Chinese character substitution in my substitution
module (http://code.google.com/p/substitutions4nginx/wiki/ChineseCharacterSubsitution).
I think you could convert the Greek characters to their UTF-8 bytes the same way,
e.g. Τηλέ (U+03A4 U+03B7 U+03BB U+03AD) becomes:
'\xce\xa4\xce\xb7\xce\xbb\xce\xad'
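To get these escapes for any word, encode it as UTF-8 and print each byte as \xNN. Below is a minimal Python sketch. The helper name utf8_escape is only for illustration and is not part of nginx or of my module. Python's `re` on bytes stands in here for PCRE running without PCRE_UTF8; I have not checked the result inside an actual nginx rewrite.

```python
import re


def utf8_escape(text: str) -> str:
    """Return text as a chain of \\xNN escapes, one per UTF-8 byte."""
    return "".join(f"\\x{byte:02x}" for byte in text.encode("utf-8"))


word = "\u03a4\u03b7\u03bb\u03ad"  # the Greek word from the pcretest session
escaped = utf8_escape(word)
print(escaped)

# Sanity check: a byte-oriented regex built from the escapes matches the raw
# UTF-8 bytes of the word, so no UTF-8 mode is needed in the regex engine.
assert re.fullmatch(escaped.encode("ascii"), word.encode("utf-8"))
```

For Τηλέ this prints \xce\xa4\xce\xb7\xce\xbb\xce\xad. Every byte is escaped, so no regex metacharacter can slip into the pattern.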
> The system environment is:
>
> * CentOS 5.4
> * PCRE 8.10 with UTF-8 and Unicode properties support enabled
> * nginx 0.8.42
>
>
> Cheers
> Tilemahos
>
> Posted at Nginx Forum: http://forum.nginx.org/read.php?2,104357,104357#msg-104357
>
>
> _______________________________________________
> nginx mailing list
> nginx at nginx.org
> http://nginx.org/mailman/listinfo/nginx
>
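The same trick should extend to the whole range from your pcretest session. Without UTF-8 mode, [\x{0386}-\x{03FF}] can be written as UTF-8 byte sequences. U+0386..U+03BF are encoded as 0xCE followed by 0x86..0xBF, and U+03C0..U+03FF as 0xCF followed by 0x80..0xBF.

Here is a Python sketch that cross-checks the byte pattern against the code-point pattern. It assumes nginx matches the decoded URI as raw UTF-8 bytes. Python's `re` again stands in for PCRE without PCRE_UTF8. The names GREEK_BYTES_RE and is_greek_uri are hypothetical, and I have not tried this exact pattern in a rewrite.

```python
import re

# Byte-level equivalent of [\x{0386}-\x{03FF}] for a regex engine running
# without UTF-8 mode (assumption: the URI arrives as raw UTF-8 bytes):
#   U+0386..U+03BF -> 0xCE followed by 0x86..0xBF
#   U+03C0..U+03FF -> 0xCF followed by 0x80..0xBF
GREEK_BYTES_RE = re.compile(rb"^(?:\xce[\x86-\xbf]|\xcf[\x80-\xbf])+$")

# Code-point version, as used in the pcretest session with the 8 modifier.
GREEK_CHARS_RE = re.compile("^[\u0386-\u03ff]+$")


def is_greek_uri(raw: bytes) -> bool:
    """True if the raw UTF-8 bytes consist only of characters U+0386..U+03FF."""
    return GREEK_BYTES_RE.search(raw) is not None


if __name__ == "__main__":
    for sample in ["bv", "\u03a4\u03b7\u03bb\u03ad"]:  # the second one is Τηλέ
        print(sample, is_greek_uri(sample.encode("utf-8")))
    # Cross-check the byte pattern against the code-point pattern around
    # the Greek block (U+0370..U+04FF, which also covers some Cyrillic).
    for cp in range(0x0370, 0x0500):
        ch = chr(cp)
        assert is_greek_uri(ch.encode("utf-8")) == bool(GREEK_CHARS_RE.search(ch))
```

The byte pattern contains no braces or \x{...} sequences, so it should not trigger the "character value ... too large" error. That is untested on your setup, though.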
--
Weibin Yao