problem with PCRE matching, utf-8, Greek, rewrite

Weibin Yao nbubingo at gmail.com
Fri Jul 2 06:12:13 MSD 2010


tmanolat at 2010-7-1 23:33 wrote:
> Dear all,
> I am trying to implement some rewrites using regular expressions, and my
> URIs will contain Greek characters.
>
> The REs work fine when tested with pcretest:
>
> [code]
> [root@localhost ~]# pcretest
> PCRE version 8.10 2010-06-25
>
>   re> #^[\x{0386}-\x{03FF}]+$#8
> data> bv
> No match
> data> Τηλέ
>  0: \x{3a4}\x{3b7}\x{3bb}\x{3ad}
>
> [/code]
> Note the 8 modifier, which tells PCRE to do UTF-8 matching.
>
>
> With the same RE in nginx.conf, nginx complains:
> [code]
> [emerg]: pcre_compile() failed: character value in \x{...} sequence is
> too large in 
> [/code]
> which I guess means that nginx somehow calls PCRE without the PCRE_UTF8
> option flag.
>
> Am I right? How can I implement these Greek character URL rewrites?
>   
I use the raw bytes for Chinese character substitution in my substitution
module (http://code.google.com/p/substitutions4nginx/wiki/ChineseCharacterSubsitution).

I think you could write the Greek characters as their UTF-8 bytes, like this:

'\xce\xa4\xce\xb7\xce\xbb\xce\xad'
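
A minimal sketch of how such a byte-escape string could be derived, in case it helps. Python is only my choice for illustration, and the names utf8_escapes, GREEK_BYTES and is_greek are made up for this example. It also shows a byte-level equivalent of the range [\x{0386}-\x{03FF}] that needs no UTF-8 mode. Whether nginx passes these \xHH escapes through to PCRE unchanged is an assumption I have not verified.

```python
import re


def utf8_escapes(text: str) -> str:
    """Return text as a run of \\xHH escapes, one per UTF-8 byte."""
    return "".join(f"\\x{b:02x}" for b in text.encode("utf-8"))


# Byte-level equivalent of [\x{0386}-\x{03FF}]:
#   U+0386..U+03BF encode as CE 86 .. CE BF
#   U+03C0..U+03FF encode as CF 80 .. CF BF
GREEK_BYTES = re.compile(rb"^(?:\xce[\x86-\xbf]|\xcf[\x80-\xbf])+$")


def is_greek(text: str) -> bool:
    """Check the UTF-8 bytes of text against the byte-level pattern."""
    return GREEK_BYTES.match(text.encode("utf-8")) is not None


if __name__ == "__main__":
    sample = "\u03a4\u03b7\u03bb\u03ad"  # the test string from the pcretest run
    print(utf8_escapes(sample))
    print(is_greek(sample), is_greek("bv"))
```

The byte-level alternation could then be tried in place of the \x{...} range, since every \xHH value stays below 0x100 and so should not need the PCRE_UTF8 flag.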

> The system environment is:
>
> * CentOS 5.4
> * PCRE 8.10 with utf-8 and utf-properties enabled 
> * nginx 0.8.42
>
>
> Cheers
> Tilemahos
>
> Posted at Nginx Forum: http://forum.nginx.org/read.php?2,104357,104357#msg-104357


-- 
Weibin Yao



