problem with PCRE matching, utf-8, Greek, rewrite

Maxim Dounin mdounin at mdounin.ru
Thu Jul 1 21:11:48 MSD 2010


Hello!

On Thu, Jul 01, 2010 at 11:33:49AM -0400, tmanolat wrote:

> Dear all,
> I try to implement some rewrites using regular expressions and my URIs
> will contain Greek characters.
> 
> Trials of the REs are going ok when tested with pcretest:
> 
> [code]
> [root at localhost ~]# pcretest
> PCRE version 8.10 2010-06-25
> 
>   re> #^[\x{0386}-\x{03FF}]+$#8
> data> bv
> No match
> data> Τηλέ
>  0: \x{3a4}\x{3b7}\x{3bb}\x{3ad}
> 
> [/code]
> note the 8 modifier that actually tells PCRE to do a UTF-8 matching.
> 
> 
> Having the RE in nginx.config complains about 
> [code]
> [emerg]: pcre_compile() failed: character value in \x{...} sequence is
> too large in 
> [/code]
> which I guess means that somehow nginx calls PCRE without the PCRE_UTF8
> option flag
> 
> Am I right? How can I implement these Greek character URL rewrites?

Using (*UTF8) to switch pcre into utf-8 mode should work find in 
both nginx and pcretest.  See man pcresyntax for details.

Maxim Dounin



More information about the nginx mailing list