problem with PCRE matching, utf-8, Greek, rewrite
Maxim Dounin
mdounin at mdounin.ru
Thu Jul 1 21:11:48 MSD 2010
Hello!
On Thu, Jul 01, 2010 at 11:33:49AM -0400, tmanolat wrote:
> Dear all,
> I try to implement some rewrites using regular expressions and my URIs
> will contain Greek characters.
>
> Trials of the REs are going ok when tested with pcretest:
>
> [code]
> [root@localhost ~]# pcretest
> PCRE version 8.10 2010-06-25
>
> re> #^[\x{0386}-\x{03FF}]+$#8
> data> bv
> No match
> data> Τηλέ
> 0: \x{3a4}\x{3b7}\x{3bb}\x{3ad}
>
> [/code]
> note the 8 modifier that actually tells PCRE to do a UTF-8 matching.
>
>
> Having the RE in nginx.config complains about
> [code]
> [emerg]: pcre_compile() failed: character value in \x{...} sequence is
> too large in
> [/code]
> which I guess means that somehow nginx calls PCRE without the PCRE_UTF8
> option flag
>
> Am I right? How can I implement these Greek character URL rewrites?
Using (*UTF8) at the start of the pattern to switch PCRE into UTF-8
mode should work fine in both nginx and pcretest. See man pcresyntax
for details.
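A minimal sketch of how this might look in nginx.conf, assuming the
PCRE library was built with UTF-8 support. The capture group and the
rewrite target /index.php?q=$1 are made-up placeholders, not taken from
the original question. The pattern is quoted because it contains
{ and } characters, which nginx would otherwise treat as block
delimiters:

[code]
# Hypothetical example: the rewrite target is illustrative only.
# (*UTF8) has to be the very first item in the pattern.
# Quotes are needed because the pattern contains { and }.
rewrite "(*UTF8)^/([\x{0386}-\x{03FF}]+)$" /index.php?q=$1 last;
[/code]

The same (*UTF8) prefix can be used at the re> prompt of pcretest in
place of the 8 modifier.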
Maxim Dounin