[ANNOUNCE] gunzip filter module 0.3

Maxim Dounin mdounin at mdounin.ru
Tue Apr 20 06:15:46 MSD 2010


Hello!

On Mon, Apr 19, 2010 at 05:15:16PM -0400, theromis1 wrote:

> Perfect Max,
> 
> understood your style of module, right now I'm working hard to 
> deploy it just with small hacks.
> 
> Actually we don't need to do unzipping always, we need unzip 
> only for 200 upstream responses and only for text/html answers 
> for reducing load on server. Looks like better to have 
> coordination with your way of development, so I need small 
> instructions how better to do it, and I'll send my patch for it.
> 
> --- /home/roman/work/ngx_http_gunzip_filter_module-0.3/ngx_http_gunzip_filter_module.c	2010-03-22 11:11:16.000000000 -0700
> +++ ngx_http_gunzip_filter_module.c	2010-04-16 16:37:01.000000000 -0700
> @@ -132,6 +132,7 @@
>      if (!conf->enable
>          || r->headers_out.content_encoding == NULL
>          || r->headers_out.content_encoding->value.len != 4
> +        || r->upstream->state->status != 200

This is obviously wrong.

1. Nobody promised that r->upstream is set here.  Expect coredumps 
on static requests and/or internal error responses.

2. Unzipping only responses with status 200 isn't going to work if 
the client doesn't support gzip at all: other responses would 
still reach it gzipped.

If your module happens to process only 200 responses - well, that 
should be considered a "module request" and coded as such.  
Alternatively, there could be a setting to request "gunzip 
always" only for particular responses, but I tend to think that's 
overkill.
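
To illustrate point 1 only: a minimal sketch of a guarded status 
check.  The mock_* structs are hypothetical stand-ins for the nginx 
types, not the real definitions, and the sketch doesn't make the 
200-only test itself a good idea.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the nginx structures, illustration only. */
typedef struct {
    unsigned  status;
} mock_upstream_state_t;

typedef struct {
    mock_upstream_state_t  *state;
} mock_upstream_t;

typedef struct {
    mock_upstream_t  *upstream;   /* NULL for static files, internal errors */
} mock_request_t;

/* Returns 1 only if an upstream response with the given status exists. */
static int
mock_upstream_status_is(const mock_request_t *r, unsigned status)
{
    assert(r != NULL);   /* a filter is always handed a request */

    if (r->upstream == NULL || r->upstream->state == NULL) {
        return 0;
    }

    return r->upstream->state->status == status;
}
```

With r->upstream left NULL, as for a static request, the function 
returns 0 instead of dereferencing a NULL pointer.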

>          || ngx_strncasecmp(r->headers_out.content_encoding->value.data,
>                             (u_char *) "gzip", 4) != 0)
>      {
> @@ -142,6 +143,9 @@
>  
>      r->gzip_vary = 1;
>  
> +    r->gzip_tested = 1;
> +    r->gzip_ok = 1;
> +

No, you shouldn't modify nginx's idea of whether the client 
supports gzip.  Instead, you should bypass the whole detection 
logic if you need to gunzip regardless of the client's support.

And your code makes further tests assume the client supports 
gzip, while some clients don't.  This may lead to weird results 
if you have the gzip filter enabled.
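
One way to read "bypass the detection", sketched with hypothetical 
names: mock_req_t mimics the two request flags from the patch, 
mock_gzip_ok() is a stand-in for the detection routine, and the 
"always" argument is an assumed setting that doesn't exist in the 
module.  The point is that the forced path never writes the flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the request; not the real nginx struct. */
typedef struct {
    unsigned  gzip_tested:1;   /* detection already ran */
    unsigned  gzip_ok:1;       /* cached detection result */
    unsigned  accepts_gzip:1;  /* what the client really supports */
} mock_req_t;

/* Stand-in for the detection routine: runs the test, caches the result. */
static int
mock_gzip_ok(mock_req_t *r)
{
    r->gzip_tested = 1;
    r->gzip_ok = r->accepts_gzip;
    return r->gzip_ok;
}

/* Returns 1 if the response has to be gunzipped. */
static int
mock_need_gunzip(mock_req_t *r, int always)
{
    assert(r != NULL);

    if (always) {
        return 1;           /* skip detection, leave the flags alone */
    }

    if (!r->gzip_tested) {
        return !mock_gzip_ok(r);
    }

    return !r->gzip_ok;
}
```

This way a gzip filter running later still sees the real detection 
result rather than a forced "client supports gzip".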

>      if (!r->gzip_tested) {
>          if (ngx_http_gzip_ok(r) == NGX_OK) {
>              return ngx_http_next_header_filter(r);
> @@ -315,7 +319,7 @@
>      ctx->zstream.opaque = ctx;
>  
>      /* windowBits +16 to decode gzip, zlib 1.2.0.4+ */
> -    rc = inflateInit2(&ctx->zstream, MAX_WBITS + 16);
> +    rc = inflateInit2(&ctx->zstream, MAX_WBITS + 32); // yahoo looks weird with previous init

+32 means automatic header detection, so zlib streams are accepted 
as well as gzip ones, which isn't what is expected with gzip 
content-encoding; a zlib stream is content-encoding deflate.  And 
there are differences.
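
For reference, the two framings mentioned above can be told apart 
from the first bytes: a gzip member (RFC 1952) starts with 1f 8b 
and method 8, while a zlib stream (RFC 1950) starts with a CMF/FLG 
pair that has method 8 in the low nibble of CMF and a 16-bit value 
divisible by 31.  A rough sketch; the function and enum names are 
made up for the example, and this is not how zlib is implemented.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    STREAM_UNKNOWN = 0,
    STREAM_GZIP,     /* RFC 1952, Content-Encoding: gzip */
    STREAM_ZLIB      /* RFC 1950, Content-Encoding: deflate */
} stream_kind_t;

static stream_kind_t
classify_stream(const unsigned char *p, size_t len)
{
    assert(p != NULL || len == 0);

    /* gzip: ID1 = 0x1f, ID2 = 0x8b, CM = 8 (deflate) */
    if (len >= 3 && p[0] == 0x1f && p[1] == 0x8b && p[2] == 8) {
        return STREAM_GZIP;
    }

    /* zlib: CM = 8, CINFO <= 7, (CMF * 256 + FLG) divisible by 31 */
    if (len >= 2
        && (p[0] & 0x0f) == 8
        && (p[0] >> 4) <= 7
        && ((p[0] << 8) | p[1]) % 31 == 0)
    {
        return STREAM_ZLIB;
    }

    return STREAM_UNKNOWN;
}
```

The common zlib header 78 9c is classified as STREAM_ZLIB, and the 
1f 8b 08 that starts a gzip member as STREAM_GZIP.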

>  
>      if (rc != Z_OK) {
>          ngx_log_error(NGX_LOG_ALERT, r->connection->log, 0,
> 
> If not apply r->upstream->state->status != 200 in headers 
> processing I'm getting a lot of errors in log, one of it is 
> http://yandex.ru/yandsearch?text=sunken , which sends 302 
> redirect url with gzipped content, I've tried to fix it, but 
> found just error in zlib, when I've stored dumped data and used 
> 'gzip -d' on it all decompressed fine, and I've got normal HTML. 
> How better to debug it? What advice you can give me?

They return incorrect data in the reply:

00000000  1f 8b 08 00 00 00 00 00  00 03 02 00 00 00 ff ff  |................|
00000010  1f 8b 08 00 00 00 00 00  00 03 2d 8e bb 0e 82 40  |..........-....@|
00000020  10 45 7b be 62 a4 b0 d3  51 28 1d d6 44 c1 68 e2  |.E{.b...Q(..D.h.|
00000030  ab 58 0b cb 95 1d b3 46  58 08 2c 46 fe 5e 1e 76  |.X.....FX.,F.^.v|
00000040  33 73 ee e4 5c 9a c4 97  ad bc 5f 13 d8 cb d3 11  |3s..\....._.....|
00000050  ae b7 cd f1 b0 05 7f 86  78 48 e4 0e 31 96 f1 48  |........xH..1..H|
00000060  82 f9 02 31 39 fb c2 23  e3 f2 4c 90 61 a5 bb c5  |...19..#..L.a...|
00000070  bd 5c c6 22 5c 04 b0 2b  1a ab 09 c7 83 47 38 04  |.\."\..+.....G8.|
00000080  e8 51 e8 b6 ff 59 8a 3f  ef 26 8f 4a 21 0d 83 2e  |.Q...Y.?.&.J!...|
00000090  d2 26 67 eb c0 a8 1a f2  e2 c3 1a 48 81 a9 f8 19  |.&g........H....|
000000a0  f9 d8 2a ab 6b 56 55 6a  d6 8e bf 2e aa 1b fb 66  |..*.kVUj.......f|
000000b0  3b 55 79 b9 ca aa 28 58  86 be 30 5c 31 a1 12 73  |;Uy...(X..0\1..s|
000000c0  c2 b2 37 0e ae ce d0 f7  f3 7e 75 a4 7e 57 da 00  |..7......~u.~W..|
000000d0  00 00                                             |..|
000000d2

The first 16 bytes are an incomplete/broken gzip member.  The 
correct one starts at offset 0x10 (and it can indeed be decoded to 
valid HTML).
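
As for how to debug this kind of thing: one simple approach is to 
look for where gzip members start in the dumped body.  A small 
sketch (find_gzip_member() is a name made up here); note that the 
three-byte signature can also occur by chance inside compressed 
data, so a hit is only a candidate offset.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the offset of the first gzip member signature (1f 8b 08)
 * at or after "start", or -1 if there is none.
 */
static long
find_gzip_member(const unsigned char *buf, size_t len, size_t start)
{
    size_t  i;

    assert(buf != NULL || len == 0);

    for (i = start; i + 3 <= len; i++) {
        if (buf[i] == 0x1f && buf[i + 1] == 0x8b && buf[i + 2] == 0x08) {
            return (long) i;
        }
    }

    return -1;
}
```

On the first 32 bytes of the dump above this returns 0 and, when 
searching again from offset 1, 0x10.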

It's interesting how they achieved this.  Hey, anybody from Yandex 
here?  Comments?

Maxim Dounin


