proxy_pass is double-encoding some pre-encoded URIs

Joey Korkames lists at ruby-forum.com
Thu May 22 05:48:17 MSD 2008


Hello, just wanted to start by saying that nginx is my favorite server
for my personal projects - what an awesome piece of work. This is my
first bug/help request.

I've been using proxy_store as a "mirror on demand" for serving APT
packages to debian machines. Occasionally a package will have a tilde
("~") in the file name and the proxy_pass's GET to the upstream server
will fail.

Looking through nginx's debug logs and tcpdumps, it seems APT makes the
initial GET with the URI already percent-encoded ("~" sent as "%7e"), and
the URI is then encoded a second time when proxy_pass makes the GET request
to the upstream server, so the "%" itself becomes "%25".
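
To make the effect concrete, here is a small Python sketch of the two
encoding passes. It only illustrates the percent-encoding arithmetic and is
not how nginx does it internally; the variable names are mine, and the file
name is the one from the logs below.

```python
from urllib.parse import quote, unquote

# File name as it exists on the mirror (taken from the debug log in this post).
name = "binutils_2.18.1~cvs20080103-4+b1_amd64.deb"

# APT sends the tilde pre-encoded. quote() never escapes "~",
# so the first pass is imitated with a plain string replacement.
once = name.replace("~", "%7e")

# A second encoding pass escapes the "%" itself, turning "%7e" into "%257e".
# "+" is kept literal here because it is left alone in the logged requests.
twice = quote(once, safe="+")

print(once)   # binutils_2.18.1%7ecvs20080103-4+b1_amd64.deb
print(twice)  # binutils_2.18.1%257ecvs20080103-4+b1_amd64.deb

# One decode of the double-encoded name only gets back to the once-encoded
# form, which is the literal name the upstream then fails to find.
print(unquote(twice) == once)  # True
```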

My proxy_store config:

location /apt-cache/debian/lenny {
         root /var/www/spawn.llnw.com/htdocs/proxy_store;
         recursive_error_pages on;
         error_page 404 = /apt-fetch-easynews$request_uri;
}

location /apt-fetch-easynews {
          internal;
          rewrite /apt-fetch-easynews/apt-cache/([^/]*)/([^/]*)(.*)
/linux/debian$3 break;

          recursive_error_pages on;
          proxy_intercept_errors on;
          proxy_connect_timeout 6;
          proxy_read_timeout 20;
          proxy_next_upstream error timeout invalid_header http_500
http_503 http_404;
          proxy_pass http://debian.mirrors.easynews.com;

          proxy_store /var/www/default/htdocs/proxy_store/$request_uri;
          proxy_store_access user:rw group:rw all:r;

          error_page 404 503 504 = /apt-fetch-kernelorg$request_uri;
#failover to kernel.org
}

For URI:
http://localhost/apt-cache/debian/lenny/pool/main/b/binutils/binutils_2.18.1~cvs20080103-4+b1_amd64.deb

GET from client:

2008/05/22 01:32:42 [debug] 7400#0: *1 http request line: "GET
/apt-cache/debian/lenny/pool/main/b/binutils/binutils_2.18.1%7ecvs20080103-4+b1_amd64.deb
HTTP/1.1"
2008/05/22 01:32:42 [debug] 7400#0: *1 http uri:
"/apt-cache/debian/lenny/pool/main/b/binutils/binutils_2.18.1~cvs20080103-4+b1_amd64.deb"
2008/05/22 01:32:42 [debug] 7400#0: *1 http args: ""
2008/05/22 01:32:42 [debug] 7400#0: *1 http exten: "deb"
2008/05/22 01:32:42 [debug] 7400#0: *1 http process request header line
2008/05/22 01:32:42 [debug] 7400#0: *1 http header: "Host: localhost"
2008/05/22 01:32:42 [debug] 7400#0: *1 http header: "Connection:
keep-alive"
2008/05/22 01:32:42 [debug] 7400#0: *1 http header: "User-Agent: Debian
APT-HTTP/1.3 (0.7.11)"
2008/05/22 01:32:42 [debug] 7400#0: *1 http header done

....

GET to upstream server:
2008/05/22 01:32:42 [debug] 7400#0: *1 http proxy header: "User-Agent:
Debian APT-HTTP/1.3 (0.7.11)"
2008/05/22 01:32:42 [debug] 7400#0: *1 http proxy header:
"GET
/linux/debian/pool/main/b/binutils/binutils_2.18.1%257ecvs20080103-4+b1_amd64.deb
HTTP/1.0
Host: localhost
Connection: close
User-Agent: Debian APT-HTTP/1.3 (0.7.11)

"

...
404 from upstream server:
2008/05/22 01:32:43 [debug] 7400#0: epoll: fd:12 ev:0005
d:00002AAAAAAC5290
2008/05/22 01:32:43 [debug] 7400#0: *1 http upstream process header
2008/05/22 01:32:43 [debug] 7400#0: *1 malloc: 00000000006ABE90:4096
2008/05/22 01:32:43 [debug] 7400#0: *1 recv: fd:12 440 of 4096
2008/05/22 01:32:43 [debug] 7400#0: *1 http proxy status 404 "404 Not
Found"
2008/05/22 01:32:43 [debug] 7400#0: *1 http proxy header: "Date: Thu, 22
May 2008 01:32:42 GMT"
2008/05/22 01:32:43 [debug] 7400#0: *1 http proxy header: "Server:
Apache"
2008/05/22 01:32:43 [debug] 7400#0: *1 http proxy header:
"Content-Length: 276"
2008/05/22 01:32:43 [debug] 7400#0: *1 http proxy header: "Connection:
close"
2008/05/22 01:32:43 [debug] 7400#0: *1 http proxy header: "Content-Type:
text/html; charset=iso-8859-1"
2008/05/22 01:32:43 [debug] 7400#0: *1 http proxy header done
2008/05/22 01:32:43 [debug] 7400#0: *1 finalize http upstream request:
404
2008/05/22 01:32:43 [debug] 7400#0: *1 finalize http proxy request
2008/05/22 01:32:43 [debug] 7400#0: *1 free rr peer 1 0
2008/05/22 01:32:43 [debug] 7400#0: *1 close http upstream connection:
12
2008/05/22 01:32:43 [debug] 7400#0: *1 event timer del: 12:
1211419982557

The same transaction as seen through tcpdump:

01:36:36.253662 IP 127.0.0.1.60417 > 69.16.168.244.80: P 1:242(241) ack
1 win 92 <nop,nop,timestamp 1058195635 33813862>
E..%W]@.@....o.+E......P....c|vV...\N......
?......fGET
/linux/debian/pool/main/b/binutils/binutils_2.18.1%257ecvs20080103-4+b1_amd64.deb
HTTP/1.0
Host: localhost
Connection: close
User-Agent: Debian APT-HTTP/1.3 (0.7.11)

01:36:36.324114 IP 69.16.168.244.80 > 127.0.0.1.60417: P 1:441(440) ack
242 win 54 <nop,nop,timestamp 33813870 1058195635>
E....?@.7.q-E....o.+.P..c|vV.......6.......
...n?...HTTP/1.1 404 Not Found
Date: Thu, 22 May 2008 01:36:36 GMT
Server: Apache
Content-Length: 276
Connection: close
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL
/linux/debian/pool/main/b/binutils/binutils_2.18.1%7ecvs20080103-4+b1_amd64.deb
was not found on this server.</p>
</body></html>

If you take the URI and fix the double-encoding by hand...
http://69.16.168.244/linux/debian/pool/main/b/binutils/binutils_2.18.1%257ecvs20080103-4+b1_amd64.deb
"%25" -> "%"
http://69.16.168.244/linux/debian/pool/main/b/binutils/binutils_2.18.1%7ecvs20080103-4+b1_amd64.deb
..the once-encoded uri works.

I realize this can be considered an apt-get bug, but some browsers out
there may pre-encode "unreserved" special characters in their URIs
(see section 2.3 of http://www.ietf.org/rfc/rfc2396.txt) just like apt-get
is doing.

Nginx does seem to know when to decode the original URI and save it in
decoded form in all of the logs - can this same logic be used by
proxy_pass to determine whether it should encode a GET request or not to
the upstream server?
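
Roughly what I have in mind, sketched in Python rather than nginx source:
decode whatever the client sent, then encode exactly once before building
the upstream request. The helper name reencode_once is made up for this
example, and keeping "/" and "+" literal is my assumption based on the
requests in the logs above.

```python
from urllib.parse import quote, unquote

def reencode_once(raw_path: str) -> str:
    """Hypothetical helper: normalize a client path to a single encoding pass."""
    # Undo the client's percent-encoding first ("%7e" becomes "~").
    decoded = unquote(raw_path)
    # Encode once. "/" and "+" stay literal (assumption); "~" is unreserved
    # and is never escaped by quote().
    return quote(decoded, safe="/+")

# Path as APT sent it, after the rewrite to /linux/debian.
client_path = "/linux/debian/pool/main/b/binutils/binutils_2.18.1%7ecvs20080103-4+b1_amd64.deb"
upstream_path = reencode_once(client_path)

print(upstream_path)
# /linux/debian/pool/main/b/binutils/binutils_2.18.1~cvs20080103-4+b1_amd64.deb
```

One caveat with this approach: a client that deliberately sent an encoded
"/" ("%2F") would have it turned into a real path separator, so it is not a
drop-in answer for every URI.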

joey
-- 
Posted via http://www.ruby-forum.com/.




