serve content from a wget mirror

Steve Wilson lists-nginx at swsystem.co.uk
Tue Nov 25 19:13:49 UTC 2014


On 24/11/2014 12:36, Francis Daly wrote:
> On Mon, Nov 24, 2014 at 11:05:10AM +0000, Steve Wilson wrote:
> 
> Hi there,
> 
>> I'm trying to create an archive based on a current site which is due to
>> be taken down. I've used "wget -m" to mirror the site and all seems well,
>> except I'm having trouble with what I think are arguments in the url.
>>
>> Most everything seems to work ok and this is currently my only issue. I
>> have on disk the file /dir/page.php?a=1&b=2 for example but nginx
>> returns a 404 when accessing http://localhost/dir/page.php?a=1&b=2,
> 
> url-space and filename-space are different.
> 
> Option 1: in all of the html that contains links that refer
> to /dir/page.php?a=1&b=2, url-encode it to be (at least)
> /dir/page.php%3Fa=1&b=2 (that is: ? becomes %3F).
> 
> With that, you need nothing special from nginx.
> 
> Option 2: use your try_files thing but use $uri?$args as the first
> argument (note the ? in there).
> 
> With that, you need the try_files magic to handle all requests. And
> if you happen to have both "page.php" and "page.php?", you may have
> difficulty accessing the former.
> 
> Option 3: keep the mirror of "static" content; but reimplement the
> back-end for "dynamic" content.
> 
> That's a lot more work, but is really the only way to provide the
> full mirror.
> 
> But if your current "offline mirror" is complete enough and does not
> contain any html forms which POST or have combinations of options that
> have not been used already, it is probably unnecessary.
> 
> 	f
> 

Ah yes. Option 2 was what I was trying to do, but I had left out the ? :(
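
For the archive, here is a rough sketch of what option 2 looks like with
the ? in place between $uri and $args. The root path and the default_type
are assumptions for illustration, not my actual config; the default_type
is there because a file named "page.php?a=1&b=2" will probably not match
any extension in mime.types.

```nginx
server {
    listen 80;
    server_name localhost;

    # Assumption: directory where "wget -m" wrote the mirror.
    root /var/www/mirror;

    location / {
        # Assumption: the mirrored pages are HTML. A name such as
        # "page.php?a=1&b=2" is unlikely to match an entry in mime.types.
        default_type text/html;

        # First try a file literally named "<uri>?<args>", then the plain
        # URI, then a directory, and finally give up with a 404.
        try_files $uri?$args $uri $uri/ =404;
    }
}
```

As Francis notes, every request has to pass through try_files for this to
work, and a file literally named "page.php?" would shadow "page.php" for
requests without arguments.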

I'm going to have a look at the wget options to see whether it can encode
the links as in option 1, so that even the try_files won't be needed.

Unfortunately option 3 isn't an option as the source of the mirror will
be turned off when I've got this historical mirror up and running.

Steve.


