rewrite question

Richard Stanway r1ch+nginx at teamliquid.net
Mon Jun 11 15:37:27 UTC 2018


That IP resolves to rate-limited-proxy-72-14-199-18.google.com. This is
not the Google search crawler, which is why it ignores your robots.txt. No
one seems to know for sure what the rate-limited-proxy IPs are used for.
They could represent ordinary Chrome users with the Google data saving
feature enabled, which would explain the varying user-agents you see.
Either way, they are probably best left unblocked, as a single proxy IP
could represent many end users. There may be an X-Forwarded-For header you
could look at.
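
To check whether these requests carry an X-Forwarded-For header, you could
log it alongside the client address. This is an untested sketch: the format
name "xff" and the log path are placeholders of mine, and I don't know
whether these proxies actually send the header.

```nginx
# Goes in the http {} block. "xff" is a hypothetical format name.
# $http_x_forwarded_for holds the X-Forwarded-For request header and is
# logged as "-" when the header is absent.
log_format xff '$remote_addr - $remote_user [$time_local] "$request" '
               '$status $body_bytes_sent "$http_referer" '
               '"$http_user_agent" "$http_x_forwarded_for"';

# Placeholder path; point this at wherever your access log lives.
access_log /var/log/nginx/access.log xff;
```

If the last field is populated for the rate-limited-proxy requests, it
should show the addresses the proxy claims to be forwarding for. The header
is client-supplied, so treat it as a hint rather than proof.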

The Google search crawler's IPs resolve to a hostname like
crawl-66-249-64-213.googlebot.com.



On Mon, Jun 11, 2018 at 5:05 PM Francis Daly <francis at daoine.org> wrote:

> On Thu, Jun 07, 2018 at 07:57:43PM -0400, shiz wrote:
>
> Hi there,
>
> > Recently, Google has started spidering my website and in addition to
> > normal pages, appended "&amp" to all urls, even the pages excluded by
> > robots.txt
> >
> > e.g.  page.php?page=aaa -> page.php?page=aaa&amp
> >
> > Any idea how to redirect/rewrite this?
>
> Untested, but:
>
>   if ($args ~ "&amp$") { return 400; }
>
> should handle all requests that end in the four characters you report.
>
> You may prefer a different response code.
>
> Good luck with it,
>
>         f
> --
> Francis Daly        francis at daoine.org