rewrite question

shiz nginx-forum at forum.nginx.org
Mon Jun 11 13:42:12 UTC 2018


I see another poster wrote this and deleted it afterwards.

`This is almost certainly not Google as they obey robots.txt. The & to
&amp;
conversion is another sign of a poor quality crawler. Check the RDNS and
you will find it's probably some IP faking Google UA, I suggest blocking at
network level.`
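
For reference, the RDNS check suggested in that quote could be sketched
roughly as below. This is only an illustration, not anything from the
original thread: the function name `is_google_crawler`, the injectable
`reverse`/`forward` resolvers and the hostname suffix list are my own
assumptions, and the suffix list may be incomplete. It handles IPv4 only.

```python
import socket

# Hostname suffixes assumed to belong to Google crawlers (assumption: may be incomplete).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_crawler(ip, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for one IPv4 address."""
    # Resolvers are injectable so the logic can be tried without network access.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)
        if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            return False
        # The PTR name must resolve back to the same IP; a PTR alone can be faked.
        return ip in forward(host)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False
```

Calling `is_google_crawler("72.14.199.18")` with the default resolvers
performs live DNS lookups, so the answer depends on what DNS returns at
that moment.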

My actual reply:


1 - It is Google
2 - They do not always send a user-friendly user agent.  That is a fact.
3 - When they don't, they also don't follow robots.txt.

So my problem remains.

I don't want to block those IP ranges at iptables level because it's Google.
So a rewrite or redirect (I'm not sure exactly which at the moment) is badly
needed. It depends on the URL.
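
To make the intended fix concrete, here is a minimal sketch of the URL
repair, written in Python rather than nginx config so the logic can be
tested in isolation. The function name `repair_request_uri` is
hypothetical, and I am assuming the broken requests only differ from the
real URLs by an HTML-escaped ampersand (`&amp`) and a percent-encoded
`=` (`%3D`), as in the log lines quoted further down.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# "&amp;" anywhere, or a bare "&amp" at the end of the query or right before "&".
_AMP = re.compile(r"&amp(?:;|(?=&)|$)")
_ENC_EQ = re.compile(r"%3D", re.IGNORECASE)


def repair_request_uri(uri):
    """Return the repaired URI to redirect to, or None if nothing needs fixing."""
    parts = urlsplit(uri)
    query = _AMP.sub("&", parts.query)
    pairs = []
    for pair in query.split("&"):
        if not pair:
            continue  # drop empty segments left by a trailing "&"
        if "=" not in pair:
            # Only decode %3D when the pair has no literal "=" at all,
            # so values such as base64 padding are left untouched.
            pair = _ENC_EQ.sub("=", pair, count=1)
        pairs.append(pair)
    fixed = "&".join(pairs)
    if fixed == parts.query:
        return None
    return urlunsplit(parts._replace(query=fixed))
```

A return value of None means the request can be served as is; anything
else is the target for a 301.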

Here are the IP ranges, definitely Google.  Referenced in
https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/issues/175

And here is a copy of my original message.

"Hi,

I'm still faithful to your script. It does great things to my websites.
Thanks for that.

Not a bug properly speaking, just an observation you might like,

Over the last 1-2 months, I got a lot of strange, impossible requests, all
with the same User-Agent, no referrer and HTTP/1.1. All came from Google.
They do not respect robots.txt and sniff everywhere they're not supposed to.
I thought you should be made aware of it.

I know you whitelist Google IPs, but after inspection from other users, you
might want to revisit those.

User-agent:
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML,
like Gecko) Chrome/28.0.1500.71 Safari/537.36"

Ranges:
66.249.64.0/19
72.14.199.0/24

Examples of request:
72.14.199.18 - - [27/May/2018:14:12:01 -0700] "GET
/page.php?page%3Dabout_himeji_forklifts&amp HTTP/1.1" 301 178 "-"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML,
like Gecko) Chrome/28.0.1500.71 Safari/537.36"
72.14.199.4 - - [27/May/2018:14:12:24 -0700] "GET
/page.php?page%3Dabout_himeji_forklifts&amp HTTP/1.1" 302 165 "-"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML,
like Gecko) Chrome/28.0.1500.71 Safari/537.36"

In the meantime, I circumvented your whitelist by issuing manual range bans.
After 6 weeks, no more of those strange requests, and bandwidth has dropped
significantly since those 2 ranges were requesting quite a few hundred
megabytes each day!

Thanks again."
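
To pick out these requests from an access log without banning the ranges
outright, something like the following sketch could work. It assumes the
standard combined log format; the names `RANGES`, `SUSPECT_UA`, `LOG_RE`
and `is_suspect` are my own, and the match criteria (listed range, the
exact User-Agent above, empty referrer) are taken from the quoted message.

```python
import ipaddress
import re

# The two ranges and the User-Agent listed in the quoted message.
RANGES = [ipaddress.ip_network(n) for n in ("66.249.64.0/19", "72.14.199.0/24")]
SUSPECT_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36"
)

# Combined log format: ip - user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def is_suspect(line):
    """True if a log line matches the range, User-Agent and empty referrer."""
    m = LOG_RE.match(line)
    if m is None:
        return False
    try:
        ip = ipaddress.ip_address(m.group("ip"))
    except ValueError:
        return False  # first field was not an IP address
    in_range = any(ip in net for net in RANGES)
    return in_range and m.group("ua") == SUSPECT_UA and m.group("referer") == "-"
```

Each log entry has to be passed as a single line; the examples above are
wrapped only for the mail.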

Posted at Nginx Forum: https://forum.nginx.org/read.php?2,280093,280117#msg-280117


