Re: How to fend off 80legs.com?

Peter B. Pokryshev ppb at valuehost.ru
Tue Oct 29 14:44:28 UTC 2013


On Tue, 29 Oct 2013 10:40:10 -0400
"Gaidamak" <nginx-forum at nginx.us> wrote:

> This pest has started hitting us regularly.
> 
> http://www.80legs.com/webcrawler.html
> 
> What is the proper way to get rid of it?
> 

Ban it by user agent, or do what they themselves recommend on their site:

 If you'd like us to stop crawling your website, the best thing to do is to block our web crawler using the robots.txt specification. To do this, add the following to your robots.txt:

   User-agent: 008
   Disallow: /

 If you block 008 using robots.txt, you will see crawl requests die down gradually, rather than immediately. This happens because of our distributed architecture. Our computers only periodically receive robots.txt information for domains they are crawling.
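
For the first option, banning by user agent, a minimal nginx sketch might look like the one below. It assumes the crawler keeps sending a User-Agent header containing "80legs", as in the log line quoted in this thread. The 403 status and the placement inside the server block are illustrative choices, not something taken from the 80legs documentation.

```nginx
server {
    # server_name taken from the log line in this thread; adjust to your host.
    server_name site.domain.com;

    # Case-insensitive regex match on the User-Agent header. "80legs" appears
    # in the crawler's UA string: "... 008/0.85; http://www.80legs.com/...".
    if ($http_user_agent ~* "80legs") {
        # Assumption: answer with 403. "return 444;" would instead close the
        # connection without sending any response.
        return 403;
    }

    # ... the rest of the existing configuration ...
}
```

Unlike the robots.txt rule, this takes effect as soon as nginx reloads the configuration, but it depends on the crawler not changing its User-Agent string.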


> The logs are full of entries like this:
> 
> 109.166.134.39 - - [29/Oct/2013:18:34:09 +0400] site.domain.com "GET
> /page/url/  HTTP/1.1" 502 107 "-" "Mozilla/5.0 (compatible; 008/0.85;
> http://www.80legs.com/webcrawler.html) Gecko/2008032620" 0.000
> 
> Posted at Nginx Forum: http://forum.nginx.org/read.php?21,244236,244236#msg-244236
> 
> _______________________________________________
> nginx-ru mailing list
> nginx-ru at nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx-ru

-- 
Peter B. Pokryshev <ppb at valuehost.ru>


