Stopping bots and wget in nginx

Fabio Coatti cova at ferrara.linux.it
Wed Dec 19 11:00:32 MSK 2007


On Tuesday 18 December 2007, Eden Li wrote:
> wget (and many other user agents) respect robots.txt if you place it
> at /robots.txt:
>
>   http://www.robotstxt.org/orig.html
>   http://en.wikipedia.org/wiki/Robots.txt
>
> Of course malicious agents will ignore it and continue scraping your
> site.  It's pretty hard to block these kinds of bots since they can
> mimic browser requests that would be difficult to disambiguate from
> normal user requests.


That's true. But if you look carefully at the logs of a typical web site, most
of the weird URLs come from a small subset of specific user agents (basically,
scripts run by people who barely have a clue what they are doing).
While I agree that several tools respect robots.txt, they are the "good"
ones, and I see no point in stopping them. On the other side, malicious tools
that fake the user agent are really difficult to stop, and you have to rely
on a good configuration of the system. In the middle lies a large number of
hits coming from specific user agents, mostly trying to do pretty harmless
things (bounce attacks, etc.). That kind of visitor can be kept out by a
simple configuration line, and given their high rate it can be worth
using that countermeasure (naive as it is).
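
If the list of unwanted agents grows beyond one or two entries, a map can keep
it in one place. This is only a sketch under assumptions: the variable name
$block_ua, the patterns, the server_name and the 403 status are illustrative
choices of mine, not something taken from a real log, and the map block has to
live in the http context of your configuration.

```nginx
# Goes in the http context. $block_ua is a hypothetical variable name.
map $http_user_agent $block_ua {
    default          0;   # everything else is let through
    ~*libwww-perl    1;   # case-insensitive regex match
    ~*^wget          1;   # agents whose string starts with "wget"
}

server {
    listen 80;
    server_name example.com;   # placeholder host name

    # "if" treats an empty string or "0" as false, anything else as true.
    if ($block_ua) {
        return 403;            # assumed status; 400 or 444 would also work
    }
}
```

As said above, this only filters clients that announce themselves honestly;
anything that fakes a browser user agent passes straight through.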



>
> On 12/18/07, Fabio Coatti <cova at ferrara.linux.it> wrote:
> > On Tuesday 18 December 2007, Alexis Torres Garnica wrote:
> > > Hi guys, I am new to the list. Is there a way to stop or block the bots
> > > access and wget to a nginx web server? tnks
> > >
> > > att: alex
> >
> > If by "block bots" you mean "block requests based on User Agent", you
> > can do this by setting up something like this:
> >
> >                 if ($http_user_agent ~ libwww-perl ) {
> >                         return 400;
> >                 }
> >
> >
> > (just an example, of course)
> >
> >
> > --
> > Fabio "Cova" Coatti    http://members.ferrara.linux.it/cova
> > Ferrara Linux Users Group           http://ferrara.linux.it
> > GnuPG fp:9765 A5B6 6843 17BC A646  BE8C FA56 373A 5374 C703
> > Old SysOps never die... they simply forget their password.







