Stopping bots and wget in nginx
Fabio Coatti
cova at ferrara.linux.it
Wed Dec 19 11:00:32 MSK 2007
On Tuesday 18 December 2007, Eden Li wrote:
> wget (and many other user agents) respect robots.txt if you place it
> at /robots.txt:
>
> http://www.robotstxt.org/orig.html
> http://en.wikipedia.org/wiki/Robots.txt
>
> Of course malicious agents will ignore it and continue scraping your
> site. It's pretty hard to block these kinds of bots since they can
> mimic browser requests that would be difficult to disambiguate from
> normal user requests.
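A minimal sketch of the robots.txt approach, served straight from the nginx configuration instead of from a file in the document root (an assumption for illustration; a static /robots.txt file works just as well). These rules ask every compliant crawler to stay out of the whole site. Note that wget consults robots.txt only during recursive retrieval (-r), and the user can switch that off with `-e robots=off`, so this is a request, not a block.

```nginx
# Place inside a server { } block. Serves a robots.txt asking all
# compliant crawlers to avoid the entire site.
location = /robots.txt {
    default_type text/plain;
    return 200 "User-agent: *\nDisallow: /\n";
}
```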
That's true. But if you look carefully at typical web site logs, most of the
weird URLs come from a small subset of specific user agents (basically,
scripts run by people who barely have a clue what they are doing).
While I agree that several tools respect robots.txt, they are the "good"
ones, and I see no point in stopping them. On the other hand, malicious tools
that fake the user agent are really difficult to stop, and you have to rely
on a good configuration of the system. In between lies a large number of
hits coming from specific user agents, mostly trying fairly harmless
things (bounce attacks, etc.). That kind of visitor can be kept out with a
simple configuration line, and given how frequent they are, the
countermeasure can be worth using (naive as it is).
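A minimal sketch of that "simple configuration line" approach extended to several agents at once, assuming a reasonably recent nginx with the map module (built in by default). The agent patterns, host name, and document root are hypothetical placeholders; build the list from your own logs. Any client that sends a browser-like User-Agent (wget can, via --user-agent) passes straight through.

```nginx
# http { } context: flag requests whose User-Agent matches a known tool.
# Patterns are examples only; ~* makes the regex case-insensitive.
map $http_user_agent $blocked_agent {
    default          0;
    ~*libwww-perl    1;
    ~*wget           1;
}

server {
    listen      80;
    server_name example.com;      # placeholder host name

    # $blocked_agent is "0" (false) unless one of the patterns matched.
    if ($blocked_agent) {
        return 403;
    }

    location / {
        root /var/www/html;       # placeholder document root
    }
}
```

Keeping the list in a single `map` avoids stacking one `if` block per agent; whether to answer with 403 or 400, as in the example below, is a matter of taste.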
>
> On 12/18/07, Fabio Coatti <cova at ferrara.linux.it> wrote:
> > On Tuesday 18 December 2007, Alexis Torres Garnica wrote:
> > > Hi guys, I am new to the list. Is there a way to stop or block bots
> > > and wget from accessing an nginx web server? Thanks
> > >
> > > att: alex
> >
> > If by "block bots" you mean "block requests based on User Agent", you
> > can do this by setting up something like this:
> >
> > # Inside a server or location block: reject libwww-perl clients.
> > if ($http_user_agent ~ libwww-perl) {
> >     return 400;
> > }
> >
> >
> > (just an example, of course)
> >
> >
> > --
> > Fabio "Cova" Coatti http://members.ferrara.linux.it/cova
> > Ferrara Linux Users Group http://ferrara.linux.it
> > GnuPG fp:9765 A5B6 6843 17BC A646 BE8C FA56 373A 5374 C703
> > Old SysOps never die... they simply forget their password.