Stopping bots and wget in nginx

Eden Li eden at mojiti.com
Tue Dec 18 23:09:59 MSK 2007


wget (and many other user agents) respect robots.txt if you place it
at /robots.txt:

  http://www.robotstxt.org/orig.html
  http://en.wikipedia.org/wiki/Robots.txt
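For example, a minimal robots.txt served from the site root that asks
wget to stay out entirely could look like this (wget identifies itself
to the robot-exclusion mechanism as "wget"; check your version's docs):

  User-agent: wget
  Disallow: /

Compliant clients fetch /robots.txt first and honor these rules before
retrieving anything else.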

Of course malicious agents will ignore it and continue scraping your
site.  It's pretty hard to block these kinds of bots, since they can
mimic browser requests that are difficult to distinguish from normal
user traffic.
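A case-insensitive variant of the User-Agent check quoted below, covering
a few common scraping tools, might look like this (the agent list is only
illustrative, and 403 is as reasonable a status as 400 here; nginx's `~*`
operator does a case-insensitive regex match):

                if ($http_user_agent ~* (wget|curl|libwww-perl)) {
                        return 403;
                }

Keep in mind this only filters clients that report those strings; anything
can send a browser-like User-Agent header instead.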

On 12/18/07, Fabio Coatti <cova at ferrara.linux.it> wrote:
> On Tuesday 18 December 2007, Alexis Torres Garnica wrote:
> > Hi guys, I am new to the list. Is there a way to stop or block bot
> > access and wget on an nginx web server? Thanks
> >
> > att: alex
>
> If by "block bots" you mean "block requests based on User-Agent", you can
> do this by setting up something like this:
>
>                 if ($http_user_agent ~ libwww-perl ) {
>                         return 400;
>                 }
>
>
> (just an example, of course)
>
>
> --
> Fabio "Cova" Coatti    http://members.ferrara.linux.it/cova
> Ferrara Linux Users Group           http://ferrara.linux.it
> GnuPG fp:9765 A5B6 6843 17BC A646  BE8C FA56 373A 5374 C703
> Old SysOps never die... they simply forget their password.
>
>
