HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Spider identification in robot.txt
Author: William Roeder
Date: 05/21/2009 15:30
 
> As someone who has just had their entire website
> stolen by somebody using HTTrack, using in excess of
> 50MB of my bandwidth, I was a little disappointed to
> see no mention on the HTTrack website on how I can
> ban the HTTrack spider from my site using
> robots.txt. 

You can't. HTTrack specifically has an option to override robots.txt, so a rule there only stops users who leave that option alone.
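
To make the point concrete: robots.txt is advisory, and a rule only has an effect when the client chooses to evaluate it. Below is a minimal sketch of what a compliant crawler does with such a rule, using Python's standard urllib.robotparser. The user-agent token "HTTrack", the example.com URL, and the helper name is_allowed are assumptions for illustration, not something taken from the HTTrack documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the "HTTrack" token is an assumption.
ROBOTS_TXT = """\
User-agent: HTTrack
Disallow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a rule-respecting client may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A client that identifies as HTTrack and honors the file is refused.
    print(is_allowed("HTTrack", "http://example.com/index.html"))
    # A client with a different token is not covered by the rule.
    print(is_allowed("SomeOtherBot", "http://example.com/index.html"))
```

A client that skips this check, or sends a different user-agent string, is unaffected, which is why robots.txt alone cannot ban anything.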

> If you create tools like these you have a
> responsibility to allow webmasters to control access
> to their site. I do appreciate that you are not
> responsible for the errant behaviour of your users.
> You are responsible for implementing a spider that
> follows accepted spider etiquette and to document
> said behaviour.

This was discussed in the forum just FIVE days ago:
<http://forum.httrack.com/readmsg/21031/21021/index.html>
<http://forum.httrack.com/readmsg/21074/21021/index.html>
<http://www.httrack.com/html/abuse.html#WEBMASTERS>
 