I just found this: httrack users who ignore robots.txt may
want to take note and adjust their copying strategy.
<http://www.robotcop.org/>
"Robotcop is an open source module for webservers which
helps webmasters prevent spiders from accessing parts of
their sites they have marked off limits."
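For anyone unfamiliar with the format, a minimal robots.txt
might look like this (the directory names are hypothetical
examples, not taken from the Robotcop site):

    # Ask all crawlers to stay out of two directories.
    User-agent: *
    Disallow: /private/
    Disallow: /cgi-bin/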
<http://www.searchtools.com/>
Robotcop enforces robots.txt
"The Robots.txt file is a cooperative way to request that
crawlers and spiders avoid certain parts of web sites. This
free server module watches for spiders which read pages
disallowed in robots.txt, and blocks all further requests
from that IP address. It is particularly useful for
blocking email address harvesters, while still allowing
legitimate search engine spiders. Be sure to double-check
your robots.txt file (use one or more of the robots.txt
checkers) before implementing it, and to watch your server
logs carefully. The August 2002 version (0.6) works with
Apache 1.3 on FreeBSD and Linux."
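For those who script their own mirroring instead of using
httrack directly, here is a minimal sketch of the cooperative
behavior Robotcop expects: check robots.txt before fetching.
It uses only the Python standard library; the example.com URLs
and the "httrack" user-agent string are placeholders.

    import urllib.robotparser

    # Fetch and parse the site's robots.txt once up front.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://www.example.com/robots.txt")
    rp.read()

    # Before each request, ask whether our user-agent may
    # fetch the page; skip it if the site disallows it.
    url = "http://www.example.com/private/page.html"
    if rp.can_fetch("httrack", url):
        print("allowed by robots.txt, safe to copy:", url)
    else:
        print("disallowed by robots.txt, skipping:", url)

A crawler that skips disallowed pages this way never triggers
Robotcop's IP blocking in the first place.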