HTTrack Website Copier
Free software offline browser - FORUM
Subject: Always follow robots.txt doesn't quite
Author: Lars Clausen
Date: 02/25/2004 10:21
 
We noticed today that we'd downloaded a site despite its
robots.txt containing

User-agent: *
Disallow: /

and despite our running HTTrack with -s2.  The log says
'robots.txt rules are too restrictive, ignoring /'.  I
could understand this behaviour with -s1, but -s2 claims to
always follow robots.txt.  The almost-RFC that defines
robots.txt (http://www.robotstxt.org/wc/norobots-rfc.html)
explicitly shows Disallow: /, and people are using it, so
why does -s2 ignore it?
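
For comparison, here is a minimal sketch of what I would expect a strict
reading of those two lines to give.  It uses Python's standard
urllib.robotparser, not HTTrack's own code; the helper name is_allowed,
the "HTTrack" user-agent string and the example.com URLs are made up for
illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from the site in question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URLs, not the site we actually mirrored.
    for url in ("http://www.example.com/",
                "http://www.example.com/some/page.html"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "HTTrack", url) else "blocked"
        print(url, verdict)
```

With these rules every URL on the host is reported as blocked, which is
what I assumed "always follow" would mean.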
-Lars
 