We noticed today that we'd downloaded a site despite it
having a robots.txt with
User-agent: *
Disallow: /
and us running HTTrack with -s2. In the log, it says
'robots.txt rules are too restrictive, ignoring /'. Now I
could understand if it did this with -s1, but -s2 claims to
always follow robots.txt. The almost-RFC that defines
robots.txt (http://www.robotstxt.org/wc/norobots-rfc.html)
explicitly gives Disallow: / as an example, and people do
use it, so why is -s2 ignoring it?
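For what it's worth, here's a quick sketch showing that Python's stock urllib.robotparser treats those exact two lines as a blanket disallow for every path, which is what I'd expect any -s2 run to honour (example.com is just a placeholder host):

```python
import urllib.robotparser

# Feed the parser the same two rules the site served. Per the
# robots.txt draft RFC, "Disallow: /" is a prefix match on the
# path, so it matches every URL on the site.
rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

print(rp.can_fetch("*", "http://example.com/"))           # → False
print(rp.can_fetch("*", "http://example.com/some/page"))  # → False
```

So a conforming parser disallows everything here; there's nothing "too restrictive" about it in the sense of being malformed.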
-Lars