HTTrack does not appear to obey robots.txt exclusion rules.
We've tried both WinHTTrack and the command-line httrack
with the -sN option set to 2 (-s2), and in neither case does
httrack obey the robots.txt rules.
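
To show what we mean by "obey": below is a minimal Python sketch of how the URLs that should be skipped can be worked out from a robots.txt file, using the standard library's urllib.robotparser. The robots.txt content, the example.com URLs, the "HTTrack" user-agent string, and the helper name excluded_urls are all made up for illustration; they are not our real site or settings.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption, not our real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def excluded_urls(robots_txt, urls, user_agent="HTTrack"):
    """Return the URLs that robots_txt forbids for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Hypothetical URLs on an example site.
urls = [
    "http://www.example.com/index.html",
    "http://www.example.com/private/report.html",
    "http://www.example.com/cgi-bin/search",
]

for url in excluded_urls(ROBOTS_TXT, urls):
    print("should be skipped:", url)
```

Any URL this reports that still turns up in the mirror is one the crawler fetched despite the exclusion rules.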
I found one posting on the forum that mentioned a hack which
allows httrack to behave like a browser and ignore overly
restrictive robots exclusion rules.
Is there a way of getting httrack to obey robots exclusion
rules?
Versions used: 3.32.3 (Unix) and 3.32.2 (Windows)