As far as I know, httrack looks for index.html (or another index file) on the
website and then follows every link it finds there, copying everything it
reaches. Do I understand that correctly?
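
To make that understanding concrete, here is a rough sketch of a link-following crawl in Python. It is only a model of the general idea (start at one page, collect its links, visit each new one once), not HTTrack's actual implementation. The `LinkParser` and `crawl` names and the in-memory `SITE` dictionary are made up for illustration, and unlike a real mirroring tool the sketch applies no host or depth filters.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href/src targets found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first walk from start_url.

    fetch(url) is a caller-supplied function returning the page body as a
    string, or None if the URL cannot be retrieved.
    """
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: nothing to copy, nothing to follow
        order.append(url)
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order


# Hypothetical in-memory "site" so the sketch runs without network access.
SITE = {
    "http://example.test/index.html": '<a href="a.html">A</a><img src="logo.png">',
    "http://example.test/a.html": '<a href="index.html">home</a><a href="b.html">B</a>',
    "http://example.test/logo.png": "",
    "http://example.test/b.html": "<p>leaf</p>",
}

visited = crawl("http://example.test/index.html", SITE.get)
```

Each page is visited once, in the order it was discovered, so a link back to the start page does not cause a loop.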
I'm trying to mirror the website <http://www.blackviper.com>, which is
accessible in every web browser I have tried (Konqueror, Lynx, Firefox, and
others). When I mirror it with httrack, however, I get the error
"403: Forbidden". I checked the hts-log.txt file in the project directory and
found what looks like the reason: the website somehow "knows" that it is not a
human accessing its pages. Does anybody have a clue how to get around this?
Can't httrack behave less like a robot when dealing with websites that have
anti-robot protection?
10:37:22 Warning: Redirected link is identical because of 'URL Hack' option: blackviper.com/robots.txt and www.blackviper.com/robots.txt
10:37:22 Warning: File has moved from blackviper.com/robots.txt to <http://www.blackviper.com/robots.txt>
10:37:23 Warning: Redirected link is identical because of 'URL Hack' option: blackviper.com/ and www.blackviper.com/
10:37:23 Warning: File has moved from blackviper.com/ to <http://www.blackviper.com/>
10:37:23 Info: No data seems to have been transfered during this session! : restoring previous one!
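
One common reason a site answers a browser but returns 403 to a mirroring tool is that the server filters on the User-Agent header. Whether this particular site does so is an assumption, not something the log above confirms. A minimal Python sketch for checking it is to request the same URL with two different User-Agent strings and compare the status codes. Both strings and the helper names `build_request` and `status_for` are made up for illustration.

```python
import urllib.error
import urllib.request

# Both User-Agent strings are illustrative assumptions, not values taken
# from HTTrack or from the site in question.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
CRAWLER_UA = "example-mirror-tool/1.0"


def build_request(url, user_agent):
    """Create a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends for this User-Agent."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still available here.
        return err.code


# Example use (performs real network requests, so it is left commented out):
# url = "http://www.blackviper.com/"
# print("browser-like:", status_for(url, BROWSER_UA))
# print("crawler-like:", status_for(url, CRAWLER_UA))
```

If the two status codes differ, the User-Agent string httrack sends is the likely trigger, and its `-F` (`--user-agent`) option is the setting to look at. If both requests get the same answer, the block is based on something else, such as request rate or robots.txt handling.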