HTTrack Website Copier
Free software offline browser - FORUM
Subject: How to get rid of anti-robots protection
Author: Elias
Date: 11/19/2008 09:02
 
As far as I know, httrack starts from index.html (or any other index file) on
the website and follows the links it finds there to reach all the other files,
copying them all. Do I understand that correctly?
I'm trying to mirror the website <http://www.blackviper.com>, which is accessible
in every web browser I use (Konqueror, Lynx, Firefox, and others). When I try to
mirror it with httrack, however, I get the error "403: Forbidden". I then
checked the hts-log.txt file in the project directory (excerpt below) to look for
the reason. It looks like the website somehow "knows" that it is not a human
accessing its pages. Does anybody have a clue how to get around this?

Can't httrack interact with anti-robot-protected websites without looking like
a robot?
10:37:22        Warning:        Redirected link is identical because of 'URL Hack' option: blackviper.com/robots.txt and www.blackviper.com/robots.txt
10:37:22        Warning:        File has moved from blackviper.com/robots.txt to <http://www.blackviper.com/robots.txt>
10:37:23        Warning:        Redirected link is identical because of 'URL Hack' option: blackviper.com/ and www.blackviper.com/
10:37:23        Warning:        File has moved from blackviper.com/ to <http://www.blackviper.com/>
10:37:23        Info:   No data seems to have been transfered during this session! : restoring previous one!
 