HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: How to get rid of anti-robots protection
Author: William Roeder
Date: 11/19/2008 17:40
 
> As far as I know httrack searches for index.html (or
> any other index file) on the entire website, and
> reaches all other files mentioned in there, copying
> it all. Do I understand it correct?
Yes
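
To illustrate the idea of starting at an index page and reaching everything it links to, here is a minimal sketch in Python. It is not httrack's actual code: the `SITE` dictionary, the `example.com` URLs, and the `LinkExtractor` and `crawl` names are all made up for the example, and the "site" lives in memory so nothing is downloaded.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory site: URL -> HTML body (assumption for illustration).
SITE = {
    "http://example.com/index.html": '<a href="a.html">A</a> <a href="b.html">B</a>',
    "http://example.com/a.html": '<a href="b.html">B</a> <a href="index.html">home</a>',
    "http://example.com/b.html": '<a href="c.html">C</a>',
    "http://example.com/c.html": "no links here",
    "http://example.com/orphan.html": "nothing links to this page",
}


def crawl(start):
    """Breadth-first walk over links, returning pages in the order found."""
    seen = [start]
    queue = deque([start])
    while queue:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(SITE.get(url, ""))
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target in SITE and target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


visited = crawl("http://example.com/index.html")
```

Note that `orphan.html` never ends up in `visited`: a page that nothing links to cannot be discovered this way.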
 
> I'm trying to mirror website
> <http://www.blackviper.com>, 
Your log shows you're mirroring blackviper.com, not www.blackviper.com.

> Looks like this website somehow "knows" that it's not a
> human accessing its pages. Somebody got a clue how
> to get rid of this? 
Since httrack isn't a human, it follows robots.txt so that it doesn't abuse
the site: <http://httrack.com/html/faq.html#Q0>
You can override this under Options -> Spider.
Sites can also tell that it's httrack from the browser ID it sends, but most don't care.
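
To show how robots.txt and the browser ID interact, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the `allowed` helper are assumptions for illustration; this is not the file served by any real site, and it is not how httrack itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (assumption): blocks one crawler by name and
# keeps every other client out of /private/ only.
ROBOTS_TXT = """\
User-agent: HTTrack
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent, url):
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A client announcing itself as HTTrack is refused everywhere.
httrack_ok = allowed("HTTrack", "http://example.com/index.html")

# A client announcing a different ID only loses /private/.
other_ok = allowed("Mozilla", "http://example.com/index.html")
other_private_ok = allowed("Mozilla", "http://example.com/private/x.html")
```

The rules are only advisory: a polite crawler checks them before fetching, which is why turning that check off in the spider options changes what gets mirrored.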
 