HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: How to get rid of anti-robots protection
Author: William Roeder
Date: 11/21/2008 16:52
 
> > Your log shows you're mirroring blackviper.com, not
> > www.blackviper.com.
> Doesn't matter. I also tried with www, but with no
> luck.
It matters if the site uses absolute links, because httrack won't mirror
external sites by default.
 
> > You override this with options -> spider.
> Could you please tell me exactly which option
> overrides robots.txt protection?
I assumed you were using WinHTTrack.
<http://httrack.com/html/fcguide.html> contains the entire httrack manual.

-sN follow robots.txt and meta robots tags (0=never,1=sometimes,* 2=always) (--robots[=N])

So to never follow robots.txt, use:
-s0
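
If you drive httrack from a script, here is a rough sketch of how the command line could be put together. The helper name httrack_command and the ./mirror output directory are made up for illustration; only -O, -sN and the URL argument come from the httrack manual.

```python
import shlex


def httrack_command(url, out_dir, robots=0):
    """Build the argv list for an httrack run (hypothetical helper).

    robots maps to -sN: 0 = never follow robots.txt, 1 = sometimes, 2 = always.
    """
    if robots not in (0, 1, 2):
        raise ValueError("robots must be 0, 1 or 2")
    # -O sets the output directory for the mirror; -sN sets the robots policy.
    return ["httrack", url, "-O", out_dir, f"-s{robots}"]


cmd = httrack_command("http://www.blackviper.com/", "./mirror", robots=0)
# Print the command so it can be checked before it is run.
print(shlex.join(cmd))
```

To actually start the mirror, pass the list to subprocess.run(cmd, check=True). That needs httrack installed and on your PATH.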

> still getting "403" error which says access to the
> site is forbidden.
>
> If I can access the web site using my web browser,
> maybe I should try to change httrack's browser ID
> to the one Konqueror uses?
I tried the site; it rejects any browser ID containing "httrack". It worked fine with:
-F "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" 
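
If you want to check for yourself whether a site filters on the browser ID, a small sketch like the one below might help. It only uses Python's standard urllib. The function names are made up for illustration, and HTTRACK_LIKE_UA is just an example identity containing "HTTrack", not necessarily the exact string your httrack build sends.

```python
import urllib.error
import urllib.request

# Example identities to compare (the first one is an assumption for illustration).
HTTRACK_LIKE_UA = "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"


def build_request(url, user_agent):
    """Return a Request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url, user_agent, timeout=10):
    """Fetch url once and return the HTTP status code, e.g. 200 or 403."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code is what we are after.
        return err.code
```

Calling status_for(url, HTTRACK_LIKE_UA) and status_for(url, BROWSER_UA) on the same page and getting 403 for the first but 200 for the second would suggest the server is filtering on the browser ID, in which case -F is the option to change.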
 