HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Download with Disallow in Robots.txt file
Author: Xavier Roche
Date: 07/17/2004 15:53
 
> I have troubles when I try to download the site
> <http://www.w3schools.com/>
> HTTrack reports the following error:

Yes: by default httrack respects robots.txt rules.
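To illustrate what "respecting robots.txt" means for a crawler, here is a minimal sketch using Python's standard urllib.robotparser module. It is not HTTrack's own code. The robots.txt content and the example.com URLs are hypothetical, chosen only to show a Disallow rule blocking part of a site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only; the real
# rules served by any given site will differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if the parsed robots.txt rules permit fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/page.html"):
        print(url, "->", "allowed" if allowed(url) else "disallowed")
```

A crawler that honours these rules simply skips every URL for which this check fails, which is why pages under a Disallow path never reach the mirror.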

> How can I configure HTTrack in order to download this site?
Change the browser identity in the options (Set Options / 
Browser ID / Browser "identity") and disable robots.txt 
(Set Options / Spider / Spider), BUT also set up reasonable 
download settings (bandwidth AND connections):

Set Options / Flow Control / Number of connections: 2
Set Options / Limits / Max transfer rate: 10000 (bytes/s)
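For command-line users, the same settings can be sketched as an httrack invocation. This assumes the option letters shown by `httrack --help` (-s0 = never follow robots.txt, -cN = number of connections, -AN = max transfer rate in bytes/s, -F = user-agent string, -O = output path); check them against your version. The output directory and user-agent string below are hypothetical placeholders, and the helper name httrack_argv is made up for illustration. The sketch only builds and prints the command, it does not run it.

```python
import shlex
from typing import List


def httrack_argv(url: str, output_dir: str, user_agent: str,
                 connections: int = 2, max_rate: int = 10000) -> List[str]:
    """Hypothetical helper: build an httrack command line matching the
    GUI settings above (robots.txt ignored, custom identity, throttled)."""
    return [
        "httrack", url,
        "-O", output_dir,        # where the mirror is written
        "-s0",                   # never follow robots.txt rules
        "-F", user_agent,        # browser identity sent in HTTP headers
        f"-c{connections}",      # number of simultaneous connections
        f"-A{max_rate}",         # max transfer rate, in bytes per second
    ]


if __name__ == "__main__":
    argv = httrack_argv("http://www.w3schools.com/", "./mirror",
                        "Mozilla/4.0 (placeholder identity)")
    # Print a shell-safe version of the command for copy/paste
    print(shlex.join(argv))
```

Keeping -c and -A in the same helper as -s0 makes it hard to drop the throttling while disabling robots.txt, which is the point of the warning that follows.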

I repeat: don't override the default robots.txt and/or 
user-agent settings without also changing the bandwidth 
and connection limits, or you risk abusing bandwidth and 
slowing down the server.
 
