HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Download with Disallow in Robots.txt file
Author: Xavier Roche
Date: 07/17/2004 15:53
 
> I have trouble when I try to download the site
> <http://www.w3schools.com/>
> HTTrack reports the following error:

Yes: by default httrack respects robots.txt rules.

> How can I configure HTTrack in order to download this site?

Change the browser identity in the options (Set Options / Browser ID / Browser "identity") and disable robots.txt (Set Options / Spider / Spider), BUT also set up reasonable download settings (bandwidth AND connections):

Set Options / Flow Control / Number of connections: 2
Set Options / Limits / Max transfer rate: 10000 (bytes/second)
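
For reference, roughly the same setup can be done with the command-line version. This is only a sketch of the equivalent flags (-s for robots.txt handling, -F for the browser identity, -c for the number of connections, -A for the transfer rate); the output directory and the User-Agent string below are placeholders, so double-check everything against httrack --help before using it:

  # -s0      never follow robots.txt rules
  # -F       browser identity ("User-Agent") string
  # -c2      limit to 2 simultaneous connections
  # -A10000  maximum transfer rate, in bytes per second
  httrack "http://www.w3schools.com/" -O "./w3schools" -s0 -c2 -A10000 \
          -F "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"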

I repeat: don't override default robots.txt and/or user-agent settings without changing bandwidth and limits settings, or you will risk bandwidth abuse and server slowdown.
 