HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: access blocked
Author: Xavier Roche
Date: 05/27/2002 22:11
 
> Does HTTrack use the referrer?
Yes

> Is it possible to configure an automatic wait period
> between requests?
Yes - you can select 1 connection per second, but also 
limit the number of simultaneous connections to 1 or 2.
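
For reference, the command-line equivalents should be roughly 
the following (I am quoting from memory, so check httrack --help 
to be sure):

  -c1    only 1 simultaneous connection
  -%c1   at most 1 connection per second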

> How can I enable HTTrack to mirror this web-site?
You may also limit the bandwidth to something like 8 KB/s; 
the bandwidth limiter in httrack is now very precise and 
allows you to limit bandwidth abuse.
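
On the command line this should be the -A option (maximum 
transfer rate in bytes per second), e.g.:

  -A8000   limit to roughly 8 KB/s

(again, check httrack --help in case I misremember the exact 
syntax)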

> The above measures should shield SL from the most 
> offensive scripts. What if you would still like to 
> mirror/download SL? Use a friendly script such as wget 
> which obeys robots.txt.

Therefore, if you leave all httrack options as they are (follow 
robots.txt) and use the bandwidth limiter (1 connection/second, 
1 simultaneous connection, plus a bandwidth limit), this should 
be okay.
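
Putting it together, a "friendly" command-line mirror would look 
something like this (http://www.example.com/ is only a placeholder 
for the real site, and -s2, obey robots.txt, is the default anyway):

  httrack http://www.example.com/ -O /tmp/mirror -c1 -%c1 -A8000 -s2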

> If you use wget, don't forget to specify a 
> wait period between the requests (at least '-w 3').

Err, 3 seconds? I'll have to implement a larger delay in 
httrack (which is currently limited to 1 second) in the future - 
but using a slower bandwidth limit should be okay (maybe 3 or 
4 KB/s).
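
For wget, a polite mirror would be along these lines (example.com 
is again just a placeholder; wget obeys robots.txt by default):

  wget -m -w 3 http://www.example.com/

(-m = mirror, -w 3 = wait 3 seconds between requests; recent wget 
versions also have a --limit-rate option if you want to cap the 
bandwidth as well)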

Also, please cut/paste this filter into the 'Scan rules' 
options of httrack (Options/Scan rules):

-*/*?edit=* -*/*?copy=* -*/*?diff=* -*/*?header=* -*/*?info=* -*/*?search=*
-*/*?blockme=* -*/*?random=*

as the current (basic) handling of robots.txt does not 
understand the URL format used by this site (/?foo..) 
(added to the todo list..)
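
(Note that the same filters can also be passed directly on the 
httrack command line, after the URL, e.g.:

  httrack http://www.example.com/ "-*/*?edit=*" "-*/*?copy=*" ...

where the URL is again only a placeholder.)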
 