HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Scraping forum
Author: Xavier Roche
Date: 07/21/2003 20:11
First, ask the webmaster whether you may copy the forum. Then 
use reasonable settings to mirror it. This will take more 
time, but it is better to let a mirror run for several hours 
with reasonable connection settings than to clobber the 
server's bandwidth and be reported to your ISP as an 
abuser:

Set options / Limits
Max transfer rate: 10000 B/s

Set options / Flow Control
Number of connections: 1

Set options / Spider
Spider: No robots.txt rules

Set options / Scan rules*
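The same settings can also be given to the command-line `httrack` program. Below is a minimal sketch, assuming a hypothetical Python helper (`httrack_command` and `estimated_hours` are illustrative names, not part of HTTrack) that assembles the argument list. In HTTrack's own options, `-A` sets the maximum transfer rate in bytes per second, `-c` the number of connections, and `-s0` tells the spider never to follow robots.txt rules. The URL, the output directory and the `-*logout*` scan rule are placeholders, so check `httrack --help` for your version. At 10000 B/s, 100 MB takes at least 10,000 seconds (about 2.8 hours), which shows how a polite mirror can easily run for several hours.

```python
import shlex


# Hypothetical helper: maps the GUI settings above onto httrack options.
# -A, -c and -s0 are httrack flags; the function itself is illustrative.
def httrack_command(url, output_dir, max_rate=10000, connections=1,
                    scan_rules=()):
    args = [
        "httrack", url,
        "-O", output_dir,        # directory where the mirror is written
        f"-A{max_rate}",         # Limits / Max transfer rate (bytes/s)
        f"-c{connections}",      # Flow Control / Number of connections
        "-s0",                   # Spider / No robots.txt rules
    ]
    args.extend(scan_rules)      # Scan rules, e.g. "-*logout*" (placeholder)
    return args


def estimated_hours(total_bytes, max_rate=10000):
    """Lower bound on mirror time when throughput is capped at max_rate B/s."""
    return total_bytes / max_rate / 3600


if __name__ == "__main__":
    cmd = httrack_command("http://forum.example.com/", "forum-mirror",
                          scan_rules=["-*logout*"])
    print(shlex.join(cmd))       # shell-quoted command line
    print(f"{estimated_hours(100 * 10**6):.1f} h for 100 MB")
```

Running the script prints the shell-quoted command line. The list returned by `httrack_command` can also be passed directly to `subprocess.run` to start the mirror.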

AND MONITOR THE MIRROR to ensure that you don't fall into 
infinite loops or run into similar problems.
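Part of that monitoring can be automated. Below is a minimal sketch of two heuristics, assuming you already have the list of URLs the mirror is fetching (extracting that list from HTTrack is not shown here). Both function names and both thresholds are hypothetical. The first flags paths where a segment repeats, such as `/a/b/a/b/a/b/`, a typical sign of relative links looping. The second flags pages reached through many distinct query strings, such as the same topic list under endless sort or page combinations.

```python
from collections import Counter
from urllib.parse import urlsplit


def has_repeated_segments(url, max_repeats=2):
    """True if any path segment occurs more than max_repeats times,
    e.g. /a/b/a/b/a/b/, a common symptom of relative-link loops."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return bool(segments) and max(Counter(segments).values()) > max_repeats


def noisy_paths(urls, max_variants=50):
    """Paths fetched with more than max_variants distinct query strings."""
    variants = {}
    for url in urls:
        parts = urlsplit(url)
        variants.setdefault(parts.path, set()).add(parts.query)
    return sorted(p for p, qs in variants.items() if len(qs) > max_variants)


if __name__ == "__main__":
    # Placeholder URLs standing in for what the mirror is fetching.
    seen = ["http://forum.example.com/a/b/a/b/a/b/index.html",
            "http://forum.example.com/viewtopic.php?t=1"]
    for url in seen:
        if has_repeated_segments(url):
            print("possible loop:", url)
    print("noisy paths:", noisy_paths(seen))
```

A hit from either check is only a hint. Confirm it by looking at the pages, then add a scan rule to exclude the offending URLs.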



Created with FORUM 2.0.11