HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Questions to mirror site using HTTrack
Author: Xavier Roche
Date: 06/04/2004 12:29
 
> I run HTTrack on Linux 2.4.24; the web server is Apache 
> 2.0.40. Client machine is Windows XP home edition with IE 
> 6.0.
> Could you please give me some suggestions to use your 
> HTTrack to setup the test bed?
First, ensure that you are using a recent HTTrack release 
(such as the latest, 3.32-3), as previous releases had 
several parsing problems.

Second, remember that some sites might not be easily 
downloadable, especially those that make extensive use of 
JavaScript or Flash - popular sites like Yahoo may use 
such tricks. Also, some sites may filter on the user-agent, 
or use robots.txt rules. (Ensure that you don't use overly 
aggressive download settings when making test projects.)
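
If you suspect that robots.txt rules are what blocks a 
download, you can check them offline before blaming the 
mirroring tool. The sketch below uses Python's standard 
urllib.robotparser module; the robots.txt content, the 
user-agent names and the example.com URLs are all made up 
for illustration and are not taken from any real site or 
from HTTrack itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleMirrorBot" is a placeholder user-agent name (an assumption).
    for agent, url in [
        ("ExampleMirrorBot", "http://foo.example.com/bar/index.html"),
        ("ExampleMirrorBot", "http://foo.example.com/private/page.html"),
        ("BadBot", "http://foo.example.com/bar/index.html"),
    ]:
        print(agent, url, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

To test a real site, paste the content of its /robots.txt 
file in place of the sample text and use the user-agent 
string your mirroring project is configured to send.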

Finally, ensure that the filtering rules are okay for the 
sites you are downloading. For example, if you do:
httrack foo.yahoo.com/bar/
and if associated images are located in:
images.yahoo.com/

.. then the associated images will NOT be downloaded by 
default, because the hosts are different (foo.yahoo.com and 
images.yahoo.com).

It might then be necessary to use something like:

httrack foo.yahoo.com/bar/ '+*.gif' '+*.jpg' '+*.png' '+*.css' '+*.js'

Similarly, if you also need to download bar.yahoo.com, use:

httrack foo.yahoo.com/bar/ '+bar.yahoo.com/*' '+*.gif' '+*.jpg' '+*.png' '+*.css' '+*.js'

Depending on the site(s) and associated files you want to 
include, you'll have to define proper scan rules.
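
To reason about a set of scan rules before running a 
mirror, a toy model can help. The Python sketch below is 
NOT HTTrack's actual matcher: it assumes that links on the 
starting host are accepted by default, that '+' and '-' 
rules are checked in order with the last matching rule 
winning, and that '*' behaves like a shell wildcard. The 
function name and the file names (logo.gif, index.html) 
are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def is_fetched(link: str, start_host: str, rules: Sequence[str]) -> bool:
    """Simplified model: decide whether a link would be downloaded.

    link       -- "host/path" without scheme, e.g. "images.yahoo.com/logo.gif"
    start_host -- host of the starting URL, e.g. "foo.yahoo.com"
    rules      -- scan rules such as "+*.gif" or "-*.zip"
    """
    host = link.split("/", 1)[0]
    # Assumed default: only links on the starting host are accepted.
    decision = host == start_host
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        # Assumed precedence: a later matching rule overrides earlier ones.
        if fnmatchcase(link, pattern):
            decision = sign == "+"
    return decision


if __name__ == "__main__":
    file_rules = ["+*.gif", "+*.jpg", "+*.png", "+*.css", "+*.js"]
    host_rules = ["+bar.yahoo.com/*"] + file_rules
    for link, rules in [
        ("images.yahoo.com/logo.gif", []),
        ("images.yahoo.com/logo.gif", file_rules),
        ("bar.yahoo.com/index.html", file_rules),
        ("bar.yahoo.com/index.html", host_rules),
    ]:
        print(link, rules, is_fetched(link, "foo.yahoo.com", rules))
```

Under these assumptions the image on images.yahoo.com is 
skipped without rules and accepted once '+*.gif' is given, 
while pages on bar.yahoo.com need the extra 
'+bar.yahoo.com/*' rule. Check the real behaviour against 
the log of an actual test project.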

With this advice, you should be able to handle most 
sites - but some of them might not be downloadable anyway.

 