> I run HTTrack on Linux 2.4.24; the web server is Apache
> 2.0.40. Client machine is Windows XP Home Edition with IE
> 6.0.
> Could you please give me some suggestions to use your
> HTTrack to setup the test bed?
First, ensure that you are using a recent HTTrack release
(such as the latest, 3.32-3), as previous releases had
several parsing problems.
Second, remember that some sites might not be easily
downloadable, especially those that make extensive use of
JavaScript or Flash; popular sites like Yahoo may use such
tricks. Some sites may also filter on the user-agent, or
use robots.txt rules. (Make sure you don't use overly
aggressive download settings when making test projects.)
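As a rough sketch of what conservative settings could look like, the script below only prints an httrack command line rather than running it. The helper name polite_httrack_cmd and all the values are assumptions chosen for illustration; the options used are -cN (number of connections), -AN (maximum transfer rate in bytes per second), -sN (robots.txt handling) and -F (user-agent string), which should be checked against `httrack --help` for your release.

```shell
#!/bin/sh
# Hypothetical helper: prints a conservative httrack command (dry run only).
# The default values below are illustrative assumptions, not recommendations.
polite_httrack_cmd() {
    url=$1
    sockets=${2:-2}        # -cN: simultaneous connections
    max_rate=${3:-25000}   # -AN: max transfer rate, bytes per second
    robots=${4:-2}         # -sN: 2 = always follow robots.txt
    # Example browser-like user-agent string (assumption)
    agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

    printf 'httrack %s -c%s -A%s -s%s -F "%s"\n' \
        "$url" "$sockets" "$max_rate" "$robots" "$agent"
}

polite_httrack_cmd foo.yahoo.com/bar/
```

Once the printed command looks right, it can be pasted into a shell and adjusted from there.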
Finally, ensure that the filtering rules are suitable for
the sites you are downloading. For example, if you run:
httrack foo.yahoo.com/bar/
and if associated images are located in:
images.yahoo.com/
... then the associated images will NOT be downloaded by
default, because the hosts are different (foo.yahoo.com and
images.yahoo.com).
It then might be necessary to use something like:
httrack foo.yahoo.com/bar/ '+*.gif' '+*.jpg' '+*.png' '+*.css' '+*.js'
Similarly, if you also need to download bar.yahoo.com, use:
httrack foo.yahoo.com/bar/ '+bar.yahoo.com/*' '+*.gif' '+*.jpg' '+*.png' '+*.css' '+*.js'
Depending on the site(s) and associated files you want to
include, you'll have to define proper scan rules.
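To make the pattern above easier to reuse, here is a minimal sketch that assembles the same kind of command line from a start URL, a list of extra hosts to include entirely, and a list of file extensions to accept from any host. build_httrack_cmd is a hypothetical helper, not part of HTTrack, and it only prints the command so that it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: builds an httrack command with '+' scan rules.
# It prints the command instead of executing it.
build_httrack_cmd() {
    start_url=$1      # e.g. foo.yahoo.com/bar/
    extra_hosts=$2    # space-separated hosts to include entirely
    extensions=$3     # space-separated extensions allowed from any host

    cmd="httrack $start_url"
    # One '+host/*' rule per additional host
    for host in $extra_hosts; do
        cmd="$cmd '+$host/*'"
    done
    # One '+*.ext' rule per file extension
    for ext in $extensions; do
        cmd="$cmd '+*.$ext'"
    done
    printf '%s\n' "$cmd"
}

build_httrack_cmd "foo.yahoo.com/bar/" "bar.yahoo.com" "gif jpg png css js"
```

With the arguments shown, the printed line matches the second command above; passing an empty string for the hosts gives the first one.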
With this advice, you should be able to handle most
sites - but some of them might not be downloadable anyway.