HTTrack Website Copier
Free software offline browser - FORUM
Subject: HTTRACK not useful for large mirroring projects?
Author: ennogrue
Date: 07/30/2006 15:16
 
Hello!
 
I worked with HTTRACK some years ago and have now installed the new version
3.40-2 of WinHTTrack on two computers:
1. Pentium 3, 1 GHz, 1 GByte RAM, SDSL 2000/2000 kbit/s via FLI4L router,
Windows XP SP2
2. Athlon 2 GHz, 500 MByte RAM, SDSL 2000/2000 kbit/s via FLI4L router,
Windows XP SP2
 
I tried to mirror a larger site (with a simple HTML tree structure plus JPGs
and ZIPs), but, as with the older version some years ago, I ran into
behaviour that is not properly covered in your FAQ:
 
I set up the download with 3 connections, a bandwidth limit of 100 kbit/s, and
about 20 rules, only to restrict the mirror to specific paths and smaller file
sizes. In the beginning the performance of HTTRACK is excellent: the bandwidth
is almost fully used, all 3 connections run in parallel, and the overall
parsing and downloading is very fast.
 
But once we reach a "border" of 60000..80000 detected links after several
hours of runtime, HTTRACK slows down dramatically. It never stops or hangs,
and file downloads are always fast when they occur, but most of the time is
spent on very slow parsing and scanning. Analysing a small directory with one
index file containing a few links takes up to several minutes! Then HTTRACK
downloads just one or two files and "thinks" again. Meanwhile the system load
caused by HTTRACK on the computer (Windows XP) is about 90%. It does not seem
to be a network bandwidth problem: in fact, during scanning and parsing there
is most of the time no HTTRACK-related network traffic at all! HTTRACK seems
to be "thinking internally". (With Internet Explorer I can reach the site
without problems at the same time.)
 
On request I can send you 4 screenshots:
slowparsing.jpg:   the "slow behaviour" on the Athlon computer. Just 3515 /
78716 done, transfer rate: 0.
slowparsing1.jpg:   Athlon system load: 90%
slowparsing2.jpg:   Athlon system resources (1/2 of RAM, 1.1 GB swap file)
slowparsing1a.jpg:   the same behaviour on the Pentium 3 during an update
run (after interrupting the original download)
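
To put the counters from slowparsing.jpg in perspective, here is a minimal
Python sketch of the arithmetic. The function name progress_percent is only
illustrative and is not part of HTTRACK; the two numbers are the "3515 /
78716" values quoted above.

```python
def progress_percent(done: int, detected: int) -> float:
    """Share of detected links already processed, in percent."""
    if detected <= 0:
        raise ValueError("detected must be positive")
    return 100.0 * done / detected


# Counters taken from the screenshot description above.
done_links = 3515
detected_links = 78716

remaining_links = detected_links - done_links
percent_done = progress_percent(done_links, detected_links)

# prints "4.47% done, 75201 links remaining"
print(f"{percent_done:.2f}% done, {remaining_links} links remaining")
```

So at the point where the slowdown sets in, fewer than 5% of the detected
links have been handled.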
 
Can anyone tell me what is happening here? As I understand it, HTTRACK has
built up a large internal database of all the links, but at the moment it
slows down it has only processed a small part of all detected links, so most
of the work is still to be done. HTTRACK is then so slow that you can guess it
will "never" finish the job. Is there a memory leak or something like that?
Do you know any workaround for this problem?
Best regards,
ennogrue
 