HTTrack Website Copier
Free software offline browser - FORUM
Subject: Parse speed / download efficiency
Author: Jim
Date: 06/17/2003 16:58
Hi Xavier,

I'm mirroring quite a large site (lots of ASP pages with parameters pulling
content from a database), which results in a good 30 to 40k files to grab in
total. The problem I have is performance.

The parsing and validation of links seems to take too long, and hence
bandwidth is wasted on a blocking wait for the parse. Only one parse seems to
occur at a time despite having 3 or more threads, and other downloads do not
happen while HTTrack is waiting for the parse to finish.

Why run the parse and download phases together? They should be in separate
allocation pools, so you can size parsing (CPU-bound) to the number of CPUs
available, and downloads to the available bandwidth.
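Roughly what I mean, as a toy sketch (Python here purely for illustration; this is not HTTrack's actual code, and the site graph, pool sizes, and function names are all made up): two separate worker pools joined by hand-offs, so a slow parse never blocks the downloaders.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

SITE = {  # toy site graph standing in for real HTTP fetches
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": [],
    "/c": [],
}

downloaders = ThreadPoolExecutor(max_workers=4)  # I/O-bound: size to bandwidth
parsers = ThreadPoolExecutor(max_workers=2)      # CPU-bound: size to CPU count

lock = threading.Lock()
seen, fetched = set(), []
pending = 0                       # URLs scheduled but not yet fully parsed
done = threading.Event()

def schedule(url):
    """Queue a URL for download exactly once."""
    global pending
    with lock:
        if url in seen:
            return
        seen.add(url)
        pending += 1
    downloaders.submit(download, url)

def download(url):
    with lock:
        fetched.append(url)       # stand-in for the actual HTTP GET
    parsers.submit(parse, url)    # hand off; this downloader is free again

def parse(url):
    global pending
    for link in SITE[url]:        # stand-in for HTML link extraction
        schedule(link)
    with lock:
        pending -= 1
        if pending == 0:
            done.set()

schedule("/")
done.wait()
downloaders.shutdown()
parsers.shutdown()
```

The point is just that the two pools are sized independently: adding parser threads uses spare CPU, adding downloader threads uses spare bandwidth, and neither starves the other.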

Why validate links while parsing new downloads anyhow? It seems a waste to
validate that a file is available and then put it 20,000 files down the queue,
which may lead to problems with cookies expiring etc. Performance would also
improve because you wouldn't issue 20,000 HEAD requests for no reason.
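What I'd expect instead is validate-at-download: one request per file rather than a HEAD up front plus a GET later, and no chance of the upfront check going stale by the time the queue entry comes up. A minimal sketch (hypothetical `mirror`/`fetch` names, nothing to do with HTTrack's internals):

```python
def mirror(urls, fetch):
    """Fetch each URL once; fetch(url) returns a body or raises on 404/timeout.

    The GET itself is the validation: failures are recorded at download
    time instead of being probed with a separate HEAD at enqueue time.
    """
    results, failures = {}, []
    for url in urls:
        try:
            results[url] = fetch(url)
        except Exception as exc:
            failures.append((url, exc))  # record and move on
    return results, failures
```

A dead link costs you one failed GET either way; the HEAD only saves anything if the file has vanished between enqueue and download, which seems like the rare case.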

I seem to recall that some servers support server-side compression of HTML on
the fly (gzip via HTTP/1.1 Accept-Encoding), which again would speed things up
significantly.
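Just to illustrate what that buys (a standalone toy, not a real request): HTML, especially repetitive database-generated HTML, compresses extremely well, so a client that sends `Accept-Encoding: gzip` to a server that compresses on the fly can cut the bytes on the wire dramatically.

```python
import gzip

# Repetitive, database-driven markup is a best case for gzip.
html = ("<html><body>"
        + "<p>repetitive database-driven content</p>" * 500
        + "</body></html>").encode("ascii")

compressed = gzip.compress(html)
print(f"{len(html)} -> {len(compressed)} bytes "
      f"({len(compressed) / len(html):.1%} of original)")
```

Real-world pages won't compress quite that well, but a large cut in transfer size is typical for text.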

Of course I may be totally wrong on some or all of the above :)


