HTTrack Website Copier
Free software offline browser - FORUM
Subject: links scanned - parsing huge site
Author: marcus
Date: 04/03/2009 05:32
 
I'm attempting to parse a site with a few million pages.

Right now my 'links scanned' counter reads:

249/211615 (+117843)

Files written : 118117
Files updated : 118022

I had to restart once because I didn't make the max links value large enough.


Do these numbers make sense? I can't find any documentation about what they
even mean (the a/b (+c) bit). On the forum I found something referring to
the a as 'links validated' or something?
Also, what does 'parsing htmlfile' refer to? It is very slow and usually
starts around 50%. Many files are written/updated while it is going up to
100%. The files I am parsing are not very big, so it shouldn't take too
long. I read somewhere that a 10 MB HTML file should only take 3-4 sec.

Thanks for any insights
 