I'm attempting to parse a site with a few million pages.
Right now my "links scanned" count reads:
249/211615 (+117843)
Files written : 118117
Files updated : 118022
I had to restart once because I didn't set the max links value large enough.
Do these numbers make sense? I can't find any documentation about what they
even mean, in particular the a/b (+c) part. On the forum I found a post referring
to a as 'links validated' or something similar.
Also, what does 'parsing htmlfile' refer to? It is very slow and usually
starts around 50%. Many files are written/updated while it climbs to
100%. The files I am parsing are not very big, so it shouldn't take
long. I read somewhere that a 10 MB HTML file should take only 3-4 seconds.
Thanks for any insights.