> I'm attempting to parse a site with a few million
> pages
I hope you have a few hundred GB of free disk space just for the HTML.
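A rough way to sanity-check that figure: multiply the page count by an average page size. The 100 KB per page used below is only an assumed placeholder, not a measured value. Sample a few hundred pages of your own site to get a real number.

```python
# Back-of-the-envelope disk estimate for a large HTTrack mirror.
# ASSUMPTION: avg_page_kb is a hypothetical placeholder; measure a sample
# of the real site's pages to get a meaningful figure.

def mirror_size_gb(pages: int, avg_page_kb: float) -> float:
    """Approximate HTML storage in GB (using 1 GB = 1,000,000 KB)."""
    return pages * avg_page_kb / 1_000_000


if __name__ == "__main__":
    # Hypothetical example: 3 million pages at an assumed 100 KB each.
    print(f"{mirror_size_gb(3_000_000, 100):.0f} GB")
```

Images, PDFs and other media are not included, so a full mirror will need more than this.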
> Do these numbers make sense? I can't find any
> documentation about what they even mean... the a/b
> (+c) bit. On the forum I found something
> referencing the a as 'links validated' or
> something?
FAQ: "What is the meaning of the 'Links scanned: 12/34 (+5)' line in
WinHTTrack/WebHTTrack?" <http://www.httrack.com/html/faq.html#QM10b>
> So it shouldn't take too long... I read somewhere that a 10
> MB HTML file should only take 3-4 seconds.
Parsing isn't the bottleneck: while it parses the file, it is also updating
other HTML pages. What you're seeing is the round-trip time of the GET/HEAD
requests to the server. An 80,000-page mirror takes me 2 hours to update,
even when nothing has changed.
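Turning those numbers into a per-page cost gives a feel for what a few million pages would mean. This sketch assumes update time scales linearly with page count (same server, same round-trip time, same HTTrack connection settings). The 3-million-page target is a hypothetical example.

```python
# Per-page update cost from the figures above (80,000 pages in 2 hours),
# then a linear extrapolation to a larger mirror.
# ASSUMPTION: time scales linearly with page count; the per-page figure is
# an effective wall-clock average, not a single request's round-trip time.

def seconds_per_page(pages: int, hours: float) -> float:
    """Average wall-clock seconds spent per page."""
    return hours * 3600 / pages


def update_hours(pages: int, sec_per_page: float) -> float:
    """Estimated hours to update a mirror of the given size."""
    return pages * sec_per_page / 3600


if __name__ == "__main__":
    per_page = seconds_per_page(80_000, 2)  # 7200 s / 80,000 pages
    print(f"{per_page * 1000:.0f} ms per page")
    # Hypothetical target: a 3-million-page site.
    print(f"{update_hours(3_000_000, per_page):.0f} hours")
```

At roughly 90 ms per page, 3 million pages comes to about 75 hours, i.e. around three days per update pass, even if nothing has changed.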