HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: links scanned - parsing huge site
Author: William Roeder
Date: 04/03/2009 16:07
 
> I'm attempting to parse a site w/a few million
> pages

I hope you have a few hundred GB of free disk space just for the HTML.

> Do these numbers make sense? I can't find any
> documentation about what they even mean ..the a/b
> (+c) bit... on the forum i found something
> referencing the a as 'links validated' or
> something?
FAQ: What is the meaning of the Links scanned: 12/34 (+5) line in
WinHTTrack/WebHTTrack? - <http://www.httrack.com/html/faq.html#QM10b>
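If you want to track those numbers over time, here is a minimal sketch in Python that pulls the three values out of a status line shaped like the one in the FAQ title. The function name and the neutral field names a, b, c are my own; what each number means is whatever the FAQ entry above says, and treating the "(+c)" part as optional is an assumption.

```python
import re

# Matches a status line such as "Links scanned: 12/34 (+5)".
# Assumption: the "(+c)" part may be absent, in which case c is reported as 0.
LINKS_SCANNED = re.compile(r"Links scanned:\s*(\d+)/(\d+)(?:\s*\(\+(\d+)\))?")


def parse_links_scanned(line):
    """Return (a, b, c) as integers, or None if the line does not match."""
    m = LINKS_SCANNED.search(line)
    if m is None:
        return None
    a, b, c = m.groups()
    return (int(a), int(b), int(c) if c is not None else 0)


print(parse_links_scanned("Links scanned: 12/34 (+5)"))
```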

> so it shouldn't take too long .. i read somewhere 10
> meg html file should only take 3-4 sec. 

While it is parsing the file, it is also updating other HTML files. You're seeing
the GET/HEAD round-trip time to the server. An 80,000-file mirror takes me 2 hours
to update even if nothing has changed.
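To put that in perspective, here is a rough back-of-the-envelope sketch in Python. It takes my 80,000 items in 2 hours as the per-item round-trip cost and scales it up linearly. The 2,000,000 figure is only a stand-in for "a few million", and linear scaling is an assumption; your server, connection count, and latency will differ.

```python
def seconds_per_item(items, hours):
    """Average wall-clock seconds spent per item during an update."""
    return hours * 3600 / items


def estimate_hours(items, per_item_seconds):
    """Projected update time in hours, assuming the per-item cost stays constant."""
    return items * per_item_seconds / 3600


# Figures from my own mirror: 80,000 items, 2 hours, nothing changed.
rate = seconds_per_item(80_000, 2)

# Hypothetical site size standing in for "a few million pages".
projected = estimate_hours(2_000_000, rate)

print(f"{rate:.2f} s per item, about {projected:.0f} hours for 2,000,000 items")
```

That works out to roughly a tenth of a second per item, which is small for one request but adds up to a couple of days of checking on a site that size.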
 