HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Set HTTrack to just parse files, don't update
Author: Bogdan Popescu
Date: 11/10/2013 05:51
 
> Humm, this is strange - the html files you see are
> final ones - httrack do not update them ; could it
> be some kind of javascript issue ?
Not likely. Some of the pages do have working relative links. These are the
pages that httrack had a chance to go through.

> Was the cache damaged due to the ^C interrupt maybe
> ? (ie. corrupted ZIP file ?) In this case, httrack
> won't be able to quickly re-parse the files, because
> the cache is the only way files can be rechecked
> (ie. html files on disk are never used for update -
> only data files might be used as they are left
> untouched)

That's probably the case. I'm not able to unarchive the ZIP file manually due
to corruption.
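
A quick way to confirm the corruption independently of any unarchiver is to run a CRC check over the archive. The following is only a minimal sketch: it assumes the cache archive lives at hts-cache/new.zip inside the project directory (adjust the path if yours differs), and check_zip is a hypothetical helper name, not part of httrack.

```python
import sys
import zipfile


def check_zip(path):
    """Return (ok, detail) describing whether the archive at `path` is readable."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member and returns the name of the
            # first one that fails its CRC/header check, or None if all pass.
            bad = zf.testzip()
    except zipfile.BadZipFile as exc:
        # Raised when the central directory is missing or damaged,
        # for example when the file was truncated.
        return False, f"not a readable ZIP: {exc}"
    except OSError as exc:
        return False, f"cannot open file: {exc}"
    if bad is not None:
        return False, f"first corrupt member: {bad}"
    return True, "all members passed the CRC check"


if __name__ == "__main__":
    # Assumed default location of the cache; pass another path as argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "hts-cache/new.zip"
    ok, detail = check_zip(target)
    print(f"{target}: {'OK' if ok else 'CORRUPT'} ({detail})")
```

If the check fails with "not a readable ZIP", the end of the file (where the central directory is stored) is probably missing, which would be consistent with the archive not having been closed properly.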

This is how it went:
1. httrack started with the list of 192k links and showed "Links scanned:
0/0"
2. The current job was "parsing HTML file (0%)". The percentage increased
slowly as httrack went through the large list of links
3. I left it running and it kept downloading until it reached "Links scanned:
0/192000"
4. After that, it started going through the downloaded files and parsing them.
This process was very slow (about 5 times slower than the initial
download). I expected it to be a lot faster, as it didn't have to download
anything, just parse the links in each page and make them relative...
5. I pressed CTRL+C and httrack said "Finishing pending transfers... press again
^C to quit." I left it to complete, which took a while

Any idea how the ZIP file got corrupted, given that I let httrack quit
gracefully?
Also, any idea why it takes so much time for httrack to go through the pages
and make the links relative? Is there any setting I could use to speed up parsing?
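
To make clear what I mean by "making the links relative", here is a rough sketch of the rewrite for a single link. It is only an illustration under my own assumptions, not how httrack does it internally: to_relative is a hypothetical helper, the example.com URLs are made up, and query strings and fragments are ignored for brevity.

```python
import posixpath
from urllib.parse import urlsplit


def to_relative(page_url, link_url):
    """Rewrite `link_url` relative to the directory of `page_url`.

    Links pointing to a different scheme or host are returned unchanged.
    Query strings and fragments are dropped in this simplified sketch.
    """
    page = urlsplit(page_url)
    link = urlsplit(link_url)
    if (page.scheme, page.netloc) != (link.scheme, link.netloc):
        return link_url
    # Directory that contains the page holding the link.
    page_dir = posixpath.dirname(page.path) or "/"
    return posixpath.relpath(link.path or "/", start=page_dir)


if __name__ == "__main__":
    print(to_relative("http://example.com/a/b/index.html",
                      "http://example.com/a/c/page.html"))
```

Per link this is cheap string work, which is why I would expect the parsing pass to be faster than the download itself.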
 