HTTrack Website Copier
Free software offline browser - FORUM
Subject: Set HTTrack to just parse files, don't update
Author: Bogdan Popescu
Date: 11/08/2013 00:07
 
I'm trying to clone a huge site (MSDN). I gave httrack a list of the links I
want downloaded (192k links, to be exact). Apart from those, I only allowed it
to download CSS and other resources, using these filters: +*.css +*.svg +*.ttf
+fonts.googleapis.com* +*.woff +*.eot +*.ico +*.png +*.jpg +*.gif +*.jpeg
+*.js
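For checking which URLs a filter list like this would let through, here is a rough Python sketch. It only approximates HTTrack's scan rules with shell-style wildcards: it assumes patterns are matched against the URL without its scheme, it ignores exclusion (`-`) rules and rule ordering, and it is case-sensitive. The names `PATTERNS`, `strip_scheme` and `is_allowed` are made up for this example and are not part of HTTrack.

```python
from fnmatch import fnmatchcase

# The "+" filters from the post, written without the leading "+".
PATTERNS = [
    "*.css", "*.svg", "*.ttf", "fonts.googleapis.com*", "*.woff", "*.eot",
    "*.ico", "*.png", "*.jpg", "*.gif", "*.jpeg", "*.js",
]


def strip_scheme(url):
    """Drop a leading 'http://' or 'https://' (assumed to be what the filters see)."""
    return url.split("://", 1)[-1]


def is_allowed(url, patterns=PATTERNS):
    """Return True if any include pattern matches the scheme-less URL."""
    target = strip_scheme(url)
    return any(fnmatchcase(target, pattern) for pattern in patterns)


if __name__ == "__main__":
    for url in (
        "http://example.com/a/site.css",
        "http://fonts.googleapis.com/css?family=Open+Sans",
        "http://example.com/page.html",
    ):
        print(url, is_allowed(url))
```

Pages such as `page.html` are not covered by these patterns, which fits the setup described here: the pages come from the explicit link list and the filters only pull in supporting resources.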

It did a great job: it downloaded all the links and most of the resources. But
after downloading the last link it started parsing files, which took a really
long time (I think the task was too much for the VPS it was running on), so I
cancelled it (CTRL+C).

Now I'm left with everything nicely downloaded. The only problem is that the
files are not parsed: they still have absolute links instead of the
offline-available relative links.

I tried running httrack --continue and also httrack --continue --updatehack,
but httrack now insists on redownloading everything.

Is there any way I can convince httrack that I don't want anything
redownloaded and that I just want it to parse the HTML files and fix the links
for the files that are available offline?
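
To make the request concrete, here is a minimal Python sketch of the kind of offline relinking being asked for. It is a standalone script and not an HTTrack feature. It assumes the mirror is laid out as `<mirror_root>/<host>/<path>`, it only looks at `href` and `src` attributes, and it leaves URLs with query strings alone because their local file names cannot be guessed. All function names (`local_path_for`, `relink_html`, `relink_tree`) are hypothetical.

```python
import os
import re
from urllib.parse import urlsplit

# href="..." or src="..." holding an absolute http(s) URL.
LINK_RE = re.compile(
    r'''(\b(?:href|src)\s*=\s*)(["'])(https?://[^"']+)\2''', re.IGNORECASE
)


def local_path_for(url, mirror_root):
    """Map a URL to its assumed local path, or None if it cannot be mapped."""
    parts = urlsplit(url)
    if parts.query:
        return None  # local names for query URLs are not guessable here
    path = parts.path.lstrip("/")
    if not path or path.endswith("/"):
        path += "index.html"
    return os.path.join(mirror_root, parts.netloc, *path.split("/"))


def relink_html(html, page_path, mirror_root):
    """Rewrite absolute links to relative ones when the target file exists."""
    page_dir = os.path.dirname(page_path)

    def replace(match):
        prefix, quote, url = match.groups()
        target = local_path_for(url, mirror_root)
        if target is None or not os.path.isfile(target):
            return match.group(0)  # keep links to files we do not have
        rel = os.path.relpath(target, page_dir).replace(os.sep, "/")
        fragment = urlsplit(url).fragment
        if fragment:
            rel += "#" + fragment
        return f"{prefix}{quote}{rel}{quote}"

    return LINK_RE.sub(replace, html)


def relink_tree(mirror_root):
    """Relink every .html/.htm file under mirror_root in place."""
    for dirpath, _dirs, files in os.walk(mirror_root):
        for name in files:
            if not name.lower().endswith((".html", ".htm")):
                continue
            page_path = os.path.join(dirpath, name)
            # surrogateescape keeps bytes that are not valid UTF-8 intact.
            with open(page_path, encoding="utf-8", errors="surrogateescape") as f:
                html = f.read()
            fixed = relink_html(html, page_path, mirror_root)
            if fixed != html:
                with open(page_path, "w", encoding="utf-8",
                          errors="surrogateescape") as f:
                    f.write(fixed)
```

Since `relink_tree` overwrites files, it is safer to try it on a copy of the mirror first. A regex pass like this will miss links in CSS, inline scripts and unquoted attributes, so it is only a rough substitute for HTTrack's own parsing step.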
 