HTTrack Website Copier
Free software offline browser - FORUM
Subject: Only download files not existing locally
Author: Remke Rutgers
Date: 09/30/2002 14:28

Here is my situation.
We use (win)httrack to create static mirrors for otherwise 
dynamic sites.
One of our projects creates a very large site (around 15000-
2000 files, 95% originally dynamic html files).
The complete HTTrack mirror takes approx. 12 hours.
Of course we only create the mirror after we have concluded 
the dynamic site is 100% correct.
Sometimes during the mirror the response contains a 
ColdFusion server error message, something like: 
Error processing request... 
Typically indicating server overload.
Note, this is still a 200 response!
It is not hard to find these cases in the mirrored site. 
(Simply do a string find in the mirrored tree)
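
As a rough sketch of that string find (Python; the function name, the
list of file extensions and the exact marker text are my own
assumptions, nothing HTTrack itself provides):

```python
import os

# Assumed marker: the visible text of the ColdFusion error page.
ERROR_MARKER = "Error processing request"


def find_error_pages(mirror_root, marker=ERROR_MARKER,
                     extensions=(".html", ".htm")):
    """Return the sorted paths of mirrored pages that contain the marker."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(mirror_root):
        for name in filenames:
            # Only look at HTML pages; images and other files are skipped.
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps odd encodings from aborting the scan.
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                if marker in handle.read():
                    hits.append(path)
    return sorted(hits)
```

Calling find_error_pages() on the mirror directory gives the list of
local files that would have to be deleted and fetched again.
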
What I would like to do is, delete the mirrored files with 
errors, then update the mirror, without processing all 
correct files again! (I have control over the dynamic site 
and its database, so I can say for sure that the dynamic 
site has not changed.) So I want to go a lot further than 
the 'update hack'.
I was thinking of feeding the original dynamic urls leading 
to the corrupted (locally deleted) files to httrack and 
then update the mirror.
- How can I deduce the originating urls from the corrupted 
mirrored files?
- I am changing project settings. If my starting url(s) 
change, will HTTrack still use the same logs and caches?
- How do I tell HTTrack to go upwards in my site as well?
- Most important: how do I tell HTTrack to ignore files 
already present on my locally mirrored site?

Is it possible to achieve what I would like to do? What other tweaks
might I need?
Thanks in advance,

Remke Rutgers
