Hi,
Here is my situation.
We use (Win)HTTrack to create static mirrors of otherwise
dynamic sites.
One of our projects creates a very large site (around 15000 to
20000 files, 95% of them originally dynamic HTML files).
The complete HTTrack mirror takes approx. 12 hours.
Of course we only create the mirror after we have concluded
that the dynamic site is 100% correct.
Sometimes during the mirror the response contains a
ColdFusion server error message, something like:
Error processing request...
Typically indicating server overload.
Note, this is still a 200 response!
It is not hard to find these cases in the mirrored site.
(Simply do a string find in the mirrored tree)
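The string find could be scripted roughly like this. It is only a sketch: the marker text is taken from the error message quoted above, while the function name and the restriction to .html/.htm files are my own assumptions.

```python
import os

# Marker text from the ColdFusion error page quoted above.
ERROR_MARKER = "Error processing request"


def find_corrupted(mirror_root, marker=ERROR_MARKER):
    """Return the sorted paths of mirrored HTML files that contain the marker."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(mirror_root):
        for name in filenames:
            # Assumption: only HTML pages can carry the error message.
            if not name.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on odd encodings.
            with open(path, "r", encoding="utf-8", errors="replace") as fh:
                if marker in fh.read():
                    hits.append(path)
    return sorted(hits)
```

Calling find_corrupted() on the mirror directory returns the list of files that would have to be deleted and fetched again.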
What I would like to do is delete the mirrored files with
errors, then update the mirror, without processing all
correct files again! (I have control over the dynamic site
and its database, so I can say for sure that the dynamic
site has not changed.) So I want to go a lot further than
the 'update hack'.
I was thinking of feeding the original dynamic URLs leading
to the corrupted (locally deleted) files to HTTrack and
then updating the mirror.
- How can I deduce the originating URLs from the corrupted
mirrored files?
- I am changing project settings; if my starting URL(s)
change, will HTTrack still use the same logs and caches?
- How do I tell HTTrack to go upwards in my site as well?
- Most important: how do I tell HTTrack to ignore files
already present on my locally mirrored site?
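For the first question, one thing I could imagine trying is sketched below. It assumes that each mirrored page still carries the "<!-- Mirrored from ... by HTTrack ..." comment that HTTrack adds as its HTML footer by default; if that footer was disabled or changed in the project options, this will find nothing. The function name is made up for illustration.

```python
import re

# Assumption: the default HTTrack footer comment is present, e.g.
# <!-- Mirrored from host/path by HTTrack Website Copier/3.x [...], date -->
MIRROR_COMMENT = re.compile(
    r"<!--\s*Mirrored from\s+(\S+)\s+by HTTrack", re.IGNORECASE
)


def original_url(html_text):
    """Return the host/path recorded in the footer comment, or None."""
    match = MIRROR_COMMENT.search(html_text)
    return match.group(1) if match else None
```

The value recovered this way is the host and path without the scheme, so "http://" would still have to be prepended before feeding it back to HTTrack.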
Is it possible to achieve what I would like to do? What other tweaks do
I possibly need?
Thanks in advance,
Remke Rutgers