> Here is my situation.
> We use (win)httrack to create static mirrors for otherwise
> dynamic sites.
> One of our projects creates a very large site (around
> 15000-2000 files, 95% originally dynamic html files).
> The complete httrack mirror takes approx 12 hours.
> Of course we only create the mirror after we have concluded
> the dynamic site is 100% correct.
> Sometimes during the mirror the response contains a
> ColdFusion server error message, something like:
> Error processing request...
> Typically indicating server overload.
> Note, this is still a 200 response!
> It is not hard to find these cases in the mirrored site.
> (Simply do a string find in the mirrored tree)
> What I would like to do is delete the mirrored files with
> errors, then update the mirror, without processing all
> correct files again!
> Is it possible to achieve what I would like to do?
> What other tweaks do I possibly need?
Wow, that's quite hard to handle. One solution might be to
patch the hts-cache/new.ndx index file and invalidate the
entries that were incorrectly downloaded, by replacing each
link path with XXXX. Be sure to overwrite characters rather
than insert them, so that the index size does not change.
This is quite tricky to do, though.
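As a rough sketch of that idea (untested against a real HTTrack cache), the Python below first lists the mirrored files that contain the ColdFusion error text, then overwrites a given byte string with X characters of exactly the same length, so the data never grows or shrinks. The function names are made up for illustration, and how a link path is actually stored inside new.ndx is an assumption you would need to check by inspecting the file yourself.

```python
import os

# Error text quoted in the question; adjust to match the real server message.
ERROR_MARKER = b"Error processing request"


def find_bad_files(mirror_root, marker=ERROR_MARKER):
    """Return sorted paths of mirrored files whose content contains marker."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(mirror_root):
        # Do not descend into HTTrack's own cache directory.
        dirnames[:] = [d for d in dirnames if d != "hts-cache"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                if marker in fh.read():
                    bad.append(path)
    return sorted(bad)


def blank_out_same_length(data, target):
    """Overwrite every occurrence of target with X's of equal length.

    The result always has the same length as the input, which is the
    constraint mentioned above for the index file.
    """
    if not target:
        raise ValueError("target must not be empty")
    return data.replace(target, b"X" * len(target))
```

If you try this, work on a copy of hts-cache/new.ndx and keep a backup: read the file in binary mode, call blank_out_same_length once per bad link path, and write the result to a new file of identical size.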
Another way (if server overload is the cause of all the
problems) would be to reduce the load on the server by
setting the maximum number of simultaneous connections to
2 or 3 in httrack.
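For a scripted mirror, the connection limit can be passed on the command line. The sketch below assumes the command-line httrack binary is on your PATH and uses its -cN option (number of simultaneous connections) together with -O (output path); the URL and directory are placeholders, and the helper names are made up for illustration.

```python
import subprocess


def build_httrack_command(url, output_dir, max_connections=2):
    """Build an httrack argument list with a limited connection count."""
    if max_connections < 1:
        raise ValueError("max_connections must be at least 1")
    return ["httrack", url, "-O", output_dir, "-c%d" % max_connections]


def run_mirror(url, output_dir, max_connections=2):
    """Run httrack with the limited connection count (raises on failure)."""
    cmd = build_httrack_command(url, output_dir, max_connections)
    subprocess.run(cmd, check=True)


# Placeholder values; replace with your own site and mirror directory.
# run_mirror("http://www.example.com/", "/path/to/mirror", max_connections=2)
```

In WinHTTrack the same limit is set through the project options rather than on the command line.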