> Here is a challenge for any talented script writer amongst
> you. Sometimes when I am mirroring a site, say >300MB
> (although the phenomenon occurs with smaller sites too), I get a
> huge new.dat file of say 250MB in my hts-cache folder. If
> the mirror fails and I kick it off again, I get a similarly
> large file and cannot actually get a clean mirror.
What do you mean? The "continue" feature will fetch data
from the cache. Yes, a new cache will be created, but you
can then erase the previous one (the old.* files).
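
As a rough illustration of the cleanup mentioned above, here is a minimal Python sketch that removes the old.* files from a project's hts-cache folder. The function name remove_old_cache and the project_dir argument are made up for this example; it assumes the previous cache generation is exactly the set of files matching old.* and that the new cache is complete before you run it.

```python
from pathlib import Path


def remove_old_cache(project_dir):
    """Delete old.* files in <project_dir>/hts-cache; return the removed names.

    Hypothetical helper: assumes the previous cache generation is the set of
    files matching old.* in the hts-cache folder.
    """
    cache_dir = Path(project_dir) / "hts-cache"
    removed = []
    for path in sorted(cache_dir.glob("old.*")):
        if path.is_file():
            path.unlink()  # new.* files are never matched, so they stay
            removed.append(path.name)
    return removed
```

Called as remove_old_cache("/path/to/project"), it leaves the new.* files alone, so a later "continue" run can still use the current cache.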
> - Parse the new.dat file
> - Determine header/footer BOF/EOF markers for the
> separate files
You may try the --debug-cache feature
> - Extract individual files (.html, .zip, .gz etc.) and place
> them in the correct path within the existing partial mirror
> download
> - Discard any partially downloaded files which contain
> incomplete data
> - Delete new.dat (if you supplied the switch to do so)
> - Update any hts-cache control files such that if you wished
> to continue to update the mirror, you would only download the
> files which do not exist, or to suit the other parameters you
> give WinHTTrack (or its variants).
>
> The above script could probably be written in any of a
> number of popular scripting languages (PHP, Perl, shell script,
> VBScript, Java etc.). If anyone has already written such a
> script I would be grateful for a copy; if not, this could be a
> project for someone who has skills in writing such scripts.
You want to complete the mirror using the cache? Then use
httrack for that (using --continue); it should work!
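
If you do want to script this, a small wrapper around httrack itself is probably enough. The sketch below is only an illustration: the helper names continue_command and continue_mirror are hypothetical, and it assumes the httrack binary is on your PATH and that project_dir is the folder that contains hts-cache, so that running there lets httrack pick up the earlier settings.

```python
import shutil
import subprocess


def continue_command():
    """Command line that resumes an interrupted mirror from its cache."""
    return ["httrack", "--continue"]


def continue_mirror(project_dir):
    """Run 'httrack --continue' inside project_dir and return its exit code.

    Assumption: project_dir is the mirror folder that holds hts-cache.
    """
    if shutil.which("httrack") is None:
        raise FileNotFoundError("httrack not found on PATH")
    result = subprocess.run(continue_command(), cwd=project_dir)
    return result.returncode
```

A non-zero return value from continue_mirror("/path/to/project") would suggest the run did not finish cleanly, in which case the log files in the project folder are the place to look.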