Hi all,
Using WinHTTrack v3.23 under Win32 (I have not tried the newer release yet).
Here is a challenge for any talented script writers among you. Sometimes when I am mirroring a site of, say, more than 300MB (although the phenomenon also occurs with smaller sites), I get a huge new.dat file of around 250MB in my hts-cache folder. If the mirror fails and I start it again, I get a similarly large file and never actually end up with a clean mirror. What would be nice is a script that can be run offline.
The script would essentially do the following:
- Parse the new.dat file
- Determine the header/footer (BOF/EOF) markers for the separate files
- Extract the individual files (.html, .zip, .gz, etc.) and place them in the correct path within the existing partial mirror download
- Discard any partially downloaded files that contain incomplete data
- Delete new.dat (if you supplied the switch to do so)
- Update any hts-cache control files so that, if you wished to continue updating the mirror, you would only download the files that do not yet exist (or that match the other parameters you give WinHTTrack or its variants)
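A rough Python sketch of the parse / extract / discard / delete steps is below, in case it helps whoever picks this up. The actual layout of HTTrack's new.dat is not described here, so the record format the code assumes (a path line, a decimal length line, then that many raw bytes, repeated back to back) is a hypothetical placeholder; the BOF/EOF detection in parse_records would have to be rewritten against the real format, for example by reading the HTTrack source. The names parse_records and extract are made up for illustration, and the "update hts-cache control files" step is not attempted.

```python
from pathlib import Path


def parse_records(data: bytes):
    """Yield (relative_path, payload, complete) for each record in data."""
    # HYPOTHETICAL record layout, NOT HTTrack's documented new.dat format:
    #   <relative path> LF <decimal byte length> LF <payload bytes>
    pos = 0
    while pos < len(data):
        path_end = data.find(b"\n", pos)          # end of the path line
        if path_end == -1:
            return                                # header cut off: stop
        len_end = data.find(b"\n", path_end + 1)  # end of the length line
        if len_end == -1:
            return
        try:
            length = int(data[path_end + 1:len_end])
        except ValueError:
            return                                # not a length: stop rather than guess
        if length < 0:
            return
        path = data[pos:path_end].decode("utf-8", errors="replace")
        start = len_end + 1
        payload = data[start:start + length]
        # A payload shorter than declared means the download was interrupted.
        yield path, payload, len(payload) == length
        pos = start + length


def extract(dat_path, mirror_root, delete_dat=False):
    """Write complete records under mirror_root; return (written, skipped)."""
    root = Path(mirror_root).resolve()
    data = Path(dat_path).read_bytes()  # whole file in memory; fine for a sketch
    written, skipped = [], []
    for rel, payload, complete in parse_records(data):
        target = (root / rel.lstrip("/\\")).resolve()
        # Skip incomplete records and any path that would land outside the mirror.
        if not complete or root not in target.parents:
            skipped.append(rel)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
        written.append(rel)
    if delete_dat:
        Path(dat_path).unlink()  # the "delete new.dat" switch
    return written, skipped
```

Usage would be something like `written, skipped = extract("hts-cache/new.dat", "path/to/mirror")`, with `delete_dat=True` only once you are happy with the result. Parsing stops at the first unreadable header instead of scanning ahead, on the theory that guessing at record boundaries in a damaged file is how you end up writing garbage into the mirror.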
The above script could probably be written in any of a number of popular scripting languages (PHP, Perl, shell script, VBScript, Java, etc.).
If anyone has already written such a script, I would be grateful for a copy; if not, this could be a project for someone skilled in writing such scripts.
tia.
Osama