HTTrack Website Copier
Free software offline browser - FORUM
Subject: Script to extract files from hts-cache/new.dat
Author: Osama Zindabad
Date: 06/09/2003 02:54
Hi all

Using WinHTTrack v3.23 under W32 (have not tried v3.23 yet).  

Here is a challenge for any talented script writer amongst
you.  Sometimes when I am mirroring a site, say >300MB
(although the phenomenon exists with smaller sizes too), I get a
huge new.dat file of say 250MB in my hts-cache folder.  If
the mirror fails and I kick it off again, I get a similarly
large file and cannot actually get a clean mirror.  What
would be nice is to have a script which one can use offline.
The script would essentially do the following:

- Parse the new.dat file
- Determine header/footer BOF/EOF markers for the separate files
- Extract individual files (.html, .zip, .gz etc) and place
  in the correct path within the existing partial mirror
- Discard any partially downloaded files which contain
  incomplete data
- Delete new.dat (if you supplied the switch to do so)
- Update any hts-cache control files such that if you wished
  to continue to update the mirror, you would only download the
  files which do not exist, or to suit the other parameters you
  give WinHTTrack (or its variants).
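
To make the extract and discard steps above concrete, here is a rough
Python sketch.  It does NOT parse new.dat: the layout of that file is not
described in this post, so the sketch assumes some parser has already
produced records.  CacheRecord, its fields, is_complete, safe_target and
extract_records are all hypothetical names made up for illustration, and
"complete" is assumed to mean that the stored data length matches an
announced size.  Deleting new.dat and updating the hts-cache control
files are left out.

```python
import os
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class CacheRecord:
    # Hypothetical shape of one parsed entry; not HTTrack's real layout.
    rel_path: str       # path relative to the mirror root
    expected_size: int  # announced size in bytes, or -1 if unknown
    data: bytes         # bytes actually stored for this entry


def is_complete(rec: CacheRecord) -> bool:
    # Assumption: an entry is complete only if its size is known
    # and the stored data has exactly that length.
    return rec.expected_size >= 0 and len(rec.data) == rec.expected_size


def safe_target(mirror_root: str, rel_path: str) -> Optional[str]:
    # Resolve rel_path under mirror_root; return None if it would
    # land outside the mirror (e.g. "../" or an absolute path).
    root = os.path.abspath(mirror_root)
    target = os.path.abspath(os.path.join(root, rel_path))
    try:
        inside = os.path.commonpath([root, target]) == root
    except ValueError:  # e.g. different drives on Windows
        inside = False
    return target if inside else None


def extract_records(
    records: Iterable[CacheRecord],
    mirror_root: str,
    overwrite: bool = False,
) -> Tuple[List[str], List[str]]:
    # Write complete records into the mirror; return (written, skipped).
    written: List[str] = []
    skipped: List[str] = []
    for rec in records:
        target = safe_target(mirror_root, rec.rel_path)
        if target is None or not is_complete(rec):
            skipped.append(rec.rel_path)
            continue
        if os.path.exists(target) and not overwrite:
            # Keep files the partial mirror already has.
            skipped.append(rec.rel_path)
            continue
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as fh:
            fh.write(rec.data)
        written.append(rec.rel_path)
    return written, skipped
```

By default, files already present in the partial mirror are left alone,
so only the missing ones are filled in from the cache; pass
overwrite=True to replace them instead.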

The above script could probably be written in any of a
number of popular scripting languages (PHP, Perl, shell script,
VBScript, Java etc.).  If anyone has already written such a
script I would be grateful for a copy; if not, this could be a
project for someone who has skills in writing such scripts.




