Re: Script to extract files from hts-cache/new.dat

Subject: Re: Script to extract files from hts-cache/new.dat

Author: Xavier Roche

Date: 06/09/2003 12:08

> Here is a challenge for any talented script writer amongst
> you.  Sometimes when I am mirroring a site, say >300MB
> (although the phenomenon exists at smaller sizes too), I get a
> huge new.dat file of say 250MB in my hts-cache folder.  If
> the mirror fails and I kick it off again, I get a similarly
> large file and cannot actually get a clean mirror.

What do you mean? The "continue" feature will fetch data
from the cache. Yes, a new cache will be created, but you
can then erase the previous one (the old.* files).
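
If you want to script that cleanup, here is a minimal Python sketch. It assumes the project folder contains the hts-cache directory mentioned above and that only the old.* files should go. The function name remove_old_cache is made up for this example and is not part of httrack.

```python
from pathlib import Path


def remove_old_cache(project_dir):
    """Delete hts-cache/old.* under project_dir and return the removed names.

    Assumption: project_dir is the folder that holds hts-cache.
    new.dat and every other cache file are left untouched.
    """
    cache_dir = Path(project_dir) / "hts-cache"
    if not cache_dir.is_dir():
        return []
    removed = []
    # sorted() materializes the listing before anything is deleted.
    for path in sorted(cache_dir.glob("old.*")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

Only run it once the restarted mirror has written its new cache, since the deletion cannot be undone.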

> - Parse the new.dat file
> - Determine header/footer BOF/EOF markers for the
>   separate files

You may try the --debug-cache feature

> - Extract individual files (.html, .zip, .gz etc) and place
>   them in the correct path within the existing partial mirror
>   download
> - Discard any partially downloaded files which contain
>   incomplete data
> - Delete new.dat (if you supplied the switch to do so)
> - Update any hts-cache control files such that if you wished
>   to continue to update the mirror, you would only download
>   the files which do not exist or to suit the other parameters
>   you give WinHTTrack (or its variants).
> 
> The above script could probably be written in any of a
> number of popular scripting languages (php, perl, shell
> script, VBScript, Java etc.)
> If anyone has already written such a script I would be
> grateful for a copy, if not, this could be a project for
> someone who has skills in writing such scripts.

You want to complete the mirror using the cache? Then use
httrack for that (using --continue); it should work!
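
If you would rather drive this from a script, a small Python wrapper could look like the sketch below. It assumes the httrack binary is on your PATH and that it is started from the project folder (the one holding hts-cache) so it can pick up the earlier mirror. The function name continue_mirror and the run switch are invented for this example.

```python
import subprocess


def continue_mirror(project_dir, run=False):
    """Build the 'httrack --continue' command, optionally running it.

    Assumptions: httrack is on PATH, and project_dir is the folder
    of the interrupted mirror (used as the working directory).
    """
    cmd = ["httrack", "--continue"]
    if run:
        # check=True raises CalledProcessError if httrack exits non-zero.
        subprocess.run(cmd, cwd=project_dir, check=True)
    return cmd
```

With run left at False it only returns the command list, so you can inspect it before launching a long mirror.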

All articles

Subject	Author	Date
Script to extract files from hts-cache/new.dat		06/09/2003 02:54
Re: Script to extract files from hts-cache/new.dat		06/09/2003 12:08
Re: Script to extract files from hts-cache/new.dat		12/07/2003 15:36