> Why is it that the cache stores the full html source and
> not only header information for updates?
> The answer is given in:
> Re:I can reproduce it
> from Xavier
Right. In short: the HTML files stored locally are modified
versions of the originals. For example, a link like:
<http://www.foo.fom/~smith/Bar.Html>
will be rewritten as:
_smith/bar.html
Note that the ~ character has been replaced and the uppercase
characters have been lowercased. Such changes are necessary
to comply with local filesystem rules (under Windows, for
example, it is impossible to create a file whose name
contains the ':' character, and on Unix systems the ~
character is awkward because shells treat it specially), but
they erase relevant information (the original URL).
That's why the engine has to store the "original" HTML
somewhere, to be able to do updates.
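
To illustrate why the rewritten name cannot stand in for the
original URL, here is a minimal sketch in Python. It is not
the engine's actual renaming code: the function name
to_local_name, the UNSAFE character set and the original_url
dictionary are all assumptions made up for this example. It
only shows that two different URLs can end up with the same
local name, so the original has to be recorded separately.

```python
from urllib.parse import urlsplit

# Characters this sketch treats as unsafe in local file names.
# This set is an assumption for illustration, not the engine's real list.
UNSAFE = set('~:*?"<>|')


def to_local_name(url: str) -> str:
    """Map a URL to a lowercase, filesystem-friendly relative path."""
    path = urlsplit(url).path.lstrip("/")
    cleaned = "".join("_" if ch in UNSAFE else ch for ch in path)
    return cleaned.lower()


# The mapping is lossy: distinct URLs can collapse onto the same local
# name, so the original URL must be kept somewhere to allow updates.
original_url = {}  # local name -> first original URL seen (stand-in for a cache)
for url in ("http://www.foo.fom/~smith/Bar.Html",
            "http://www.foo.fom/_smith/bar.html"):
    name = to_local_name(url)
    original_url.setdefault(name, url)
    print(url, "->", name)
```

Both URLs map to the same local name, and nothing in that
name says which one was really mirrored.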
Note that you can safely erase the "old.*" files in the
hts-cache directory; this will save about 30% of the space.
You can also wipe the whole hts-cache directory when burning
a project to a CD or when distributing it.
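
If you prefer to script the cleanup, something like the
following Python sketch could do it. The function names and
the project_dir parameter are hypothetical; only the
hts-cache directory name and the "old.*" pattern come from
the note above. By default it deletes nothing and only
reports how many bytes the "old.*" files occupy.

```python
from pathlib import Path


def old_cache_files(project_dir):
    """Return the "old.*" files found in <project_dir>/hts-cache, sorted."""
    cache = Path(project_dir) / "hts-cache"
    return sorted(p for p in cache.glob("old.*") if p.is_file())


def remove_old_cache_files(project_dir, dry_run=True):
    """Return the bytes used by the "old.*" files, deleting them if asked.

    With dry_run=True (the default) nothing is deleted.
    """
    freed = 0
    for path in old_cache_files(project_dir):
        freed += path.stat().st_size
        if not dry_run:
            path.unlink()
    return freed
```

Run it once with the default dry_run=True to check what
would be removed, then pass dry_run=False to actually delete
the files.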
> HTTrack really is a useful tool!
Thanks :) There are some remaining bugs, but I expect to
wipe them all out soon.