I'm trying to preserve the original URLs inside HTML docs (e.g. the values after
HREF= and SRC=) as well as maintain the original file names of the HTML documents
that these URLs point to. So hiding the query string doesn't do it.
I don't care about the path of the URLs; I can easily resolve it to the
right absolute path, whether on disk or online. But I can't restore a filename
to its original if a random number is placed in the filename instead of
the query string.
Since HTTrack doesn't seem to be able to do this, I'm writing some code
that uses the <!-- Mirrored from header in each file to rename the files back
to their original names, and I also use the K4 option to keep the URLs in their
original form. But I just realized that this may jeopardize HTTrack's ability to
do an update, if the update depends on filenames, which it likely does.
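
Here is a rough sketch in Python of the renaming step I have in mind. It assumes the header looks like `<!-- Mirrored from host/path?query by HTTrack ... -->`, with no scheme in front of the host; I haven't checked that against every HTTrack version. The function names (`original_url`, `original_filename`, `plan_renames`) are my own and not part of HTTrack. It only builds a rename plan and doesn't touch any files.

```python
import re
from pathlib import Path
from urllib.parse import urlsplit

# Assumed comment shape: <!-- Mirrored from host/path?query by HTTrack ... -->
MIRRORED_RE = re.compile(r"<!--\s*Mirrored from\s+(\S+)\s+by HTTrack", re.IGNORECASE)


def original_url(html_text):
    """Return the URL recorded in the 'Mirrored from' comment, or None."""
    match = MIRRORED_RE.search(html_text)
    return match.group(1) if match else None


def original_filename(url):
    """Return the last path segment of the URL (query string dropped), or None."""
    # The comment is assumed to omit the scheme, so add one for urlsplit.
    if "://" not in url:
        url = "http://" + url
    name = urlsplit(url).path.rsplit("/", 1)[-1]
    return name or None


def plan_renames(root):
    """Map each mirrored .html file under root to its proposed original name.

    Dry run only: nothing on disk is renamed.
    """
    plan = {}
    for path in sorted(Path(root).rglob("*.html")):
        # The comment is assumed to sit near the top of the file.
        head = path.read_text(encoding="utf-8", errors="replace")[:4096]
        url = original_url(head)
        name = original_filename(url) if url else None
        if name and name != path.name:
            plan[path] = path.with_name(name)
    return plan
```

Two URLs that differ only in their query string map to the same name here, so the plan would need a collision check before any real rename. Keeping it as a dry run also leaves the mirror as HTTrack wrote it, which matters if updates do depend on filenames.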