I often find files named ???html.html in my mirrors.
For example, if there is a page abc.html,
HTTrack sometimes saves it as abchtml.html.
On a later update it may be purged and saved as abc.html instead. Or, if
I use the 'Do not purge old files' option, I may end up with both files as
duplicates.
It happens randomly.
I prefer 'Do not purge old files' because HTTrack sometimes does not grab all
files, and on update it purges some that it should not purge, because those
files are still valid and still linked from the saved pages. I know they are
valid because I ran the update immediately after the first download, on a
static blog that has not been updated in a long time. I also checked the
purged files and the pages they appear in: they are still there. So I usually
run 'Download' and then 'Update' a couple of times with the 'No purge' option
to make sure that all files are downloaded.
So the problem is that I get duplicate files at random from the ???html.html
case. Their content may differ: for example, a link in abchtml.html points to
def.php, while the same link in abc.html points to def.htm (both come from the
same source page anyway).
Is there an option to purge only the *html.html and *.tmp (HTTrack temp file)
types? I can't just search for *html.html and delete all of them, because in
some cases there is an abchtml.html but no abc.html yet. So I would rather
risk having two of them than having none of them. But it would be better not
to have duplicate files at all.
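In case it helps to show exactly what I mean, here is a rough sketch of the cleanup I would otherwise do by hand outside HTTrack. It is only my assumption that the duplicate is always named by replacing the trailing ".html" with "html.html" (abc.html -> abchtml.html); the function names and the folder name are made up for illustration, and this is not an HTTrack feature. It removes an ???html.html file only when its abc.html twin already exists, and it also removes *.tmp files.

```python
from pathlib import Path

SUFFIX = "html.html"  # assumed naming pattern of the unwanted duplicates


def find_redundant(mirror_root):
    """Return *.tmp files, plus ???html.html files whose ???.html twin exists."""
    redundant = []
    for path in sorted(Path(mirror_root).rglob("*")):
        if not path.is_file():
            continue
        name = path.name
        if name.endswith(".tmp"):
            # Leftover temp file
            redundant.append(path)
        elif name.endswith(SUFFIX) and len(name) > len(SUFFIX):
            # abchtml.html -> abc.html
            twin = path.with_name(name[: -len(SUFFIX)] + ".html")
            if twin.is_file():
                # Only a duplicate when the plain page was saved too
                redundant.append(path)
    return redundant


def clean(mirror_root, dry_run=True):
    """List the redundant files; delete them only when dry_run is False."""
    targets = find_redundant(mirror_root)
    for path in targets:
        if dry_run:
            print("would delete", path)
        else:
            path.unlink()
    return targets
```

I would run clean("my_mirror") first to see the list, then clean("my_mirror", dry_run=False) to really delete. It cannot tell a genuine page whose name happens to end in "html" (say xhtml.html next to x.html) from a duplicate, and it does not fix links in other pages that still point to the deleted copy, so the dry run list needs a look first.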
*Note that this is not the CGI- or PHP-generated file case, where confusion is
hard to avoid. It is just that some normal abc.html