Subject: a few ideas
Author: TerraFrost
Date: 05/27/2003 08:55
i would imagine that the same code that HTTrack uses to 
save websites could also be used as a link checker - to 
check that all the links on a website work, up to some 
depth...  so that's my first suggestion - that a link checker 
be incorporated into HTTrack, sorta.
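
to make the idea concrete, here's a rough python sketch of a
depth-limited link checker.  it is not HTTrack's code; the names
(LinkParser, fetch_page, check_links, max_depth) are made up for
illustration, and it assumes a link is "broken" when the url can't
be fetched at all.

```python
from html.parser import HTMLParser
from urllib.error import URLError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """collects the href/src values found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def fetch_page(url):
    """returns the body as text, or None if the url can't be fetched."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (URLError, ValueError, OSError):
        return None


def check_links(start_url, max_depth=2, fetch=fetch_page):
    """follows links up to max_depth; returns a sorted list of broken urls."""
    seen, broken = set(), set()
    queue = [(start_url, 0)]
    while queue:
        url, depth = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        body = fetch(url)
        if body is None:
            broken.add(url)
            continue
        if depth >= max_depth:
            continue  # checked this url, but don't follow its links
        parser = LinkParser()
        parser.feed(body)
        for link in parser.links:
            absolute = urldefrag(urljoin(url, link))[0]
            # skip mailto:, javascript:, etc.
            if urlparse(absolute).scheme in ("http", "https"):
                queue.append((absolute, depth + 1))
    return sorted(broken)
```

something like check_links("http://example.com/", max_depth=2) would
then list the dead urls.  a real checker would probably also stay on
one host and only parse responses that are actually html.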

my second suggestion is that perhaps HTTrack could 
rename files based on the document's title, or a 
portion of the title, or some part of text on the homepage 
itself.  the changed name could be saved in some sort of 
table, and then, whenever the original name is found in an 
html file, it's changed to the new name - the name in the 
table.  this would probably require a second pass, though - 
a pass after all the files had been saved, and the table 
had been generated.
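
the table-plus-second-pass idea could look roughly like this in
python.  it's only a sketch under assumptions: the pages are held
in a dict of filename -> html, the new name comes from the <title>
tag, and all the function names are hypothetical.

```python
import re


def slug_from_title(html, fallback):
    """turns the page title into a file name; keeps fallback if no title."""
    m = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    if not m:
        return fallback
    slug = re.sub(r"[^a-z0-9]+", "-", m.group(1).lower()).strip("-")
    slug = slug[:40].strip("-")  # use only a portion of long titles
    return slug + ".html" if slug else fallback


def build_rename_table(pages):
    """first pass: maps every original name to its new name."""
    table, used = {}, set()
    for old_name, html in pages.items():
        new_name = slug_from_title(html, fallback=old_name)
        if new_name in used:  # naive: two pages share a title, keep old name
            new_name = old_name
        used.add(new_name)
        table[old_name] = new_name
    return table


def rewrite_links(html, table):
    """second pass: swaps old names for new ones in href/src attributes."""
    def swap(m):
        attr, quote, target = m.group(1), m.group(2), m.group(3)
        return f"{attr}={quote}{table.get(target, target)}{quote}"

    return re.sub(r"""(href|src)=(["'])(.*?)\2""", swap, html, flags=re.IGNORECASE)


def rename_and_relink(pages):
    """runs both passes; returns a dict of new name -> rewritten html."""
    table = build_rename_table(pages)
    return {table[old]: rewrite_links(html, table) for old, html in pages.items()}
```

the second pass only works once the whole table exists, which is why
it has to wait until every file has been saved.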

my third suggestion is that perhaps HTTrack could perform 
alterations on the html as it is saving it.  for 
example...  say some homepage contains code to display an 
ad.  there would be a table, an img tag, and perhaps 
some simple javascript.  when archiving a page that's 
hosted on, say, geocities, you'll be getting ads that 
aren't part of the original code.  while i don't think 
HTTrack could figure out which parts of the code to remove, 
the user could.  the user could figure it out by looking at 
one or maybe two pages, and then paste that into some text 
area within some window of HTTrack, and then, as HTTrack is 
downloading each file, it would delete that portion of the 
file.  or rather, it would delete that portion of the file 
as the download of that file was complete.
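
a minimal python sketch of that, assuming the user pastes the ad
code once and it gets removed from every finished download.  the
function names are invented here, and matching is exact except that
runs of whitespace are allowed to differ between pages.

```python
import re


def snippet_pattern(snippet):
    """builds a regex that matches the snippet, ignoring whitespace differences."""
    parts = [re.escape(piece) for piece in snippet.split()]
    return re.compile(r"\s+".join(parts))


def strip_snippets(html, snippets):
    """removes every occurrence of each user-supplied snippet from the page."""
    for snippet in snippets:
        if snippet.strip():  # ignore empty entries
            html = snippet_pattern(snippet).sub("", html)
    return html
```

this would run once per file, after the download of that file is
complete.  ads that change from page to page (a different banner id
each time, say) would need a looser pattern than this.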

also, internet explorer can save homepages in a "web 
archive" mht format...  this format saves an individual page and 
all the images on that page into one file that is viewable 
by ie (maybe by other browsers, too...  i dunno).  if this 
is an open format, perhaps it could be incorporated into 
HTTrack as a feature that can be enabled, but that is 
disabled by default?
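
for what it's worth, mht files are MIME multipart/related messages
(the MHTML format described in RFC 2557), so one can be assembled
with an ordinary mail library.  a rough python sketch follows; the
function names and the resources layout are assumptions, and the
output hasn't been checked against what ie itself writes.

```python
from email import encoders, message_from_bytes
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mht(page_url, html, resources):
    """bundles a page and its resources into one multipart/related message.

    resources is a dict of url -> (mime_type, raw_bytes).
    """
    archive = MIMEMultipart("related", type="text/html")
    root = MIMEText(html, "html", "utf-8")
    root["Content-Location"] = page_url
    archive.attach(root)
    for url, (mime_type, data) in resources.items():
        maintype, subtype = mime_type.split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)  # binary data travels as base64 text
        part["Content-Location"] = url
        archive.attach(part)
    return archive.as_bytes()


def list_mht_parts(raw):
    """reads an archive back; returns (content_location, content_type) pairs."""
    msg = message_from_bytes(raw)
    return [(p["Content-Location"], p.get_content_type()) for p in msg.get_payload()]
```

the Content-Location header on each part is what lets a viewer match
an img url in the html to the image stored in the same file.
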

finally, usenet messages bundle images / attachments within 
them with uuencode...  would it be possible to do this with 
html pages, as well?  i would think it would be easy enough 
to try, but...  i never have, heh, and am too lazy to :)
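
one way to try it is with data: urls (RFC 2397), which put the
encoded image right in the src attribute.  note this swaps in base64
rather than uuencode, since that's what data: urls use, and whether
a given browser accepts them is a separate question.  the names
below are hypothetical.

```python
import base64
import re


def to_data_url(mime_type, data):
    """encodes raw bytes as a base64 data: url."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_images(html, images):
    """replaces img src values found in images with data: urls.

    images is a dict of src -> (mime_type, raw_bytes); other srcs are left alone.
    """
    def swap(m):
        prefix, quote, src = m.group(1), m.group(2), m.group(3)
        if src not in images:
            return m.group(0)
        mime_type, data = images[src]
        return f"{prefix}{quote}{to_data_url(mime_type, data)}{quote}"

    pattern = r"""(<img\b[^>]*?\bsrc=)(["'])(.*?)\2"""
    return re.sub(pattern, swap, html, flags=re.IGNORECASE)
```

the page grows by about a third of each image's size because of the
base64 encoding, but it ends up as a single self-contained file.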



