> i would imagine that the same code that HTTrack uses to
> save websites could also be used as a link checker - to
> check that all the links on a website work, up to some
> depth... so that's my first suggestion - a link checker be
> incorporated into HTTrack, sorta.
Err, the "test all links" option should do the trick?
Note: you cannot test the links of a site without
downloading the pages that refer to those links.
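To illustrate the note above, here is a minimal sketch (not HTTrack's own code) of why a page has to be fetched before its links can be tested: the links only exist inside the page body. The `fetch` callable is a hypothetical helper, assumed to return the body as a string, or None when the URL cannot be retrieved.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets found in href/src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def check_links(page_url, fetch):
    """Return {link: reachable} for every link found on page_url.

    `fetch` is a hypothetical helper supplied by the caller.
    """
    html = fetch(page_url)  # the page itself must be downloaded first
    if html is None:
        return {}
    collector = LinkCollector(page_url)
    collector.feed(html)
    return {link: fetch(link) is not None for link in collector.links}
```

Checking deeper levels would mean repeating this on every reachable HTML page, which is the same crawl a mirror performs.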
> my second suggestion is that perhaps HTTrack could
> rename files based on the document's title, or a
> portion of the title
Hmm, this isn't possible, unfortunately, because HTTrack
names all files BEFORE downloading them. This is also a
problem when redirections are encountered. That's a known
limitation, but it was a design choice: the other method
(waiting for all links before generating the pages) has many
other problems, such as memory (stack) management and re-get
issues.
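As a rough illustration of the constraint (this is NOT HTTrack's actual naming scheme, only a sketch under that assumption), a local name chosen before the download can depend on nothing but the URL. The title lives inside the page body, so it is not yet known when links pointing to that page have to be rewritten. The function name `local_name` is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def local_name(url):
    """Derive a local file path from the URL alone (no download needed)."""
    parts = urlsplit(url)
    path = parts.path
    # Directory-style URLs get a default document name.
    if path == "" or path.endswith("/"):
        path += "index.html"
    return posixpath.join(parts.netloc, path.lstrip("/"))
```

A page linking to `http://example.com/docs/` can be rewritten to point at the derived path right away, even though the target has not been fetched.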
> my third suggestion is that perhaps HTTrack could perform
> alterations on the html as it is saving it. for
> example... say some homepage contains code to display an
> ad. there would be a table, an img command, and perhaps
> some simple javascript. when archiving a page that's
> hosted on, say, geocities, you'll be getting ads that
> aren't part of the original code. while i don't think
> HTTrack could figure out which parts of the code to remove,
> the user could. the user could figure it out by looking at
> one or maybe two pages, and then paste that into some text
> area within some window of HTTrack, and then, as HTTrack is
> downloading each file, it would delete that portion of the
> file.
Hmm. Such post-processing is rather complex; and even if
it can be done using some C callbacks (see the --callback
wrappers in newest releases), I doubt that most users would
be able to use it :(
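For what it's worth, the simplest form of the idea can be sketched outside HTTrack, as a pass over the saved files. This is a standalone illustration and does not use the callback interface mentioned above; `strip_snippets` is a hypothetical name.

```python
def strip_snippets(html, snippets):
    """Remove every literal occurrence of each user-supplied snippet."""
    for snippet in snippets:
        if snippet:  # skip empty entries, which would match everywhere
            html = html.replace(snippet, "")
    return html
```

This only works when the ad code is byte-for-byte identical on every page; anything that varies (a counter, a random banner URL) would need pattern matching instead, which is where the complexity starts.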
> also, internet explorer can save homepages in a 'web
> archive' mht format... this format saves an ind. page and
> all the images on that page into one file that is viewable
> by ie (maybe by other browsers, too... i dunno). if this
> is an open format, perhaps it could be incorporated into
> HTTrack as a feature that can be enabled, but that is
> disabled by default?
This is something like an open format: mht archives are in
fact .eml files, which are in fact MIME messages! These
files are compatible with all environments, including Unix
(MIME is the standard mail message format).
I could add this feature in the future, but I don't know
how many users would use it.
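Since an mht archive is a MIME message, a rough sketch of building one can use Python's standard email package. This is an assumption-laden illustration, not an HTTrack feature: `build_mht` is a hypothetical name, the page and its images are assumed to be already downloaded, and whether a given browser opens the result has not been verified here (files written by Internet Explorer carry additional headers).

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mht(page_url, html, images):
    """Bundle a page and its images into one multipart/related message.

    images maps each image URL to a (subtype, raw_bytes) pair,
    for example {"http://example.com/logo.png": ("png", data)}.
    """
    archive = MIMEMultipart("related", type="text/html")
    archive["Subject"] = page_url

    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = page_url  # original address of the page
    archive.attach(page)

    for url, (subtype, data) in images.items():
        part = MIMEImage(data, _subtype=subtype)  # base64-encoded by default
        part["Content-Location"] = url  # lets the HTML refer to it by URL
        archive.attach(part)

    return archive.as_string()
```

The returned text could be written to a file with an .mht or .eml extension.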
> finally, usenet messages bundle images / attachments within
> them with uuencode... would it be possible to do this with
> html pages, as well? i would think it would be easy enough
> to try, but... i never have, heh, and am too lazy to :)
"mht" files would perfectly fit this usage. I also suspect
that a complete website could be embedded in a single mht
file, including the links inside (you could "browse" the
site from a single file).
It would produce a huge file, however (all the data plus at
least 33% overhead due to base64 encoding).
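The overhead figure is easy to check: base64 turns every 3 bytes into 4 characters, and MIME-style output adds a line break after every 76 characters. A small sketch (the function name `base64_overhead` is just for illustration):

```python
import base64


def base64_overhead(data):
    """Return the relative size increase of MIME-style base64 encoding.

    data must be non-empty. encodebytes() inserts a newline after
    every 76 output characters, as mail and mht files do.
    """
    encoded = base64.encodebytes(data)
    return len(encoded) / len(data) - 1.0
```

For binary data of any real size this comes out a little above one third, so a mirrored site packed into one archive would grow accordingly.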