Hello,
I've come up with an idea for managing the downloading
of a website. I haven't seen, nor do I know of, any
programs that implement this. (Does anyone else?)
Anyway the idea is the offline browser program (like
HTTrack) downloads a page (say cnn.com's home page),
then it stops and waits for you to select which links
of that page to continue downloading (say the Space
and World news sections). Then it'd download those
pages (Space and World) and present the next sublevel
of links.
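To make the download-one-page-then-wait step concrete, here is a rough Python sketch. It is only an illustration, not how HTTrack works internally: LinkCollector, extract_links, expand and the FAKE_SITE table are names made up for this example, and the fake site stands in for a real downloader so the sketch runs offline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html):
    """Return the same-host links on a page, absolute and de-duplicated."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


def expand(page_url, fetch):
    """Download ONE page, then stop and hand its links back to the user."""
    return extract_links(page_url, fetch(page_url))


# Hypothetical stand-in for a real downloader (assumption for the example).
FAKE_SITE = {
    "http://cnn.com/": '<a href="/space">Space</a> <a href="/world">World</a>'
                       ' <a href="http://ads.example/">Ad</a>',
    "http://cnn.com/space": '<a href="/space/mars">Mars</a>',
}

first_level = expand("http://cnn.com/", FAKE_SITE.get)
# first_level is ["http://cnn.com/space", "http://cnn.com/world"]
```

Nothing below the first level is fetched until the user picks a link and expand() is called again on it, which is the whole point of the stop-and-wait step.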
This would all be presented in an Explorer tree-like
manner; pretend <http://cnn.com/> is the root of the
tree (like C:\ or My Computer) with each folder being
a link from the root. Expanding a folder would reveal
subfolders of additional links. Now, to choose to
include a folder and its subfolders, you'd put a
checkbox next to any given folder...i.e. if you put a
checkbox next to the root, you'd download the whole
website (subfolders inherit the checkboxes), if you
put it next to http://cnn.com/space, you'd get the root
home page, and the Space section and its subsections.
(The idea being you get the specific section you are
interested in, plus the pages on the path from it back
to the root.)
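The checkbox rules above (subfolders inherit the checkbox, and the path back to the root comes along) could be modelled roughly like this. Node and selected_urls are hypothetical names for this sketch only, and the sub-URLs are invented examples.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One link in the tree; 'checked' is the checkbox next to it."""
    url: str
    checked: bool = False
    children: list = field(default_factory=list)

    def add(self, url):
        child = Node(url)
        self.children.append(child)
        return child


def selected_urls(node, inherited=False):
    """Return the URLs to download, in tree order.

    A node is downloaded if it or an ancestor is checked (inheritance),
    or if something below it is downloaded (the path back to the root).
    """
    active = inherited or node.checked
    below = []
    for child in node.children:
        below.extend(selected_urls(child, active))
    if active or below:
        return [node.url] + below
    return []


root = Node("http://cnn.com/")
space = root.add("http://cnn.com/space")
space.add("http://cnn.com/space/mars")       # made-up subsection
world = root.add("http://cnn.com/world")
world.add("http://cnn.com/world/europe")     # made-up subsection

space.checked = True
picked = selected_urls(root)
# picked is the root page, the Space section and its subsection;
# World and everything under it are left out.
```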
If any of you have seen how you select what folders to
share in programs like Napster, Limewire, etc. you'll
see where I get the checkboxes-next-to-an-Explorer-tree
idea. Now just replace those Explorer folders
with URLs.
Implementation-wise comments (my opinions):
-default should be to download nothing, except that
which is checkboxed and what is underneath the
checkboxed levels.
-default should start at the root of a website
(probably <http://abc.com/>, but maybe not...)
-default should start with the root collapsed, and
only sections specifically clicked on are expanded to
see their subfolders (sublinks). This is probably
important, because you can only see the sublinks by
downloading the HTML files containing those links, and
if you're doing that you've already started mirroring
the site. Thus only download HTML files needed for
explicitly expanded folders and subfolders.
-this idea can probably be built on top of the HTTrack
filters relatively easily, requiring at most an
interface change or add-on. It's conceivable to me
that one could create a helper program to HTTrack that
creates the proper filters to use based on what
branches of the tree are checkboxed, and then the user
copies/pastes those filters into HTTrack. That's the
main problem with using HTTrack as-is to do this:
setting up the needed filters to do what I'm
suggesting is hard to do manually, yet automating it
would enhance the power of HTTrack a lot. To be the
most site- and net-friendly, I guess you'd exclude
everything in general, and then start including
specific branches as checkboxed.
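A helper like the one described in the last point might look something like the sketch below. It assumes HTTrack's scan rules take the form "-pattern" / "+pattern" with * as a wildcard, that patterns are written without the http:// prefix, and that later rules win over earlier ones; please check that against the filter documentation for your version. httrack_filters is a made-up name, not part of HTTrack.

```python
from urllib.parse import urlparse


def httrack_filters(checked_urls):
    """Build an exclude-everything-then-include list of filter rules."""
    rules = ["-*"]  # default: download nothing
    for url in checked_urls:
        parts = urlparse(url)
        prefix = parts.netloc + parts.path.rstrip("/")
        rules.append("+" + prefix)         # the checked page itself
        rules.append("+" + prefix + "/*")  # everything underneath it
    return rules


filters = httrack_filters(["http://cnn.com/space", "http://cnn.com/world/"])
print(" ".join(filters))
# prints: -* +cnn.com/space +cnn.com/space/* +cnn.com/world +cnn.com/world/*
```

The printed line is what the user would copy/paste into HTTrack's filter box. Pages on the path back to the root (the home page, for instance) would need their own "+" rules as well; this sketch leaves that out.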
Last idea (came to me after I typed the above, and
hasn't been thought thru much yet): it might be better
to give three options instead of a simple checkbox
(Include, Neutral, Exclude) or some variation thereof
(ex. Allow, Neutral, Deny). That's how security is
handled in NT with permissions being Neutral (and
inherited) unless explicitly Allowed. Deny
overrules both Allow and Neutral.
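Since this part isn't thought through yet, here is just one possible reading of the three-option rule, as a sketch: a page is included only if something on its path from the root is Allowed and nothing on that path is Denied. Mark and is_included are hypothetical names, and other interpretations (for example, the nearest explicit mark wins) are equally possible.

```python
from enum import Enum


class Mark(Enum):
    ALLOW = "allow"
    NEUTRAL = "neutral"   # no opinion: inherit from the folders above
    DENY = "deny"


def is_included(path_marks):
    """path_marks: the marks from the root down to the page in question."""
    if Mark.DENY in path_marks:
        return False                 # Deny overrules Allow and Neutral
    return Mark.ALLOW in path_marks  # all-Neutral means "download nothing"


# Root Allowed, one section Denied, a page under that section left Neutral:
under_denied = is_included([Mark.ALLOW, Mark.DENY, Mark.NEUTRAL])   # False
# Root Allowed, section and page left Neutral (inherited):
inherited = is_included([Mark.ALLOW, Mark.NEUTRAL, Mark.NEUTRAL])   # True
```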
If any of you read this over, I'd love to hear your
comments. Sorry it's so long but I wanted to make it
as clear as I could for anyone who knows how or is
interested in implementing it :)
-Haudy Kazemi