> > i was wondering if there will be a 'on-the-fly'
> > blocking
> > (when you are 'reaping' the homepage, you can decide
> > which part you do want and don't want).
>
> Well, this might be an idea ; maybe by adding/inserting
> filters (scan rules) on-the-fly, as you can add links on
> the fly during the mirror. But the use might be a bit
> difficult for many users, I'm afraid ..
I had a very similar idea, but with a graphical interface.
As the site is being retrieved, a tree view of the site
structure appears. Some items would be blank (waiting to
be retrieved), some red (won't be retrieved) and the rest
green (has already been retrieved). The user might choose
to limit the display to domains, sub-domains, paths or HTML
files (so the display does not get flooded with a gazillion
items representing GIFs, for example). Now, if the user
sees that HTTrack is heading off towards a directory
named "advertizing", he could change it to red with a mouse
click or two, effectively causing HTTrack to abandon
retrieval of that path and every link which emerges from
there. Similarly, if he sees that a part of the site
www.sitename.com is not mirrored because the robot does not
feel responsible for traversing into pr0n.sitename.com, he
could click the corresponding icon on the screen to green.
Now, it is of course a rather non-trivial problem to convert
clicks into scan rules and have the robot act accordingly
(if a node is clicked to green, and the robot has already
passed it on its traversal, it still won't bother to
retrieve it). And circular references will add some more
spice, too.
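
To make the click-to-rule idea a bit more concrete, here is a
rough Python sketch. It is not HTTrack code: Node, click_to_rule
and apply_click are names I made up, and I am assuming the
rules are the usual +/- wildcard filters. A click to green
puts an already-passed node back on the work queue, a click to
red drops whatever is still pending below that path.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    url: str               # host plus path, without the scheme
    state: str = "blank"   # "blank", "red" or "green"


def click_to_rule(node, new_state):
    """Turn a click on a tree node into a +/- wildcard filter string."""
    sign = "+" if new_state == "green" else "-"
    return f"{sign}{node.url.rstrip('/')}/*"


def apply_click(node, new_state, queue, visited):
    """Record a click and adjust the pending work to match it.

    queue   -- deque of URLs still waiting to be retrieved
    visited -- set of URLs the robot has already passed
    """
    node.state = new_state
    if new_state == "green":
        # The robot may have passed this node already and skipped it;
        # forget that and schedule it again.
        visited.discard(node.url)
        if node.url not in queue:
            queue.append(node.url)
    else:
        # Drop everything still queued at or below the red path.
        prefix = node.url.rstrip("/") + "/"
        kept = [u for u in queue
                if u != node.url and not u.startswith(prefix)]
        queue.clear()
        queue.extend(kept)
    return click_to_rule(node, new_state)
```

Clicking a node for www.sitename.com/advertizing/ to red would
then give the rule "-www.sitename.com/advertizing/*". The
visited set is also what keeps circular references from
looping forever.
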
I thought that it might help to do the mirroring in
discrete steps. Like, first pass mirrors down to a depth of
1 and then awaits the user's decision on which branches to
follow or not. Next, mirroring goes down to a depth of 2,
and so on.
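
The stepwise idea could look roughly like this in Python. Again
just a sketch under my own assumptions: mirror_in_steps,
get_links and choose are hypothetical names, get_links stands
in for the real fetch-and-parse step, and choose stands in for
the user clicking in the tree view.

```python
def mirror_in_steps(start, get_links, choose, max_depth):
    """Breadth-first traversal that pauses after every depth level.

    get_links(url)            -> list of URLs linked from url
    choose(depth, candidates) -> the candidates the user wants followed
    Returns the URLs in the order they were accepted for mirroring.
    """
    visited = {start}      # everything seen so far, accepted or not
    frontier = [start]     # pages accepted at the previous level
    order = [start]
    for depth in range(1, max_depth + 1):
        # Collect the new links found one level further down.
        candidates = []
        for url in frontier:
            for link in get_links(url):
                if link not in visited and link not in candidates:
                    candidates.append(link)
        if not candidates:
            break
        # Let the user decide which branches to follow.
        frontier = [u for u in choose(depth, candidates) if u in candidates]
        # Rejected links are marked as seen too, so they are not
        # offered again on a later pass.
        visited.update(candidates)
        order.extend(frontier)
    return order
```

Because every link goes into visited the moment it is offered,
a page linking back to its parent does no harm. Whether a
rejected branch should really stay rejected for good is a
design choice; here it does, to keep the sketch short.
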
What I had in mind for the graphical representation was a
Hyperbolic Tree. But a few days ago I learnt that this
thing is heavily patented (at least in the USA, don't know
about France...). An HT would be cool...but I'd rather
choose another kind of representation than face the risk of
dealing with lawyers instead of compilers. Compilers act
much more reasonably than lawyers, although this might
sound unlikely to a programmer who has not yet dealt with
lawyers... ;-)
- Klaus