HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: A feature request on 'on-the-fly' blocking
Author: Klaus
Date: 11/19/2003 19:14
> > i was wondering if there will be a 'on-the-fly' 
> > (when you are 'reaping' the homepage, you can decide 
> > which part you do want and don't want).
> Well, this might be an idea ; maybe by adding/inserting 
> filters (scan rules) on-the-fly, as you can add links on 
> the fly during the mirror. But the use might be a bit 
> difficult for many users, I'm afraid ..

I had a very similar idea, but with a graphical interface. 
As the site is being retrieved, a tree view of the site 
structure appears. Some items would be blank (waiting to 
be retrieved), some red (won't be retrieved) and the rest 
green (has already been retrieved). The user might choose 
to limit the display to domains, sub-domains, paths or HTML 
files (so the display does not get flooded with a gazillion 
items representing GIFs, for example). Now, if the user 
sees that HTTrack is heading off towards a directory 
named "advertizing", he could change it to red with a mouse 
click or two, effectively causing HTTrack to abandon 
retrieval of that path and every link which emerges from 
there. Similarly, if he sees that a part of the site is 
not being mirrored because the robot does not consider it 
within its scope, he could click the corresponding icon 
on the screen to turn it green.
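
A minimal sketch of such a tree in Python, purely as an illustration. The names (State, Node, add_path, block, render) are hypothetical and are not part of HTTrack; the three states simply mirror the blank/red/green colours described above, and blocking a node also blocks everything still pending below it.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    PENDING = "blank"   # waiting to be retrieved
    BLOCKED = "red"     # will not be retrieved
    DONE = "green"      # already retrieved


@dataclass
class Node:
    name: str
    state: State = State.PENDING
    children: dict = field(default_factory=dict)

    def child(self, name):
        # Create the child on first use so paths can be added in any order.
        return self.children.setdefault(name, Node(name))


def add_path(root, path):
    """Insert a 'dir/subdir/file' style path and return its last node."""
    node = root
    for part in path.strip("/").split("/"):
        node = node.child(part)
    return node


def block(node):
    """Turn a node red, together with everything still pending below it."""
    if node.state is State.PENDING:
        node.state = State.BLOCKED
    for child in node.children.values():
        block(child)


def render(node, indent=0):
    """Return the tree as indented text lines, one per node."""
    lines = ["  " * indent + f"{node.name} [{node.state.value}]"]
    for child in node.children.values():
        lines.extend(render(child, indent + 1))
    return lines


# Example: the user clicks the "advertizing" directory to red.
root = Node("www.example.com")
add_path(root, "index.html").state = State.DONE
add_path(root, "advertizing/banner1.html")
add_path(root, "advertizing/banner2.html")
block(root.children["advertizing"])
print("\n".join(render(root)))
```

Files that were already retrieved stay green in this sketch; only pending items change colour, which matches the idea of abandoning a path rather than deleting what is on disk.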

Now, it is of course a rather non-trivial problem to convert 
clicks into scan rules and have the robot act accordingly 
(if a node is clicked to green, and the robot has already 
passed it on its traversal, it still won't bother to 
retrieve it). And circular references will add some more 
spice, too.
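
One way the click-to-rule conversion could look, sketched in Python. Everything here is an assumption for illustration: the Frontier class and its methods are hypothetical, the rule strings only imitate the "+pattern" / "-pattern" look of scan rules, and "the last matching rule wins" is a choice made for this sketch. The point is the second problem mentioned above: a node clicked to green after the robot has passed it must be put back on the queue, and a set of fetched URLs keeps circular references from being retrieved twice.

```python
from collections import deque
from fnmatch import fnmatchcase


def click_to_rule(prefix, allow):
    """Turn a click on a tree node into a +/- wildcard rule for its subtree."""
    return ("+" if allow else "-") + prefix.rstrip("/") + "/*"


def allowed(url, rules, default=True):
    """Evaluate rules in order; the last matching rule wins (assumption)."""
    verdict = default
    for rule in rules:
        if fnmatchcase(url, rule[1:]):
            verdict = rule[0] == "+"
    return verdict


class Frontier:
    """Crawl queue that checks the rules when a URL is taken off the queue."""

    def __init__(self):
        self.queue = deque()
        self.rules = []
        self.fetched = set()   # guards against circular references
        self.skipped = set()   # URLs the robot passed without retrieving

    def add(self, url):
        if url not in self.fetched and url not in self.queue:
            self.queue.append(url)

    def click(self, prefix, allow):
        self.rules.append(click_to_rule(prefix, allow))
        if allow:
            # Re-queue anything skipped earlier that the rules now permit.
            for url in sorted(self.skipped):
                if allowed(url, self.rules):
                    self.skipped.discard(url)
                    self.add(url)

    def next_url(self):
        while self.queue:
            url = self.queue.popleft()
            if allowed(url, self.rules):
                self.fetched.add(url)
                return url
            self.skipped.add(url)
        return None


# Example: block a directory, let the robot pass it, then unblock it.
f = Frontier()
f.click("www.example.com/advertizing", allow=False)
f.add("www.example.com/index.html")
f.add("www.example.com/advertizing/a.html")
first = f.next_url()    # the index page is retrieved
second = f.next_url()   # nothing left: the blocked page was skipped
f.click("www.example.com/advertizing", allow=True)
third = f.next_url()    # the skipped page is back on the queue
print(first, second, third)
```

Note that fnmatchcase treats "?" and "[" as wildcards, so URLs containing those characters would need escaping in anything beyond a sketch.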

I thought that it might help to do the mirroring in 
discrete steps. Like, first pass mirrors down to a depth of 
1 and then awaits the user's decision on which branches to 
follow or not. Next, mirroring goes down to a depth of 2, 
and so on.
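
The stepwise idea could be sketched like this in Python. It is only an illustration under stated assumptions: mirror_in_steps and the decide callback are hypothetical names, and a plain dictionary of links stands in for actually fetching and parsing pages. After each depth level the callback plays the role of the user choosing which branches to follow.

```python
def mirror_in_steps(links, start, max_depth, decide):
    """Breadth-first walk that pauses for a decision after each depth level.

    links  : dict mapping a URL to the URLs it links to (stand-in for fetching)
    decide : callback(depth, candidates) returning the candidates to follow;
             candidates are listed in the order they were discovered
    """
    retrieved = [start]
    seen = {start}              # circular references are considered only once
    level = [start]
    for depth in range(1, max_depth + 1):
        candidates = []
        for url in level:
            for target in links.get(url, []):
                if target not in seen:
                    # A rejected target stays in 'seen', so it is not
                    # offered to the user again at a deeper level.
                    seen.add(target)
                    candidates.append(target)
        if not candidates:
            break
        level = [u for u in decide(depth, candidates) if u in candidates]
        retrieved.extend(level)
    return retrieved


# Example site with a link back to the start page (a circular reference).
site = {
    "/": ["/docs/", "/advertizing/"],
    "/docs/": ["/docs/faq.html", "/"],
    "/advertizing/": ["/advertizing/banner.html"],
}


def no_ads(depth, candidates):
    # Stand-in for the user's choice: never follow the advertizing branch.
    return [u for u in candidates if not u.startswith("/advertizing/")]


result = mirror_in_steps(site, "/", 2, no_ads)
print(result)
```

Because the walk proceeds one level at a time, a rejected branch never contributes candidates to the next level, which is the effect wanted from clicking a node to red.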

What I had in mind for the graphical representation was a 
Hyperbolic Tree. But a few days ago I learnt that this 
thing is heavily patented (at least in the USA, don't know 
about France...). An HT would be cool...but I'd rather 
choose another kind of representation than face the risk of 
dealing with lawyers instead of compilers. Compilers act 
much more reasonably than lawyers, although this might 
sound unlikely to a programmer who has not yet dealt with 
lawyers... ;-)
- Klaus