> In your case, what you want to do is something like
> mirroring several sites, as if you were mirroring the
> next site at the end of the current one.
That would be fine if it were only about a limited,
pre-known set of websites, but it is of no help for
using HTTrack as a spider (going everywhere, including
out beyond the original links).
Currently I don't know of any consumer-oriented
spidering program with serious capabilities. Teleport
has a version that can be configured for extended
spidering, but it costs several thousand USD(!).
HTTrack is very nice, but with a large link list
(>10,000 links) it takes forever to get any results.
With an option like the one I described in the first
message, one would get immediate results and still get
all the outlinks later, after all the original links
have been scanned.
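To make the idea concrete, here is a minimal sketch in
Python of that two-phase behavior, under my own
assumptions: the fetch_links() helper and its crude
regex extractor are stand-ins for illustration, not
anything HTTrack actually does, and a real crawler
would also need proper HTML parsing, robots.txt
handling, rate limiting and so on.

    import re
    import urllib.request

    def fetch_links(url):
        """Return the absolute http(s) links found on a page.
        A crude regex extractor, purely for illustration."""
        try:
            html = urllib.request.urlopen(url, timeout=10).read()
        except Exception:
            return []
        text = html.decode("utf-8", "ignore")
        return re.findall(r'href=["\'](https?://[^"\']+)["\']', text)

    def two_phase_scan(original_links, passes=2):
        """Scan the original list completely first; outlinks are
        only queued for the next pass, so the seed list yields
        results immediately and the outlinks follow later."""
        seen = set(original_links)
        current = list(original_links)
        for _ in range(passes):
            pending = []
            for url in current:
                for link in fetch_links(url):
                    if link not in seen:
                        seen.add(link)
                        pending.append(link)
            current = pending   # next pass: the collected outlinks
        return seen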
The functionality would be easy to add in a limited
fashion, as some kind of attribute in the filters
section; however, maybe not many people need this, so
it could be wasted effort.
But I think there could be a need for various
spidering functions. Maybe there could be two options,
layer-scan and depth-scan: layer-scan working like the
current behavior, and depth-scan following each branch
of the link tree to its end before starting any other
branch. Horizontal vs. vertical, whatever you want to
call it.
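As a rough sketch of the difference (again Python,
reusing the hypothetical fetch_links() helper from the
sketch above): the only change between layer-scan
(breadth-first) and depth-scan (depth-first) is which
end of the work list the next URL is taken from.

    from collections import deque

    def crawl(seeds, depth_first=False, limit=1000):
        """Crawl out from the seed URLs until `limit` links are known.
        depth_first=False is layer-scan (breadth-first, like now);
        depth_first=True is depth-scan (one branch to the bottom first)."""
        seen = set(seeds)
        work = deque(seeds)
        while work and len(seen) < limit:
            # pop() takes the newest link: depth-first, down one branch.
            # popleft() takes the oldest: breadth-first, layer by layer.
            url = work.pop() if depth_first else work.popleft()
            for link in fetch_links(url):
                if link not in seen:
                    seen.add(link)
                    work.append(link)
        return seen

With a deque, switching between the two modes really is
that one line, which suggests offering both options
ought to be cheap to implement.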
It would also help with scanning large single sites,
as going straight to the bottom of huge link trees
would produce end results faster.
(Options like this would make HTTrack more widely
usable, as they would broaden its scope from simple
website mirroring/copying to spidering and discovery.)