It'd be nice if there were more options to control the
behavior of the spidering function. With a large link
list, it would be better to be able to scan one site
fully before going to the next site in the list.
Combined with "go everywhere on the web", this would
give a nice behavior: first go to the first link on
the list, download that site fully, and put any
external links found at the bottom of the spidering
queue. Then go to the next link and do the same. When
all the original links have been scanned, start on the
links found under the first original site, and so on.
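
Something along these lines could do it (just a rough
Python sketch of the queueing idea, not how the spider
actually works internally; fetch_page and extract_links
are made-up helpers standing in for the real downloader
and HTML link parser):

    from collections import deque
    from urllib.parse import urlparse

    def crawl(seed_urls, fetch_page, extract_links):
        site_queue = deque(seed_urls)       # original links, in order
        seen = set()
        while site_queue:
            start = site_queue.popleft()
            host = urlparse(start).netloc
            pages = deque([start])          # within-site frontier
            while pages:
                url = pages.popleft()
                if url in seen:
                    continue
                seen.add(url)
                for link in extract_links(fetch_page(url), base_url=url):
                    if urlparse(link).netloc == host:
                        pages.append(link)       # same site: keep scanning it
                    else:
                        site_queue.append(link)  # external: bottom of queue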
Without this, with a large link list to scan, there
may be a damn long wait before anything of interest is
actually pulled from the web, as the spider scans one
level at a time from each link... :(
(Another improvement on this would be a conditional
scan, i.e. IF more than 20 *.jpg files found THEN go
deeper on this site ELSE move on to the next
dirtree/site.)
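
That rule could be a small predicate plugged into the
loop above (again just a sketch; the pattern and the
threshold would of course be user options):

    import fnmatch

    def should_go_deeper(filenames, pattern="*.jpg", threshold=20):
        # Hypothetical rule: only dig deeper into a dirtree/site if it
        # has already yielded more than `threshold` files matching `pattern`.
        hits = sum(1 for name in filenames if fnmatch.fnmatch(name, pattern))
        return hits > threshold

The spider would call this on each dirtree/site before
queueing its deeper links, and skip ahead when it
returns False.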