HTTrack Website Copier
Free software offline browser - FORUM
Subject: Suggestion: choosing an order
Author: Brian Schimmel
Date: 06/09/2003 01:34
 
In this forum I read the following:
  The crawler is descending all "layers", on a heap basis;
  that is, it takes ALL links that can be reached using "one
  mouse click" from the primary urls (the addresses you typed
  to crawl), then all links that can be reached using "two
  mouse clicks", and so on...

I guess there is something like an array which contains a
list of all pages that have not been processed yet, and the
crawler takes the link that is "first" in that array, then
the next, putting newly found links at the end of the array.
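
To illustrate how I imagine the current behaviour, here is a
minimal sketch in C. This is not HTTrack's real code; the page
names and the extract_links() stand-in are invented just to
show the ordering:

/* Minimal sketch of the FIFO ("first in that array") crawl order.
 * NOT HTTrack's real code; extract_links() is a toy stand-in for
 * real HTML parsing, only there to show the ordering. */
#include <stdio.h>
#include <string.h>

#define MAX_URLS 64

static const char *queue[MAX_URLS]; /* every URL ever seen, oldest first */
static int head = 0, tail = 0;      /* queue[head..tail-1] is still pending */

static void enqueue(const char *url) {
    for (int i = 0; i < tail; i++)      /* skip URLs we already know about */
        if (strcmp(queue[i], url) == 0) return;
    if (tail < MAX_URLS) queue[tail++] = url;
}

/* Toy "link extraction": each page just lists the pages it links to. */
static void extract_links(const char *url) {
    if (strcmp(url, "index.html") == 0) { enqueue("a.html"); enqueue("b.html"); }
    if (strcmp(url, "a.html") == 0)     { enqueue("c.html"); }
}

int main(void) {
    enqueue("index.html");                /* the primary URL */
    while (head < tail) {
        const char *url = queue[head++];  /* always take the oldest pending link */
        printf("fetching %s\n", url);
        extract_links(url);               /* new links go to the end */
    }
    return 0;
}

Because new links always go to the back, all "one mouse click"
links are fetched before any "two mouse clicks" link, which
matches the description quoted above.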

I'm sure this is best for most users, but in some cases it
might be better if HTTrack took a random link out of that list
and then removed it from the list (which would in practice mean
"mark it as processed", because actually removing it is
difficult). When mirroring external links, that would work
around the problem of servers being pulled down by massive
downloading. I guess in some cases the time needed to complete
the task might also be reduced.
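
Here is a rough sketch of the random variant I have in mind,
again not HTTrack's actual code; the add_url() helper and the
example server URLs are made up, and entries are only marked
as processed rather than removed:

/* Rough sketch of the suggested random order. NOT HTTrack's code;
 * the example URLs are invented. Entries are never removed, only
 * "marked as processed", as suggested above. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX_URLS 64

static const char *list[MAX_URLS];
static int processed[MAX_URLS];  /* 1 = already fetched, 0 = still pending */
static int count = 0, pending = 0;

static void add_url(const char *url) {
    if (count < MAX_URLS) { list[count] = url; processed[count] = 0; count++; pending++; }
}

int main(void) {
    srand((unsigned)time(NULL));
    /* pending links spread over several external servers */
    add_url("http://site-a.example/index.html");
    add_url("http://site-b.example/index.html");
    add_url("http://site-c.example/index.html");

    while (pending > 0) {
        int i = rand() % count;      /* pick any entry at random...      */
        if (processed[i]) continue;  /* ...and try again if it was done  */
        processed[i] = 1;            /* "mark it as processed"           */
        pending--;
        printf("fetching %s\n", list[i]);
        /* a real crawler would extract links here and add_url() them */
    }
    return 0;
}

Since consecutive requests are spread over the whole pending
list, two requests in a row rarely hit the same server, which
is what I mean about not pulling down external servers.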

Maybe some non-random order, other than the one currently
used, could also be useful for some users.

PS:
If I had more time, I would take the code and do this
myself, but I don't...
 