> When httrack pulls the first file, does it then traverse
> the tree on the web site visiting all the top level links
> first, then second level links, or does it follow all the
> way down the tree first until it reaches the boundary of
> the site, then comes back up one level?
The crawler descends layer by layer, that is, breadth-first:
it takes ALL links that can be reached with one mouse click
from the primary URLs (the addresses you typed to crawl),
then all links that can be reached with two mouse clicks,
and so on.
Of course, depending on the site structure, this can lead to
behaviour you would not have expected. For example, the
crawler can go back to "upper" structures through "top"
links, and the engine can also follow links that visitors
rarely use because they are hidden or written in a very
small font size.
Anyway, this behaviour is generally the desired one.
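To make the "one click, then two clicks" order concrete, here is a
minimal sketch in Python. It is not HTTrack's actual code, only an
illustration of a layer-by-layer crawl; the function name crawl_order,
the max_depth parameter and the small site map are all made up for
the example.

```python
from collections import deque


def crawl_order(start_urls, links, max_depth=None):
    """Return (url, depth) pairs in the order a layer-by-layer crawl visits them.

    `links` is a hypothetical in-memory site map: page -> list of linked pages.
    """
    seen = set(start_urls)                      # pages already queued, so "top" links do not loop
    queue = deque((url, 0) for url in start_urls)
    order = []
    while queue:
        url, depth = queue.popleft()            # oldest entry first: finish a layer before the next
        order.append((url, depth))
        if max_depth is not None and depth >= max_depth:
            continue                            # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site: a.html carries a "top" link back to index.html.
site = {
    "index.html": ["a.html", "b.html"],
    "a.html": ["a1.html", "index.html"],
    "b.html": ["b1.html"],
    "a1.html": ["deep.html"],
}

order = crawl_order(["index.html"], site)
for url, depth in order:
    print(depth, url)
```

Both pages one click away (a.html, b.html) are visited before anything
two clicks away, and the "top" link back to index.html is skipped
because that page was already seen.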