In crawlers, one can specify a level of links that will be followed during the extraction of content from the Internet. This means that, starting from the first page of a web site, the crawler caches on the local disk all the files linked from that page and repeats the same process until the specified level is reached. Once the level is reached, no further links are followed or saved.
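
To make this crawler-side notion concrete, here is a minimal depth-limited crawl sketch in Python. It is a generic illustration under stated assumptions, not WebCopier's actual implementation; the names `crawl` and `max_level` are invented for the example.

```python
import urllib.request
import urllib.parse
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collect the href attributes of <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_level):
    """Breadth-first, depth-limited crawl: pages at max_level are still
    saved, but their links are no longer followed."""
    seen = {start_url}
    queue = [(start_url, 0)]
    pages = {}
    while queue:
        url, level = queue.pop(0)
        try:
            html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
        except OSError:
            continue
        pages[url] = html              # stands in for "cache on the local disk"
        if level >= max_level:
            continue                   # level reached: stop following links
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urllib.parse.urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, level + 1))
    return pages
```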
From an end-user's point of view, the notion of "level of links" is different: it is the number of times he can click on valid links starting from the first page. Moreover, the end-user expects every HTML page he sees to be consistent, which basically means that all images are included in the page and all frames are displayed.
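
The end-user's notion can be illustrated by separating clickable links from page requisites while parsing. The sketch below is again hypothetical and not tied to WebCopier: only `<a href>` links would count against the level, whereas `<img>`, `<frame>` and `<iframe>` sources would be fetched at every level so that saved pages stay consistent.

```python
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Separate clickable links (<a href>) from page requisites
    (<img>, <frame>, <iframe> sources), which must always be fetched
    so the saved page displays correctly."""
    def __init__(self):
        super().__init__()
        self.links = []       # count against the level
        self.requisites = []  # fetched regardless of the level

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("img", "frame", "iframe") and attrs.get("src"):
            self.requisites.append(attrs["src"])
```

In a crawl loop like the one sketched above, requisites would then be fetched without incrementing the level, while ordinary links increase it by one.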
How does WebCopier (command-line or GUI version) handle this problem? Is it possible to use some advanced features to address it?