> I am trying to scan a site hosted on IBM/Lotus Domino.
>
> Most of the pages have a link back to the home page,
> and the home page is a frame page containing three
> dynamically generated pages, one of which is just a
> counter.
>
> When I download this page, it repeatedly scans this
> homepage, and all threads end up waiting for it. So
> if the site has 10,000 pages, this page might be
> downloaded and scanned 10,000 times.
If the homepage link can be skipped, use filters:
-www.foo.com/bar/homepage.cgi*
If you have to get this homepage, except when the
counter variable is used (I don't know exactly, but
it may be something like this):
-www.foo.com/bar/homepage.cgi?*counter=*
If you have to get the homepage only for a specific
counter value (example: 1):
-www.foo.com/bar/homepage.cgi?*counter=*
+www.foo.com/bar/homepage.cgi?*counter=1&*
Note: if the homepage is the #1 page given in the URL
list, you can safely use the first filter, which
excludes it, as all URLs given as "starting pages" are
fetched regardless of the filters.
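
To see how the exclude/include pair above would behave on sample URLs, here is a rough Python sketch. It is not the crawler's own matching code. It assumes that `*` is the only wildcard, that every other character (including `?`) is literal, that rules are read top to bottom with the last matching rule winning, and that unmatched URLs are accepted. The function name `is_allowed` and the sample query strings are made up for illustration.

```python
import re

def is_allowed(url, rules, default=True):
    """Apply +/- wildcard rules in order; the last matching rule wins (assumed)."""
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        # '*' matches any run of characters; everything else is literal.
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.fullmatch(regex, url):
            verdict = (sign == "+")
    return verdict

rules = [
    "-www.foo.com/bar/homepage.cgi?*counter=*",
    "+www.foo.com/bar/homepage.cgi?*counter=1&*",
]

# Hypothetical URLs; the real query string on the site may look different.
for url in [
    "www.foo.com/bar/homepage.cgi?counter=1&frame=top",
    "www.foo.com/bar/homepage.cgi?counter=57&frame=top",
    "www.foo.com/bar/homepage.cgi?counter=1",
    "www.foo.com/bar/page.html",
]:
    print(url, "->", "fetch" if is_allowed(url, rules) else "skip")
```

Under these assumptions the third URL is skipped, because the `+` rule requires an `&` after `counter=1`. If the counter can be the last parameter, a rule without the trailing `&` may be needed, at the cost of also matching values such as 10 or 11.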