> Thanks for the reply,
> the mirror isn't ending as the total number of links
> scanned is increasing all the time. Well it's true
> that I shall set a bigger number for bigger sites, but
> it's hard to gauge the depth as there will be missing
> pictures or pages.
Again, the BEST way is to find out why you get so many
pages. Generally, with an 'infinite' level, a mirror
should finish on its own (because there isn't an
infinite number of links in the website).
The problem here - I think - might be due to dynamic
pages which generate dynamic links - always different -
forcing the engine to catch new links forever.
Interrupt the mirror (after having downloaded many
links), and look in the project folder, clicking on
each folder to try to detect WHERE all these
numerous links are. Then you will be able to forbid
this specific URL.
Example:
Imagine you mirrored www.foobar.com/smith/ in "Project
1"
Go to C:\My Web Sites\Project 1
Open the www.foobar.com folder, right-click on "smith",
and ask for Properties. You may see a huge number of files.
Go into this folder - if you see hundreds of files
with almost the same names (bar45F1.html,
bar4587.html...), these files might be responsible for
your problems. Otherwise, in this subfolder you will
see several folders - check the number of files inside
them, and go into the one which has the biggest number.
Redo these steps until you reach the "numerous" files
with almost the same names.
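If you prefer, the manual drill-down above can be automated with a short script that walks the mirror folder and reports the directory containing the most files (a sketch; the project path is an example, replace it with your own):

```python
import os

def busiest_dir(root):
    """Walk root and return the directory holding the most files."""
    best_dir, best_count = root, 0
    for dirpath, dirnames, filenames in os.walk(root):
        if len(filenames) > best_count:
            best_dir, best_count = dirpath, len(filenames)
    return best_dir, best_count

if __name__ == "__main__":
    # Example project path -- adjust to your own mirror folder
    path, count = busiest_dir(r"C:\My Web Sites\Project 1")
    print(f"{count} files in {path}")
```

The directory it reports is usually the one full of near-identical dynamically generated pages.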
At this point, you'll be in a folder - say
"Project 1\www.foobar.com\smith\cgi\pages\" and you'll
see many files "barXXX.html"
Then, you're going to exclude all barXXX files:
restart the project, and in Options/Filters, add:
-www.foobar.com/smith/cgi/pages/bar*
(note the / instead of \)
You may find several paths to exclude; in that case,
add several rules.
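The `bar*` part of the filter is a shell-style wildcard, so you can preview which of the downloaded files a pattern would cover using Python's fnmatch module (just an illustration of the wildcard, not HTTrack's own matching code; the file names are the examples from above):

```python
from fnmatch import fnmatch

# Example file names from the problematic folder
files = ["bar45F1.html", "bar4587.html", "index.html", "about.html"]

# Same wildcard as in the filter -www.foobar.com/smith/cgi/pages/bar*
excluded = [f for f in files if fnmatch(f, "bar*")]
kept = [f for f in files if not fnmatch(f, "bar*")]
print("excluded:", excluded)  # bar45F1.html, bar4587.html
print("kept:", kept)          # index.html, about.html
```

This is a quick way to check that your rule catches the runaway pages without also excluding pages you want to keep.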