| Is it possible to exclude the end of each page of a website from
link extraction? The end of each page contains links to index pages,
and these cannot be filtered out.
Crawling the index pages (Page4 and Page5 in the example below) dramatically
increases the amount of uninteresting data.
<html>
..Normal page..
<a href="www.site.org/podo/Page1">Page1</a>
<a href="www.site.org/podo/Page2">Page2</a>
<a href="www.site.org/podo/Page3">Page3</a>
<special tag indicating end of data>
<a href="www.site.org/podo/Page4">Page4</a>
<a href="www.site.org/podo/Page5">Page5</a>
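To make the idea concrete, here is a rough sketch in Python of the behaviour I have in mind: cut the page at the end-of-data marker and extract links only from the part before it. It is only an illustration, not tied to any particular crawler. It assumes the links are ordinary <a href="..."> tags, and the marker string END_MARKER and the names LinkCollector and extract_links_before_marker are made up for the example.

```python
from html.parser import HTMLParser

# Hypothetical marker; stands in for the "special tag indicating end of data".
END_MARKER = "<!-- end-of-data -->"


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links_before_marker(html, marker=END_MARKER):
    """Return the links that appear before the marker.

    If the marker is not present, the whole page is scanned.
    """
    cut = html.find(marker)
    head = html if cut == -1 else html[:cut]
    parser = LinkCollector()
    parser.feed(head)
    parser.close()
    return parser.links
```

With the example page above, this would return only the links to Page1, Page2 and Page3, so the index pages Page4 and Page5 would never be queued.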
Is there a solution to this?
Bob | |