Dear William,
I mean the following:
<html>
...normal page content...
<a href="www.site.org/podo/Page1"> <- interesting link
<a href="www.site.org/podo/Page2"> <- interesting link
<a href="www.site.org/podo/Page3"> <- interesting link
<special tag indicating end of data> <- stop crawling here
<a href="www.site.org/podo/Page4"> <- these links should
<a href="www.site.org/podo/Page5"> <- not be included
Because the unwanted links look exactly like the interesting ones (same
host, same /podo/ path), they cannot be filtered out by their URLs.
Is it possible to make the spider process only part of a page? For example,
only the content between <data> and </data>, or everything up to a special
marker such as <!-- end of data --> and nothing after it?
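One common workaround, if the spider itself offers no such option, is to cut each page down to the region of interest before links are extracted. The minimal sketch below assumes you can hook in at the point where the raw HTML of a page is available; the function names `region_of_interest` and `extract_links` are hypothetical, not part of any particular spider's API. It uses only Python's standard `html.parser` and treats the markers as plain substrings.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href values of all <a> tags it is fed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def region_of_interest(html, start_marker=None, end_marker="<!-- end of data -->"):
    """Return only the part of the page the spider should look at.

    Hypothetical helper: text up to and including the first start_marker is
    dropped, and everything from the first end_marker onward is dropped.
    If a marker is None or not found, that side of the page is left as is.
    """
    if start_marker is not None:
        idx = html.find(start_marker)
        if idx != -1:
            html = html[idx + len(start_marker):]
    if end_marker is not None:
        idx = html.find(end_marker)
        if idx != -1:
            html = html[:idx]
    return html


def extract_links(html, start_marker=None, end_marker="<!-- end of data -->"):
    """Extract links only from the trimmed region of the page."""
    parser = LinkCollector()
    parser.feed(region_of_interest(html, start_marker, end_marker))
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = """<html><body>
<a href="www.site.org/podo/Page1">1</a>
<a href="www.site.org/podo/Page2">2</a>
<a href="www.site.org/podo/Page3">3</a>
<!-- end of data -->
<a href="www.site.org/podo/Page4">4</a>
<a href="www.site.org/podo/Page5">5</a>
</body></html>"""
    # Only Page1..Page3 are reported; Page4 and Page5 come after the marker.
    print(extract_links(page))
```

For the <data> ... </data> variant, call `extract_links(page, start_marker="<data>", end_marker="</data>")`. Because the markers are matched as raw text, a marker string that also appears earlier on the page (for instance inside a script or another comment) would cut the page at the wrong place, so a distinctive marker is safest.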