HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Exclude the end of each page from link-extraction?
Author: Bob
Date: 07/22/2009 15:24
 
Dear William,
I mean the following:

<html>
..Normal page..
<a href="www.site.org/podo/Page1">      <- interesting link
<a href="www.site.org/podo/Page2">      <- interesting link
<a href="www.site.org/podo/Page3">      <- interesting link

<special tag indicating end of data>    <- stop crawling

<a href="www.site.org/podo/Page4">      <- these links should
<a href="www.site.org/podo/Page5">      <- not be included

Because the links that should not be included look exactly like the
interesting links, they cannot be filtered out.
Is it possible to have the spider process only part of a page?
E.g. only the part from <data> to </data>, or skip everything at the end of
the page after a special tag, e.g. <!-- end of data -->
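
To make clear what behaviour I am after, here is a rough Python sketch. It is not HTTrack code and does not use any HTTrack option; it only illustrates the idea of cutting the page at the marker before links are extracted. The marker string is the one from my example, and the names LinkCollector and links_before_marker are made up for this illustration.

```python
from html.parser import HTMLParser

# Hypothetical marker, taken from the example in this post.
END_MARKER = "<!-- end of data -->"


class LinkCollector(HTMLParser):
    """Collects the href values of all <a> tags it is fed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_before_marker(html, marker=END_MARKER):
    """Return the links that appear before the first occurrence of marker.

    If the marker is not present, the whole page is scanned.
    """
    # partition() returns the full string as the first element
    # when the marker is not found.
    head, _, _ = html.partition(marker)
    collector = LinkCollector()
    collector.feed(head)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = """<html>
<a href="www.site.org/podo/Page1">1</a>
<a href="www.site.org/podo/Page2">2</a>
<!-- end of data -->
<a href="www.site.org/podo/Page4">4</a>
</html>"""
    print(links_before_marker(page))
```

In this sketch only Page1 and Page2 would be followed; everything after the marker is never seen by the link extractor, which is the effect I would like to get from the spider.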
 