HTTrack Website Copier
Free software offline browser - FORUM
Subject: Exclude the end of each page from link-extraction?
Author: Bob
Date: 07/21/2009 22:12
 
Is it possible to exclude the end of each page of a website from
link extraction? The reason is that the end of each page contains links to
index pages which cannot be filtered out.
Crawling these index pages (Page4 and Page5 in the example below) dramatically
increases the amount of uninteresting data.


<html>
..Normal page..
<a href="www.site.org/podo/Page1">
<a href="www.site.org/podo/Page2">
<a href="www.site.org/podo/Page3">

<special tag indicating end of data>

<a href="www.site.org/podo/Page4">
<a href="www.site.org/podo/Page5">
</html>
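
To make the behaviour I am after concrete, here is a minimal Python sketch. It is not an HTTrack feature or option, just an illustration of the idea: cut the page at the end-of-data marker and only collect links from the part before it. The marker string, the function name links_before_marker and the simple href regex are all assumptions made up for this example.

```python
import re

# Hypothetical marker; stands in for the site's real "end of data" tag.
END_MARKER = "<special tag indicating end of data>"

# Simplistic pattern for double-quoted href values (an assumption, not a full HTML parser).
HREF_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def links_before_marker(html, marker=END_MARKER):
    """Return the href values that appear before the first occurrence of marker.

    If the marker is absent, the whole page is scanned.
    """
    head, _, _ = html.partition(marker)
    return HREF_RE.findall(head)


page = """<html>
..Normal page..
<href="www.site.org/podo/Page1">
<href="www.site.org/podo/Page2">
<href="www.site.org/podo/Page3">

<special tag indicating end of data>

<href="www.site.org/podo/Page4">
<href="www.site.org/podo/Page5">
"""

# Only Page1..Page3 are kept; Page4 and Page5 come after the marker.
print(links_before_marker(page))
```

With the example page this keeps Page1 to Page3 and drops Page4 and Page5, which is the effect I would like the crawler to have on every page.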


Is there a solution to this?

Bob
 





