HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Exclude the end of each page from link-extraction?
Author: William Roeder
Date: 07/22/2009 01:11
 
> Is it possible to exclude the end of each page of a
> website from link-extraction?
No such capability.

> The reason is that the end of the pages include
> index pages which cannot be filtered out:
> Crawling over the index pages (see example: page4,
> page5) increases the amount of uninteresting data
> dramatically.
Filter out what you do not want.
-* +*/podo/*
Even if it spiders the index pages, it won't follow their links anywhere outside /podo/.
> <href="www.site.org/podo/Page1">
> <href="www.site.org/podo/Page2">
> <href="www.site.org/podo/Page3">
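To illustrate how a rule list like "-* +*/podo/*" decides which links get followed, here is a small Python sketch. It assumes HTTrack's documented precedence (the last matching rule wins) and approximates HTTrack's "*" wildcard with Python's fnmatchcase. The function name is_allowed and the index URL below are hypothetical, and this is not HTTrack's actual matcher (which supports extra syntax this sketch ignores).

```python
from fnmatch import fnmatchcase


def is_allowed(url, rules):
    """Hypothetical emulation of HTTrack-style scan rules.

    Each rule is '+pattern' (include) or '-pattern' (exclude).
    Rules are checked in order and the LAST matching rule decides.
    URLs matched by no rule default to allowed (an assumption).
    """
    allowed = True
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        # fnmatchcase approximates HTTrack's '*' (matches any characters, including '/')
        if fnmatchcase(url, pattern):
            allowed = (sign == "+")
    return allowed


rules = "-* +*/podo/*".split()
for url in ["www.site.org/podo/Page1", "www.site.org/index/page4"]:
    print(url, is_allowed(url, rules))
```

With these rules, "-*" first excludes everything and "+*/podo/*" then re-includes only URLs under /podo/, so Page1 is allowed while the hypothetical index URL is rejected.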

Alternatively, if all the pages are reachable from the starting URL, you can
set the depth so that anything beyond the index pages isn't allowed.
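The depth idea can be sketched as a breadth-first crawl that stops following links once a page is a given number of hops from the start. The link graph, page names and the function crawl_order are all hypothetical, and how depth is counted here (start page = 0) is an assumption that may differ by one from HTTrack's own mirror depth setting.

```python
from collections import deque


def crawl_order(start, links, max_depth):
    """Breadth-first crawl of a hypothetical link graph.

    Pages more than max_depth links away from start are never fetched.
    The start page is counted as depth 0 (an assumption).
    """
    seen = {start}
    queue = deque([(start, 0)])
    fetched = []
    while queue:
        page, depth = queue.popleft()
        fetched.append(page)
        if depth == max_depth:
            continue  # at the limit: fetch this page but do not follow its links
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return fetched


# Hypothetical site: content pages link to index pages, which lead further away
links = {
    "start": ["Page1", "Page2", "Page3"],
    "Page1": ["page4", "page5"],
    "Page2": ["page4", "page5"],
    "page4": ["deep1"],
}
print(crawl_order("start", links, 1))
```

With a limit of 1, only the start page and Page1 to Page3 are fetched; the index pages page4 and page5 lie one hop further and are skipped.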


All articles

Subject                                                   Date
Exclude the end of each page from link-extraction?        07/21/2009 22:12
Re: Exclude the end of each page from link-extraction?    07/22/2009 01:11
Re: Exclude the end of each page from link-extraction?    07/22/2009 15:24
Re: Exclude the end of each page from link-extraction?    07/22/2009 16:20

Created with FORUM 2.0.11