HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Feature?
Author: Xavier Roche
Date: 11/17/2002 10:55
 
> Anyway, I think this would be good for filtering out
> unwanted webpages. Say you wanted HTTrack to copy
> www.blender.org, but you only wanted it to copy the pages
> that contain the word blender. HTTrack would go to the
> page, download it, and search for the word blender; if it
> wasn't there, it would delete the page and not come back,
> nor would it download any files from that page or follow
> any links from that page.

This would not give very good results. Imagine you filter
using the word "blender": you will hit a page where the word
does not appear, and the engine will drop it. But what if a
page linked from that page does contain the word "blender"?
It will then be missed, because the page leading to it was
never followed.

I don't think I'll implement this, but it can easily be
achieved using the httrack library and the check-html
wrapper:

#include <string.h>   /* for strstr() */

..
/* register the callback for the "check-html" hook */
htswrap_add("check-html", httrack_wrapper_checkhtml);

..
/* return 1 to keep and parse the page, 0 to drop it */
int CDECL httrack_wrapper_checkhtml(char* html, int len,
                                    char* url_adresse, char* url_fichier) {
  return (strstr(html, "blender") != NULL) ? 1 : 0;
}
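
As a minimal, untested sketch, a complete program could look like
the following; the hts_init()/hts_main()/hts_uninit() entry points,
the "httrack-library.h" header name and the CDECL macro are
assumptions here, so check the library headers and examples in the
HTTrack sources for the exact interface:

/* filter.c - sketch only: mirror a site, keeping only pages that
   contain the word "blender" (entry point, header and macro names
   below are assumptions, see the HTTrack sources) */
#include <string.h>
#include "httrack-library.h"

/* keep (1) pages containing "blender", drop (0) the others */
static int CDECL httrack_wrapper_checkhtml(char* html, int len,
                                           char* url_adresse, char* url_fichier) {
  return (strstr(html, "blender") != NULL) ? 1 : 0;
}

int main(int argc, char* argv[]) {
  int ret;
  hts_init();                                            /* assumed: engine init */
  htswrap_add("check-html", httrack_wrapper_checkhtml);  /* install the filter */
  ret = hts_main(argc, argv);   /* assumed: run the engine with the usual
                                   httrack command line (URL, -O path, filters) */
  hts_uninit();                                          /* assumed: engine cleanup */
  return ret;
}

You would link it against libhttrack and run it with a normal
httrack command line, for example:
  ./filter http://www.blender.org -O /tmp/blender-mirror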

 