> Anyway, I think this would be good for filtering out
> unwanted webpages. Say you wanted Httrack to copy
> Www.blender.org, but you only wanted it to copy the pages
> that contain the word blender. HTTrack would go to the
> page, download it, search for the word blender; if it
> wasn't there, it would delete the page and not come back, nor
> would it download any files from that page or follow any
> links from that page.
This would not give very good results; imagine you filter
on the word "blender" - you'll hit a page where that word
doesn't appear, and the engine would drop it. But what if a
page linked from that page does contain the word "blender"?
It would then be missed.
I don't think I'll implement this, but it can easily be
achieved using the httrack library and the check-html
wrapper:
..
htswrap_add("check-html", httrack_wrapper_checkhtml);
..
/* Called for each downloaded HTML page; return 1 to keep it, 0 to drop it. */
int CDECL httrack_wrapper_checkhtml(char* html, int len,
                                    char* url_adresse, char* url_fichier) {
  return (strstr(html, "blender") != NULL) ? 1 : 0;
}
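One caveat with `strstr` is that it assumes the buffer is NUL-terminated; since the callback also receives `len`, a bounded search is safer. A minimal sketch - the `contains` helper below is not part of the HTTrack API, just an illustration:

```c
#include <string.h>

/* Hypothetical helper: bounded substring search, so the HTML
 * buffer need not be NUL-terminated. Returns 1 if `needle`
 * occurs within the first `len` bytes of `haystack`. */
static int contains(const char* haystack, int len, const char* needle) {
    int nlen = (int)strlen(needle);
    for (int i = 0; i + nlen <= len; i++) {
        if (memcmp(haystack + i, needle, nlen) == 0)
            return 1;
    }
    return 0;
}
```

Inside the wrapper this would replace the `strstr` test with `contains(html, len, "blender")`.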