HTTrack Website Copier
Free software offline browser - FORUM
Subject: Specialized filter
Author: Dave
Date: 02/01/2008 20:42
 
I'm having some trouble with a fairly specialized filter I'm trying to create.

I'm using httrack to crawl a site that has a calendar app as part of its
dynamically generated content, so the crawl could potentially run forever.  I
want to exclude the date-specific links while including certain others.

All of the links in question are under the /spaces/usage directory and are
generated by the report.action script.  I want to throw away every link there
whose query string contains &date.  I should note that higher up in my filters
I'm excluding /spaces via -*/spaces/* to get rid of lots of other junk.  Right
now my pattern hierarchy looks like this (it's from a batch file, so you'll
see some escaping going on):

-*/spaces/*
+'*/spaces/usage/report.action?key=*[A-Z,a-z,0-9]*[]'
+'*/spaces/usage/report.action?key=*[A-Z,a-z,0-9]^&period=*[a-z]*[]'

So I want links with key= alone, or with key= and period=, but I do NOT want
anything else, such as links with key=, period=, and date= together.  Right
now these filters do not appear to be matching ANYTHING in the /spaces/usage
directory.  Any ideas?
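
To make the intent concrete, here is a rough Python sketch of the selection I'm after. It is not httrack filter syntax and says nothing about how httrack evaluates its rules; the function name wanted, the example.com host, and the sample parameter values are made up for illustration. It only encodes the rule "report.action links with key=, optionally period=, and nothing else".

```python
from urllib.parse import urlsplit, parse_qs

# Query parameters I am willing to keep (from the description above).
ALLOWED_PARAMS = {"key", "period"}
# Script path taken from the post; the host in the examples is hypothetical.
REPORT_PATH = "/spaces/usage/report.action"


def wanted(url: str) -> bool:
    """Return True if the link should be crawled under the rule described above."""
    parts = urlsplit(url)
    if not parts.path.endswith(REPORT_PATH):
        return False
    # keep_blank_values=True so that "key=" with an empty value still counts.
    params = parse_qs(parts.query, keep_blank_values=True)
    # key= must be present, and nothing outside key=/period= may appear.
    return "key" in params and set(params) <= ALLOWED_PARAMS


if __name__ == "__main__":
    base = "http://example.com/spaces/usage/report.action"
    for query in ("key=ABC", "key=ABC&period=week", "key=ABC&period=week&date=2008-02-01"):
        print(query, wanted(base + "?" + query))
```

The first two sample links are the ones I want to keep; the third, which carries date=, is the kind I want dropped.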
 