I'm having some trouble with a fairly specialized filter I'm trying to create.
I'm using httrack to crawl a site that has a calendar app as part of its
dynamically generated content, so the crawl could potentially run forever. I
want to exclude the date-specific links while including certain others.
All of the pages in question are under /spaces/usage and are generated by the
report.action script. I want to throw away every link there whose URL contains
&date. I should note that higher up in my filters I'm excluding /spaces via
-*/spaces/* to get rid of lots of other junk. Right now my pattern hierarchy
looks like this (from a batch file, so you'll see some escaping going on):
-*/spaces/*
+'*/spaces/usage/report.action?key=*[A-Z,a-z,0-9]*[]'
+'*/spaces/usage/report.action?key=*[A-Z,a-z,0-9]^&period=*[a-z]*[]'
So I want URLs with key= alone, or with key= and period=, but I do NOT want
anything else, such as key=, period=, and date= together. Right now this does
not appear to be matching ANYTHING in the /spaces/usage directory. Any ideas?
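
To make the intended rule concrete, here is a minimal Python sketch of the accept/reject logic described above. It is not httrack filter syntax and says nothing about how httrack evaluates its patterns. The function name should_crawl, the host example.com, and the sample parameter values are all made up for illustration, and treating URLs outside /spaces as allowed is an assumption, since the rest of the filter list isn't shown.

```python
from urllib.parse import urlsplit, parse_qs

# The only query-parameter combinations that should be kept.
ALLOWED_PARAM_SETS = ({"key"}, {"key", "period"})

def should_crawl(url: str) -> bool:
    """Hypothetical helper: True if the URL is one the filters are meant to keep."""
    parts = urlsplit(url)
    if "/spaces/" not in parts.path:
        # Assumption: URLs outside /spaces are handled by other filters.
        return True
    if not parts.path.endswith("/spaces/usage/report.action"):
        # Everything else under /spaces is excluded.
        return False
    # Compare the set of parameter names against the allowed combinations.
    params = set(parse_qs(parts.query, keep_blank_values=True))
    return params in ALLOWED_PARAM_SETS
```

With these made-up URLs, report.action?key=ABC123 and report.action?key=ABC123&period=week would be kept, while the same URL with an extra &date=... would be thrown away.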