> The scan rules aren't exactly regular expressions, they're
> not shell globs... so what are all the valid patterns and
> special characters that httrack uses? I've gone through
> the manuals and lots of messages on the forums, here's
> what I've gathered so far:
See also the small documentation:
<http://www.httrack.com/html/filters.html>
> [0-9] will match a single digit. I have no idea if it can
> be done with letters like [a-zA-Z], and I don't think you
> can specify repetition.
Yes, you can, as in
*[A-Za-z0-9]
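
For intuition only, here is a rough Python sketch of how such a pattern could be read. It is not HTTrack's parser: the helper names `rule_to_regex` and `rule_matches` are made up for this example, it assumes `*[set]` means "zero or more characters from the set", it only accepts letters, digits and ranges inside the brackets, and it ignores the other `*[...]` options described in the documentation.

```python
import re

def rule_to_regex(rule: str) -> str:
    """Translate a simplified scan-rule pattern into a Python regex string."""
    out = []
    i = 0
    while i < len(rule):
        ch = rule[i]
        if ch == "*" and rule[i + 1:i + 2] == "[":
            end = rule.find("]", i + 2)
            if end == -1:
                raise ValueError("unterminated '[' in rule")
            char_set = rule[i + 2:end]
            # This sketch only handles plain sets such as A-Za-z0-9.
            if not re.fullmatch(r"[A-Za-z0-9-]+", char_set):
                raise ValueError("unsupported set: " + char_set)
            # Assumption: the set may repeat zero or more times.
            out.append("[" + char_set + "]*")
            i = end + 1
        elif ch == "*":
            # A bare * stands for any run of characters.
            out.append(".*")
            i += 1
        else:
            # Everything else is matched literally.
            out.append(re.escape(ch))
            i += 1
    return "".join(out)

def rule_matches(rule: str, url: str) -> bool:
    """True if the whole URL matches the simplified rule."""
    return re.fullmatch(rule_to_regex(rule), url) is not None
```

With this reading, `www.example.com/*[A-Za-z0-9].html` would accept `www.example.com/page42.html` but reject `www.example.com/a/b.html`, because `/` is not in the set, while a bare `*` in the same position would accept both.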
> * acts like .* in a regular expression, not sure if it's
> greedy or non-greedy, though <-- would really help out if
> I knew for sure
Err, what would it change for matching? (Isn't the
difference only important for replacement?)
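
To illustrate the point with ordinary regular expressions (Python's `re` module here, not HTTrack's own matcher; the pattern and the helper name `same_verdict` are made up for this example): when the whole string has to match, greedy and non-greedy wildcards give the same yes/no answer. They only differ in which part of the string each wildcard captures.

```python
import re

def same_verdict(url: str) -> bool:
    """Compare the yes/no result of a greedy and a non-greedy whole-string match."""
    greedy = re.fullmatch(r".*/img/.*\.gif", url) is not None
    lazy = re.fullmatch(r".*?/img/.*?\.gif", url) is not None
    return greedy == lazy

# The verdict is the same for matching and non-matching URLs alike.
for url in ["site/img/a.gif", "site/img/x/img/b.gif", "site/css/a.gif"]:
    assert same_verdict(url)

# The captured text differs, which matters for extraction or replacement.
greedy_part = re.match(r"(.*)/", "a/b/c").group(1)   # "a/b"
lazy_part = re.match(r"(.*?)/", "a/b/c").group(1)    # "a"
```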
> appending *[<20>50] to a rule filters content less than
> 20KB in size, and greater than 50KB. Just *[<20] would
> mean filter less than 20KB.
Right.
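
As a rough sketch of that reading (not HTTrack's code): the function names are made up for this example, sizes are taken to be in KB, and the option is assumed to match when any one of the listed bounds holds, as the question describes it. The real combination rule may differ.

```python
import re

def parse_size_option(option: str) -> list:
    """Parse the inside of *[...] such as '<20>50' into [('<', 20), ('>', 50)]."""
    bounds = re.findall(r"([<>])(\d+)", option)
    # Reject anything that is not purely a sequence of <NN / >NN bounds.
    if not bounds or "".join(op + num for op, num in bounds) != option:
        raise ValueError("not a size option: " + option)
    return [(op, int(num)) for op, num in bounds]

def size_option_matches(option: str, size_kb: int) -> bool:
    """Assumption: the option matches if any single bound is satisfied."""
    for op, limit in parse_size_option(option):
        if op == "<" and size_kb < limit:
            return True
        if op == ">" and size_kb > limit:
            return True
    return False
```

Under these assumptions, `<20>50` matches a 10KB file and a 60KB file but not a 30KB one, and `<20` alone matches only files under 20KB.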
> Are there other special characters (Like +)?
No. * and *[<options>] are the only syntax.
> I did start experimenting to answer my questions, but
> after trying some long scan rules and wasting time and
> bandwidth, I'm still not sure exactly how these scan rules
> are parsed. Tried looking at htsparse.c (169KB) not even
> sure if that's the right file :(
Yes it is :)
> So does anyone know how exactly these scan rule patterns
> work?
I should :)