HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: What are valid scan rule patterns?
Author: Xavier Roche
Date: 07/29/2003 19:02
 
> The scan rules aren't exactly regular expressions, they're
> not shell globs... so what are all the valid patterns and
> special characters that httrack uses? I've gone through 
> the manuals and lots of messages on the forums, here's 
> what I've gathered so far:

See also the small documentation:
<http://www.httrack.com/html/filters.html>

> [0-9] will match a single digit. I have no idea if it can 
> be done with letters like [a-zA-Z], and I don't think you 
> can specify repetition.

Yes, you can, as in
*[A-Za-z0-9]

> * acts like .* in a regular expression, not sure if it's 
> greedy or non-greedy, though <-- would really help out if 
> I knew for sure

Err, what would that change for matching? (Isn't the
difference only important for replacement?)
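
To illustrate the point with ordinary regular expressions (Python's re module here, not HTTrack's own matcher): when the whole string has to match, a greedy and a non-greedy wildcard accept exactly the same strings. The difference only shows up when you extract or replace the matched part. The sample strings are made up for the example.

```python
import re

greedy = re.compile(r"a.*b")    # greedy wildcard
lazy = re.compile(r"a.*?b")     # non-greedy wildcard

# Whole-string matching: both variants agree on every sample.
samples = ["ab", "axxb", "axbxb", "axbx", "b"]
agree = all(
    (greedy.fullmatch(s) is not None) == (lazy.fullmatch(s) is not None)
    for s in samples
)

# Extraction: here the two variants pick different substrings.
text = "axbxb"
greedy_span = greedy.search(text).group(0)
lazy_span = lazy.search(text).group(0)

print(agree, greedy_span, lazy_span)
```

So for a yes/no filter decision the greediness of * should not matter; it would only matter if the matched text were reused afterwards.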

> appending *[<20>50] to a rule filters content less than 
> 20KB in size, and greater than 50KB. Just *[<20] would 
> mean filter less than 20KB.

Right.
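
A minimal sketch of that reading, in Python rather than HTTrack's C code. It assumes the size option triggers when the file is smaller than the first number or larger than the second, and that 1 KB means 1024 bytes; both are assumptions for illustration, and the names parse_size_option and size_rule_applies are hypothetical, not HTTrack functions.

```python
import re

KB = 1024  # assumption: sizes in the rule are counted in units of 1024 bytes


def parse_size_option(option):
    """Parse text like '[<20>50]' into (below_kb, above_kb); either may be None."""
    m = re.fullmatch(r"\[(?:<(\d+))?(?:>(\d+))?\]", option)
    if m is None or (m.group(1) is None and m.group(2) is None):
        raise ValueError("not a size option: %r" % option)
    below = int(m.group(1)) if m.group(1) else None
    above = int(m.group(2)) if m.group(2) else None
    return below, above


def size_rule_applies(option, size_bytes):
    """True if the size falls under the '<' bound or over the '>' bound."""
    below, above = parse_size_option(option)
    if below is not None and size_bytes < below * KB:
        return True
    if above is not None and size_bytes > above * KB:
        return True
    return False


print(size_rule_applies("[<20>50]", 10 * KB))  # smaller than 20KB
print(size_rule_applies("[<20>50]", 30 * KB))  # between the two bounds
```

With this reading, [<20>50] applies to a 10KB or 60KB file but not to a 30KB one, and [<20] alone applies only to files under 20KB.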

> Are there other special characters (Like +)? 

No. * and *[<options>] are the only syntax.
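
For experimenting offline without wasting bandwidth, the two forms can be approximated with a small translator to a regular expression. This is only a sketch in Python, not HTTrack's parser: it assumes * means "any characters" and *[A-Za-z0-9] means "any run of characters from that class" (as in the reply above), it rejects size options, and the name rule_to_regex and the example URLs are hypothetical.

```python
import re


def rule_to_regex(rule):
    """Translate a simplified scan-rule pattern into a compiled regex."""
    out = []
    i = 0
    while i < len(rule):
        if rule.startswith("*[", i):
            end = rule.find("]", i + 2)
            if end == -1:
                raise ValueError("unterminated *[...] in rule")
            body = rule[i + 2:end]
            if not body or body[0] in "<>":
                raise ValueError("only character classes are handled here")
            # Keep letters, digits and '-' (for ranges); escape anything else.
            cls = "".join(
                c if c == "-" or c.isalnum() else "\\" + c for c in body
            )
            out.append("[" + cls + "]*")   # zero or more chars from the class
            i = end + 1
        elif rule[i] == "*":
            out.append(".*")               # plain wildcard
            i += 1
        else:
            out.append(re.escape(rule[i])) # literal character
            i += 1
    return re.compile("".join(out))


pattern = rule_to_regex("*.example.com/img*[0-9].jpg")
print(pattern.fullmatch("www.example.com/img123.jpg") is not None)
print(pattern.fullmatch("www.example.com/imgabc.jpg") is not None)
```

The pattern is matched against the whole URL with fullmatch, so a rule has to account for every character, which is why the leading * is needed in the example.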

> I did start experimenting to answer my questions, but
> after trying some long scan rules and wasting time and
> bandwidth, I'm still not sure exactly how these scan rules
> are parsed. Tried looking at htsparse.c (169KB) not even
> sure if that's the right file :(

Yes it is :)

> So does anyone know how exactly these scan rule patterns
> work?

I should :)
 