HTTrack Website Copier
Free software offline browser - FORUM
Subject: What are valid scan rule patterns?
Author: loko
Date: 07/29/2003 18:12
 
The scan rules aren't exactly regular expressions, and 
they're not shell globs either, so what are all the valid 
patterns and special characters that httrack uses? I've 
gone through the manuals and lots of messages on the 
forums. Here's what I've gathered so far:
[0-9] will match a single digit. I have no idea if it can 
be done with letters like [a-zA-Z], and I don't think you 
can specify repetition.
* acts like .* in a regular expression. I'm not sure if 
it's greedy or non-greedy, though; it would really help 
if I knew for sure.
Appending *[<20>50] to a rule filters content smaller than 
20KB or larger than 50KB. Just *[<20] would mean filter 
content smaller than 20KB.
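To make that understanding concrete, here is a rough Python sketch that translates the patterns listed above into an ordinary regular expression. It is only a model of what I've gathered, not how httrack actually parses rules: the helper names rule_to_regex and matches are made up, the size suffix (*[<20>50]) is not handled, and it assumes a rule has to match the entire URL. Under that last assumption, greedy and non-greedy * give the same yes/no answer, since the regex engine backtracks until the whole string either fits or doesn't.

```python
import re


def rule_to_regex(rule: str, greedy: bool = True) -> str:
    """Translate a scan-rule-like pattern into a Python regex.

    Hypothetical helper: models only '*' and simple bracket classes
    such as [0-9]; everything else is treated as a literal character.
    """
    star = ".*" if greedy else ".*?"
    out = []
    i = 0
    while i < len(rule):
        ch = rule[i]
        if ch == "*":
            out.append(star)
            i += 1
        elif ch == "[" and rule.find("]", i) > i + 1:
            # Pass a non-empty bracket class through unchanged.
            end = rule.find("]", i)
            out.append(rule[i:end + 1])
            i = end + 1
        else:
            out.append(re.escape(ch))
            i += 1
    return "".join(out)


def matches(rule: str, url: str, greedy: bool = True) -> bool:
    """Assumption: a rule must match the whole URL, not a substring."""
    return re.fullmatch(rule_to_regex(rule, greedy), url) is not None


if __name__ == "__main__":
    rule = "*/img[0-9].gif"
    for url in ("www.example.com/pics/img7.gif",
                "www.example.com/pics/img77.gif"):
        # Same verdict either way when the whole URL must match.
        print(url, matches(rule, url, True), matches(rule, url, False))
```

If httrack matched rules against only part of a URL instead, greediness could start to matter, so this sketch doesn't settle the question.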
That's about all I've found, which leaves me with some 
more questions:
Are there other special characters (like +)? How do I 
embed them in a scan rule? I thought of using URL 
encoding, like "," -> "%2C", but I'm not sure if the two 
forms will match each other (do they?).
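For reference, this is what the encoding question looks like in Python's standard urllib.parse. It only shows that the raw and percent-encoded forms are different strings that become equal after decoding. Whether httrack decodes a rule or a URL before comparing them is exactly what I don't know, so same_after_decoding is a hypothetical helper and the example URL is made up.

```python
from urllib.parse import quote, unquote

raw = "www.example.com/a,b.html"          # made-up URL containing a comma
encoded = quote(raw, safe="/")            # percent-encode everything but "/"


def same_after_decoding(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after percent-decoding both."""
    return unquote(a) == unquote(b)


if __name__ == "__main__":
    print(encoded)                            # www.example.com/a%2Cb.html
    print(raw == encoded)                     # False: plain string comparison
    print(same_after_decoding(raw, encoded))  # True: equal once decoded
```

So a rule written with "%2C" would only match a URL containing "," if something normalizes one of the two before the comparison.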
I did start experimenting to answer my questions, but 
after trying some long scan rules and wasting time and 
bandwidth, I'm still not sure exactly how these scan rules 
are parsed. I tried looking at htsparse.c (169KB), but I'm 
not even sure if that's the right file :(
So does anyone know how exactly these scan rule patterns 
work? Thanks!
 