HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: HTTRACK and regular expressions in URLs?
Author: William Roeder
Date: 11/27/2007 15:00
 
> -*
> +www.morgenpost.de/content/200*[2-7]/*[0-1]*[0-9]/*[0-3]*[0-9]/*index.html
> 
> So what I'm trying to express is that URLs are
> organized by year/month/day/politik/

Your filter only allows content/y/m/d/*; it doesn't allow content/y/index.html,
for instance. Also note that intermediate pages are not necessarily called
index.html.

You need to allow HTTrack to scan down through all the directories to collect all
the URLs (y, m, d), and then filter out everything except the politik pages. Try:
+*/content/*
-*/content/*/*/*/*/*
+*/politik/*
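A rough way to see why these three rules work is to evaluate them top to bottom and let the last matching rule decide, which is the precedence the suggested ordering relies on. The Python sketch below is a simplification, not HTTrack's code: it only understands the bare `*` wildcard (any characters, `/` included), compares against URLs written without the `http://` prefix, and starts from the original poster's `-*` so that anything not re-allowed is excluded. The helper names `glob_to_regex` and `is_allowed` are hypothetical.

```python
import re

# "-*" is carried over from the original poster's filters; the other three are
# the suggested rules. Rules are evaluated in list order.
RULES = ["-*", "+*/content/*", "-*/content/*/*/*/*/*", "+*/politik/*"]


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a filter body that uses only bare '*' (any characters, '/' included)."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def is_allowed(url: str, rules: list[str]) -> bool:
    """Return True if the last rule whose pattern matches the whole URL is a '+' rule."""
    allowed = False  # assumption: a URL that matches no rule is not fetched
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if glob_to_regex(pattern).fullmatch(url):
            allowed = sign == "+"  # a later match overrides any earlier one
    return allowed


for url in [
    "www.morgenpost.de/content/2007/11/27/index.html",
    "www.morgenpost.de/content/2007/11/27/politik/article.html",
    "www.morgenpost.de/content/2007/11/27/sport/article.html",
]:
    print(is_allowed(url, RULES), url)
```

Run as-is, it reports the day index page and the politik article as allowed and the sport article as excluded: the `-*/content/*/*/*/*/*` rule only bites at four or more slashes below content/, so the y/m/d index pages stay reachable while deeper non-politik pages are dropped.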

<http://www.httrack.com/html/fcguide.html> lists the full syntax for filters:
* 	any characters (the most commonly used)
*[file] or *[name] 	any filename or name, i.e. no /, ? or ; characters
*[path] 	any path (and filename), i.e. no ? or ; characters
*[a,z,e,r,t,y] 	any letters among a,z,e,r,t,y
*[a-z] 	any letters
*[0-9,a,z,e,r,t,y] 	any characters among 0..9 and a,z,e,r,t,y
*[] 	no characters may follow
*[< NN] 	size less than NN Kbytes
*[> PP] 	size more than PP Kbytes
*[< NN > PP] 	size less than NN Kbytes and more than PP Kbytes
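To check a filter against sample URLs before starting a long mirror, the pattern tokens above can be approximated with ordinary regular expressions. The Python sketch below is an illustration only, not HTTrack's implementation: `httrack_filter_to_regex` and `OP_FILTER` are hypothetical names, the sketch assumes `*[set]` matches zero or more characters from the set, and it rejects the size filters because those test file size rather than the URL text. Applied to the original poster's pattern, it accepts a day-level politik page but rejects content/2007/index.html, which is the gap pointed out at the top of this reply.

```python
import re


def httrack_filter_to_regex(pattern: str) -> str:
    """Hypothetical translation of HTTrack filter tokens into a Python regex.

    Assumptions: bare '*' matches any characters, '*[set]' matches zero or
    more characters from the set, and size filters ('*[< NN]' etc.) are rejected.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] != "*":
            out.append(re.escape(pattern[i]))  # literal character
            i += 1
            continue
        if not pattern.startswith("[", i + 1):
            out.append(".*")  # bare '*': any characters
            i += 1
            continue
        end = pattern.index("]", i)
        body = pattern[i + 2:end]
        i = end + 1
        if body in ("file", "name"):
            out.append(r"[^/?;]*")  # no /, ? or ;
        elif body == "path":
            out.append(r"[^?;]*")  # no ? or ;
        elif body == "":
            out.append(r"\Z")  # '*[]': nothing may follow
        elif body.lstrip().startswith(("<", ">")):
            raise ValueError("size filters cannot be expressed as a URL regex")
        else:
            # Comma-separated single characters and ranges, e.g. "0-9,a,z".
            parts = []
            for item in body.split(","):
                if len(item) == 3 and item[1] == "-":
                    parts.append(re.escape(item[0]) + "-" + re.escape(item[2]))
                else:
                    parts.append(re.escape(item))
            out.append("[" + "".join(parts) + "]*")
    return "".join(out)


OP_FILTER = "www.morgenpost.de/content/200*[2-7]/*[0-1]*[0-9]/*[0-3]*[0-9]/*index.html"
rx = re.compile(httrack_filter_to_regex(OP_FILTER))
print(bool(rx.fullmatch("www.morgenpost.de/content/2007/11/27/politik/index.html")))
print(bool(rx.fullmatch("www.morgenpost.de/content/2007/index.html")))
```

The first check prints True and the second False: the year-level index page never matches, so HTTrack would not follow its links down to the day directories.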
 