HTTrack Website Copier
Free software offline browser - FORUM
Subject: literal asterisk in scan rule
Author: hg
Date: 07/05/2015 02:45
 
I am trying to use scan rules to reject literal asterisks in URLs, but after
several dozen attempts at various syntax I haven't yet found a solution. I'm
running httrack from a linux shell prompt. Here's an example command:

httrack\
    <https://web.archive.org/web/19971014063324/http://www.dimatec.com/>
    '-*'\
    '+*/199710*/http*://www.dimatec.com/*'\
    '-**[\*]*'\
    -N1005\
    --advanced-progressinfo\
    --can-go-up-and-down\
    --display\
    --keep-alive\
    --mirror\
    --robots=0\
    --user-agent='Mozilla/5.0 (X11;U; Linux i686; en-GB; rv:1.9.1)
Gecko/20090624 Ubuntu/9.04 (jaunty) Firefox/3.5'\
    --verbose

Line 5 should filter out all URLs containing a literal asterisk, but I still
get mirrored pages from URLs like this:

web.archive.org/web/19971014063351*/http://www.dimatec.com/about.htm
 
Reply


All articles

Subject Author Date
literal asterisk in scan rule

07/05/2015 02:45




3

Created with FORUM 2.0.11