I am trying to use scan rules to reject URLs that contain a literal asterisk,
but after several dozen attempts with different syntax I haven't found a
solution. I'm running httrack from a Linux shell prompt. Here's an example command:
httrack \
https://web.archive.org/web/19971014063324/http://www.dimatec.com/ \
'-*' \
'+*/199710*/http*://www.dimatec.com/*' \
'-**[\*]*' \
-N1005 \
--advanced-progressinfo \
--can-go-up-and-down \
--display \
--keep-alive \
--mirror \
--robots=0 \
--user-agent='Mozilla/5.0 (X11;U; Linux i686; en-GB; rv:1.9.1) Gecko/20090624 Ubuntu/9.04 (jaunty) Firefox/3.5' \
--verbose
Line 5 of the command ('-**[\*]*') should filter out all URLs containing a
literal asterisk, but I still get mirrored pages from URLs like this:
web.archive.org/web/19971014063351*/http://www.dimatec.com/about.htm
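
To confirm which URLs contain a literal asterisk independently of HTTrack's filter syntax, a fixed-string grep can be used. This is only a minimal sketch: it assumes the URLs are available one per line, the sample list is built from the two URLs in this post, and the helper name has_asterisk is made up for illustration. It does not say anything about how HTTrack itself interprets the scan rule.

```shell
# Sample URLs taken from this post; in practice they would come from
# a log file or a listing of the mirrored pages (assumption).
urls='web.archive.org/web/19971014063324/http://www.dimatec.com/
web.archive.org/web/19971014063351*/http://www.dimatec.com/about.htm'

# Hypothetical helper: print only the lines that contain a literal '*'.
# grep -F treats the pattern as a fixed string, so '*' is not a regex operator.
has_asterisk() {
  printf '%s\n' "$1" | grep -F '*'
}

has_asterisk "$urls"
```

Any URL this prints is one that the exclusion rule was expected to reject.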