HTTrack Website Copier
Free software offline browser - FORUM
Subject: Filter URLs with parentheses
Author: Andreas
Date: 08/27/2016 21:08
 
I would like to exclude all URLs that contain parentheses, so I tried
this filter:

   -*\(*

I typed this in a bash terminal, so the backslash only escapes the
parenthesis for bash. httrack itself should never see the backslash,
and doit.log confirms this, since the filter is recorded there as:

   -*(*

Looks good to me, but doesn't work. URLs containing parentheses are
still included in the download.
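
A minimal sketch of the shell behaviour described above, to confirm that the backslash is consumed by the shell before any program sees it. `show_args` is a hypothetical helper that stands in for httrack and simply prints the arguments it receives; filename expansion is switched off so the `*` characters are not expanded against files in the current directory.

```shell
# Hypothetical helper (not part of httrack): print each received argument
show_args() {
    printf '%s\n' "$@"
}

set -f                          # disable filename expansion of * for this test
unquoted=$(show_args -*\(*)     # backslash-escaped form, as typed in the post
quoted=$(show_args '-*(*')      # single-quoted form of the same filter
set +f                          # restore filename expansion

echo "backslash form: $unquoted"
echo "quoted form:    $quoted"
```

Both forms hand the same string to the program, which matches what doit.log records. So the shell quoting is probably not what keeps the filter from working.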


Here is an example I used for testing:

httrack
http://msdn.microsoft.com/en-us/library/office/documentformat.openxml.spreadsheet.author.aspx
-*
+msdn.microsoft.com/en-us/library/office/documentformat.openxml.spreadsheet.author.*
-*\(*

File new.lst still lists files like

   [....spreadsheet.author.author(v=office.14).html]

that I didn't want to download.
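
A small sketch of how the check against new.lst could be repeated after each filter attempt. The file built here is only a stand-in: its first line is the entry quoted above, and its second line is a made-up entry without parentheses. With a real mirror, `$lst` would point at the new.lst that httrack wrote.

```shell
# Stand-in for new.lst (assumption: one entry per line, as in the excerpt above)
lst=$(mktemp)
printf '%s\n' \
    '[....spreadsheet.author.author(v=office.14).html]' \
    '[....spreadsheet.author.html]' > "$lst"

# Count entries containing a literal "("; -F makes grep treat it as a fixed string
leaked=$(grep -cF '(' "$lst")
echo "entries with parentheses: $leaked"

rm -f "$lst"
```

A count of zero would mean the filter excluded every URL with a parenthesis.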

What should a filter look like to exclude these pages?
 