HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Option --near does not respect filters
Author: Alexander
Date: 11/25/2012 20:20
 
It seems that filters do not work correctly if they are given on the command
line. They only work when included in a file specified with the -%S option.

For example, suppose we need to download only the HTML content of a site.

1. httrack site '-* +*.html' -O site
All site content is downloaded, including non-HTML resources. The filter
rules are not respected.

2. httrack site -O site -%S filters
Contents of the file "filters":
-*
+*.html

Only HTML resources are downloaded. The filter rules are respected.
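
One possible explanation for the difference between examples 1 and 2, offered
as an assumption and not confirmed anywhere in this thread: the shell hands
the single-quoted string '-* +*.html' to httrack as one argument, whereas the
filter file supplies one rule per line. The sketch below only shows how the
shell splits the two forms. count_args is a hypothetical helper, and nothing
here calls httrack.

```shell
# Hypothetical helper (not part of httrack): print how many arguments it received.
count_args() {
    echo "$#"
}

# Both rules inside one pair of quotes, as in example 1:
# the shell passes a single argument.
count_args '-* +*.html'      # prints 1

# Each rule quoted separately: the shell passes two arguments.
count_args '-*' '+*.html'    # prints 2
```

If the one-argument form is indeed the cause, quoting each rule separately
(httrack site -O site '-*' '+*.html') may be worth trying.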

I also tried putting URL scan rules in .httrackrc, but they were ignored.
Contents of .httrackrc:
deny *
allow *.html

Command line:
httrack site -O site

All site content is downloaded, including non-HTML resources. The filter
rules are not respected. The .httrackrc file is located in the current
directory. I also tried placing it in the directory specified with the -O
option, again without success.

Maybe I am doing something wrong, but unfortunately httrack is poorly
documented. Many specifics are not described, such as the .httrackrc syntax,
filter usage on the command line, and the meaning of numerous options. In
contrast, wget has an excellent tutorial, although it does not have as many
mirroring features as httrack.

One more improvement would be the ability to use standard regular
expressions in URL patterns instead of the current special syntax.
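
To illustrate the request, here is a minimal sketch comparing a wildcard rule
with a regular expression that accepts the same URLs. Shell "case" globbing
is used only as a stand-in for httrack's pattern syntax, which is an
assumption. matches_glob and matches_regex are hypothetical helpers, and
nothing here calls httrack.

```shell
# Hypothetical helper: succeed if the URL matches the wildcard pattern *.html.
# Shell globbing approximates httrack's own wildcard syntax here (assumption).
matches_glob() {
    case "$1" in
        *.html) return 0 ;;
        *)      return 1 ;;
    esac
}

# Hypothetical helper: the same check written as a POSIX extended regular
# expression, i.e. the URL must end in ".html".
matches_regex() {
    printf '%s\n' "$1" | grep -Eq '\.html$'
}

# Report how each helper classifies a few sample URLs.
for url in site/index.html site/logo.png site/docs/page.html; do
    if matches_glob "$url"; then g=yes; else g=no; fi
    if matches_regex "$url"; then r=yes; else r=no; fi
    echo "$url glob=$g regex=$r"
done
```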
 