It seems that filters do not work correctly when they are given on the command
line. They only work when placed in a file specified with option -%S.
For example, suppose we need to download only the HTML content of a site.
1. httrack site '-* +*.html' -O site
All site content is downloaded including non-html resources. Filter rules are
not respected.
2. httrack site -O site -%S filters
The content of the file filters:
-*
+*.html
Only html resources are downloaded. Filter rules are respected.
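One possible explanation for case 1, offered as an assumption and not as confirmed httrack behaviour: the shell passes the quoted string '-* +*.html' as a single argument, so httrack may receive one odd pattern containing a space instead of two separate filters. The sketch below only demonstrates the shell quoting; count_args is a hypothetical helper, and the commented httrack call is an untested guess at a workaround.

```shell
# Hypothetical helper: print how many arguments it was called with.
count_args() { echo "$#"; }

# Both filters inside one pair of quotes: the shell passes a single argument.
count_args '-* +*.html'

# Each filter quoted on its own: the shell passes two arguments.
count_args '-*' '+*.html'

# Assumption (not verified): httrack expects each filter as its own argument.
# httrack site -O site '-*' '+*.html'
```

If the assumption holds, quoting each filter separately should behave the same way as the rules file passed with -%S.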
I tried using .httrackrc with url scan rules included, but they were ignored.
The content of .httrackrc:
deny *
allow *.html
Command line:
httrack site -O site
All site content is downloaded, including non-HTML resources. Filter rules are
not respected. The .httrackrc file is located in the current directory. I also
tried placing it in the directory specified with option -O, without success.
Maybe I am doing something wrong, but unfortunately httrack has poor
documentation. Many specific things are not described, such as the .httrackrc
syntax, filter usage on the command line, and the meaning of numerous options.
In contrast, wget has an excellent tutorial, although it does not have as many
mirroring features as httrack does.
One more improvement would be the possibility of using standard regular
expressions in URL patterns instead of the current custom syntax.