HTTrack Website Copier
Free software offline browser - FORUM
Subject: filters + robots.txt on command line
Author: Andreas
Date: 04/26/2004 09:13
 
Hello!

I don't understand how the -s option of httrack 3.30 (command
line) works. For example, I want to mirror
www.bsz-bw.de/index.html. I want to get only this page and
the pictures in it, nothing else; I also need error pages.
If I call
httrack -x -%e0 -r1 -n 'www.bsz-bw.de/index.html'
I don't get the pictures, because they are excluded by
robots.txt:
Info:   Note: due to www.bsz-bw.de remote robots.txt rules,
links begining with these path will be forbidden: (...)
/wwwroot/gif_01/, /wwwroot/gif_02/ (see in the options to
disable this)
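
For illustration, the prefix matching behind that note can be
sketched in Python with the standard library's robots.txt parser.
The robots.txt content below is an assumption, rebuilt only from
the two paths named in the note; the real file on www.bsz-bw.de
may contain more rules, and HTTrack's own matching may differ in
detail. The helper name is_allowed is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: reconstructed from the two paths in the log note,
# not fetched from the real server.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /wwwroot/gif_01/
Disallow: /wwwroot/gif_02/
"""


def is_allowed(url, robots_txt=ASSUMED_ROBOTS_TXT, agent="*"):
    """Return True if the given rules permit fetching url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.bsz-bw.de/index.html",
        "http://www.bsz-bw.de/wwwroot/gif_01/bszlogo60.gif",
    ):
        print(url, "allowed" if is_allowed(url) else "forbidden")
```

Under these assumed rules the page itself is allowed, while the
logo's path starts with a disallowed prefix and is refused.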

So I have to set -s0 or -s1 (I don't understand the
difference between them, by the way):
httrack -x -%e0 -r1 -n -s0 'www.bsz-bw.de/index.html'
But the pictures are still missing, and the log shows no
errors or info messages.
The following gets the pictures, but I don't want to use the
-r2 option:
httrack -x -%e0 -r2 -n -s0 'www.bsz-bw.de/index.html'
So I extended the command with positive filters instead of
the -r2 option. Here it is with just one filter, for one of
the pictures:
httrack -x -%e0 -r1 -n -s0 'www.bsz-bw.de/index.html' \
  +http://www.bsz-bw.de/wwwroot/gif_01/bszlogo60.gif

But I still don't get any pictures! I have no idea what else
to try. Can you help? I have put the log below.

Thanks, Andreas

-----

cat hts-log.txt
HTTrack3.30 launched on Mon, 26 Apr 2004 09:40:42 at
www.bsz-bw.de/index.html
+http://www.bsz-bw.de/wwwroot/gif_01/bszlogo60.gif
(httrack -x -%e0 -r1 -n -s0 www.bsz-bw.de/index.html
+http://www.bsz-bw.de/wwwroot/gif_01/bszlogo60.gif )

Information, Warnings and Errors reported for this mirror:
note:   the hts-log.txt file, and hts-cache folder, may
contain sensitive information,
        such as username/password authentication for
websites mirrored in this project
        do not share these files/folders if you want these
information to remain private


HTTrack mirror complete in 0 seconds : 1 links scanned, 1
files written (8313 bytes overall) [8488 bytes received at
8488 bytes/sec]
(No errors, 0 warnings, 0 messages)

 