Hello!
I don't understand how the -s option of httrack 3.30 (command
line) works. For example, I want to mirror
www.bsz-bw.de/index.html. I want to get only this page and
nothing else; I need error pages, and I want to get the
pictures on this page. If I call
httrack -x -%e0 -r1 -n 'www.bsz-bw.de/index.html'
I don't get the pictures, because they are excluded by
robots.txt:
Info: Note: due to www.bsz-bw.de remote robots.txt rules,
links begining with these path will be forbidden: (...)
/wwwroot/gif_01/, /wwwroot/gif_02/ (see in the options to
disable this)
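As far as I understand it, the robots.txt check behaves roughly like the Python sketch below. Only the two Disallow paths are taken from the log message above; the "User-agent: *" line and the helper name is_allowed are my assumptions, and this is not how HTTrack itself implements the check.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content. Only the two Disallow paths come from the
# HTTrack log message; the "User-agent: *" line is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wwwroot/gif_01/
Disallow: /wwwroot/gif_02/
"""


def is_allowed(url, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if the given robots.txt rules permit fetching url.

    is_allowed is a hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    # Parse the rules from a string instead of downloading them.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.bsz-bw.de/index.html",
        "http://www.bsz-bw.de/wwwroot/gif_01/bszlogo60.gif",
    ):
        print(url, "allowed" if is_allowed(url) else "forbidden")
```

With these assumed rules the page itself is allowed, while the logo under /wwwroot/gif_01/ is forbidden, which matches what I see in the mirror.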
So I have to set -s0 or -s1 (I don't understand the
difference between them, by the way):
httrack -x -%e0 -r1 -n -s0 'www.bsz-bw.de/index.html'
But the pictures are still missing, and there are no errors
or info messages in the log.
This gets the pictures, but I don't want to use the -r2 option:
httrack -x -%e0 -r2 -n -s0 'www.bsz-bw.de/index.html'
So I extended the command with some positive filters instead
of the -r2 option. Here is just one filter, for one of the
pictures:
httrack -x -%e0 -r1 -n -s0 'www.bsz-bw.de/index.html'
+http://www.bsz-bw.de/wwwroot/gif_01/bszlogo60.gif
But I still don't get any pictures! Now I have no idea
what else to try. Can you help? I have put the log below.
Thanks, Andreas
-----
cat hts-log.txt
HTTrack3.30 launched on Mon, 26 Apr 2004 09:40:42 at
www.bsz-bw.de/index.html
+http://www.bsz-bw.de/wwwroot/gif_01/bszlogo60.gif
(httrack -x -%e0 -r1 -n -s0 www.bsz-bw.de/index.html
+http://www.bsz-bw.de/wwwroot/gif_01/bszlogo60.gif )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may
contain sensitive information,
such as username/password authentication for
websites mirrored in this project
do not share these files/folders if you want these
information to remain private
HTTrack mirror complete in 0 seconds : 1 links scanned, 1
files written (8313 bytes overall) [8488 bytes received at
8488 bytes/sec]
(No errors, 0 warnings, 0 messages)