Hi, I am trying to run HTTrack as a batch job, but it fails.
I used the Windows version (which is the version I
regularly use) to check my configuration and then went
through the command line guide to make up the command line
options.
The Windows configuration is default except for the
following:
using a file that contains the start URLs
filtered files: +*.css +*.js -ad.doubleclick.net/*
-*.gif -*.jpg -*.png -*.tif -*.bmp -*.jpeg
-*.mov -*.mpg -*.mpeg -*.avi -*.asf -*.mp3 -*.mp2 -*.rm
-*.wav -*.vob -*.qt -*.vid -*.ac3 -*.wma -*.wmv
max link depth: 2
external depth: 0
number of connections: 12
no searchable index
browser id: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
I put together the following command line:
C:\Program\WinHTTrack\httrack
-%L file://D:\Arkiv\Kundprojekt\Vattenfall\HTTrackURL_d2.txt
-O G:\vattenfall\Webmaterial\d2b0\ -Zr2c12o0s2qI0 %e0
--assume php3=text/html,php=text/html,php2=text/html,asp=text/html,jsp=text/html,pl=text/html,cfm=text/html
-F "Mozilla/4.78 [en] (Windows NT 5.0; U)"
"+*.css" "+*.js" "-ad.doubleclick.net/*"
"-*.gif" "-*.jpg" "-*.png" "-*.tif" "-*.bmp" "-*.jpeg"
"-*.mov" "-*.mpg" "-*.mpeg" "-*.avi" "-*.asf" "-*.mp3" "-*.mp2" "-*.rm"
"-*.wav" "-*.vob" "-*.qt" "-*.vid" "-*.ac3" "-*.wma" "-*.wmv"
But it results in the following log:
HTTrack3.22-3-noV6+swf launched on Wed, 28 May 2003 14:17:53 at
%e0 +*.css +*.js -ad.doubleclick.net/*
-*.gif -*.jpg -*.png -*.tif -*.bmp -*.jpeg
-*.mov -*.mpg -*.mpeg -*.avi -*.asf -*.mp3 -*.mp2 -*.rm
-*.wav -*.vob -*.qt -*.vid -*.ac3 -*.wma -*.wmv
(C:\Program\WinHTTrack\httrack
-%L file://D:\Arkiv\Kundprojekt\Vattenfall\HTTrackURL_d2.txt
-O G:\vattenfall\Webmaterial\d2b0\ -Zr2c12o0s2qI0 %e0
-%A php3=text/html,php=text/html,php2=text/html,asp=text/html,jsp=text/html,pl=text/html,cfm=text/html
-F "Mozilla/4.78 [en] (Windows NT 5.0; U)"
+*.css +*.js -ad.doubleclick.net/*
-*.gif -*.jpg -*.png -*.tif -*.bmp -*.jpeg
-*.mov -*.mpg -*.mpeg -*.avi -*.asf -*.mp3 -*.mp2 -*.rm
-*.wav -*.vob -*.qt -*.vid -*.ac3 -*.wma -*.wmv )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may
contain sensitive information,
such as username/password authentication for
websites mirrored in this project
do not share these files/folders if you want these
information to remain private
14:17:53 Info: engine: init
14:17:53 Error: Could not include URL list: file://D:\Arkiv\Kundprojekt\Vattenfall\HTTrackURL_d2.txt
14:17:53 Info: engine: start
14:17:53 Debug: Wait get: primary/primary
14:17:53 Info: engine: check-html: primary/primary
14:17:53 Debug: scan file..
14:17:53 Debug: link detected in html: http://%e0
14:17:53 Debug: position link check http://%e0
14:17:53 Debug: build relative link http://%e0 with primary/primary
14:17:53 Debug: wizard link test at %e0/..
14:17:53 Debug: wizard test begins: %e0/
14:17:53 Debug: Compare addresses: %e0!=primary
14:17:53 Debug: result for wizard link test: 0
14:17:53 Info: engine: save-name: local name: %e0/index.html -> %e0/index.html
14:17:53 Debug: Record: %e0/ -> G:/vattenfall/Webmaterial/d2b0/%e0/index.html
14:17:53 Debug: relative link at %e0 build with G:/vattenfall/Webmaterial/d2b0/%e0/index.html and G:/vattenfall/Webmaterial/d2b0/index.html: %e0/index.html
14:17:53 Debug: robots.txt added at %e0
14:17:53 Debug: OK, NOTE: %e0/ -> G:/vattenfall/Webmaterial/d2b0/%e0/index.html
14:17:53 Debug: Wait get: %e0/robots.txt
14:17:58 Error: "Unable to get server's address" (-5) after 2 retries at link %e0/robots.txt (from primary/primary)
14:17:58 Debug: Wait get: %e0/
14:17:58 Warning: Retry after error -5 (Unable to get server's address) at link %e0/ (from primary/primary)
14:17:58 Debug: Wait get: %e0/
14:18:00 Warning: Retry after error -5 (Unable to get server's address) at link %e0/ (from primary/primary)
14:18:00 Debug: Wait get: %e0/
14:18:02 Error: "Unable to get server's address" (-5) after 2 retries at link %e0/ (from primary/primary)
14:18:02 Info: No data seems to have been transfered during this session! : restoring previous one!
14:18:02 Info: engine: end
14:18:02 Info: engine: free
Then I tried to run the Windows version (with perfect result)
and copied the resulting command line from the hts-log.
It looks like this:
C:\Program\WinHTTrack\httrack -qwr2C2%P%sI0%I0c12H0f2#f
-F "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
-%F "<!-- Mirrored from %s%s by HTTrack Website Copier/3.x [XR&CO'2002], %s -->"
-%l "sv, en, *"
-%L D:\Arkiv\Kundprojekt\Vattenfall\HTTrackURL_d2b0.txt
-O "G:\vattenfall\Webmaterial\vfalld2b0","G:\vattenfall\Webmaterial\vfalld2b0"
+*.css +*.js -ad.doubleclick.net/*
-*.gif -*.jpg -*.png -*.tif -*.bmp -*.jpeg
-*.mov -*.mpg -*.mpeg -*.avi -*.asf -*.mp3 -*.mp2 -*.rm
-*.wav -*.vob -*.qt -*.vid -*.ac3 -*.wma -*.wmv
-%A php3,php,php2,asp,jsp,pl,cfm=text/html
The file referred to as the URL list looks like this:
http://www.planetark.org/searchresults.cfm?criteria=emission+trading&sortorder=rel&showweeks=-20
http://www.regeringen.se/search97cgi/s97_cgi?Action=Search&ResultCount=20&ResultTemplate=inetstd-rk.hts&Querymode=Internet&collection=Finansdepartementet&collection=Milj%F6departementet&collection=N%E4ringsdepartementet&QueryText=%22emission+trading%22&I3.x=11&I3.y=7
http://www.regeringen.se/search97cgi/s97_cgi?Action=Search&ResultCount=20&ResultTemplate=inetstd-rk.hts&Querymode=Internet&collection=Finansdepartementet&collection=Milj%F6departementet&collection=N%E4ringsdepartementet&QueryText=%22handel+med+utsl%E4ppsr%E4ttigheter%22&I3.x=13&I3.y=10
http://www4.stem.se/web/pressmapp.nsf/aktuellapressmedd?openview&count=999
I desperately need advice on how to download
these URLs from the command line.
Thank you /Lars