HTTrack Website Copier
Free software offline browser - FORUM
Subject: WINHHTrack vs command line version
Author: Lars Niia
Date: 05/28/2003 14:27
 
Hi, I am trying to run HTTrack as a batch job, but it fails.
I used the Windows version (which is the version I 
regularly use) to check my configuration and then went 
through the command line guide to work out the command line 
options.
The Windows configuration is default except for the 
following:

using a file that contains the start URLs

filtered files: +*.css +*.js -ad.doubleclick.net/*
-*.gif -*.jpg -*.png -*.tif -*.bmp -*.jpeg
-*.mov -*.mpg -*.mpeg -*.avi -*.asf -*.mp3 -*.mp2 -*.rm
-*.wav -*.vob -*.qt -*.vid -*.ac3 -*.wma -*.wmv

max link depth:2
external depth:0
number of connections:12
no searchable index
browser id:Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

I put together the following command line:
C:\Program\WinHTTrack\httrack -%L file://D:\Arkiv\Kundprojekt\Vattenfall\HTTrackURL_d2.txt -O G:\vattenfall\Webmaterial\d2b0\ -Zr2c12o0s2qI0 %e0 --assume php3=text/html,php=text/html,php2=text/html,asp=text/html,jsp=text/html,pl=text/html,cfm=text/html -F "Mozilla/4.78 [en] (Windows NT 5.0; U)" "+*.css" "+*.js" "-ad.doubleclick.net/*" "-*.gif" "-*.jpg" "-*.png" "-*.tif" "-*.bmp" "-*.jpeg" "-*.mov" "-*.mpg" "-*.mpeg" "-*.avi" "-*.asf" "-*.mp3" "-*.mp2" "-*.rm" "-*.wav" "-*.vob" "-*.qt" "-*.vid" "-*.ac3" "-*.wma" "-*.wmv"

But it results in the following log:
HTTrack3.22-3-noV6+swf launched on Wed, 28 May 2003 14:17:53 at %e0 +*.css +*.js -ad.doubleclick.net/* -*.gif -*.jpg -*.png -*.tif -*.bmp -*.jpeg -*.mov -*.mpg -*.mpeg -*.avi -*.asf -*.mp3 -*.mp2 -*.rm -*.wav -*.vob -*.qt -*.vid -*.ac3 -*.wma -*.wmv
(C:\Program\WinHTTrack\httrack -%L file://D:\Arkiv\Kundprojekt\Vattenfall\HTTrackURL_d2.txt -O G:\vattenfall\Webmaterial\d2b0\ -Zr2c12o0s2qI0 %e0 -%A php3=text/html,php=text/html,php2=text/html,asp=text/html,jsp=text/html,pl=text/html,cfm=text/html -F "Mozilla/4.78 [en] (Windows NT 5.0; U)" +*.css +*.js -ad.doubleclick.net/* -*.gif -*.jpg -*.png -*.tif -*.bmp -*.jpeg -*.mov -*.mpg -*.mpeg -*.avi -*.asf -*.mp3 -*.mp2 -*.rm -*.wav -*.vob -*.qt -*.vid -*.ac3 -*.wma -*.wmv )

Information, Warnings and Errors reported for this mirror:
note:	the hts-log.txt file, and hts-cache folder, may contain sensitive information,
	such as username/password authentication for websites mirrored in this project
	do not share these files/folders if you want these information to remain private

14:17:53	Info: 	engine: init
14:17:53	Error: 	Could not include URL list: file://D:\Arkiv\Kundprojekt\Vattenfall\HTTrackURL_d2.txt
14:17:53	Info: 	engine: start
14:17:53	Debug: 	Wait get: primary/primary
14:17:53	Info: 	engine: check-html: primary/primary
14:17:53	Debug: 	scan file..
14:17:53	Debug: 	link detected in html: http://%e0
14:17:53	Debug: 	position link check http://%e0
14:17:53	Debug: 	build relative link http://%e0 with primary/primary
14:17:53	Debug: 	wizard link test at %e0/..
14:17:53	Debug: 	wizard test begins: %e0/
14:17:53	Debug: 	Compare addresses: %e0!=primary
14:17:53	Debug: 	result for wizard link test: 0
14:17:53	Info: 	engine: save-name: local name: %e0/index.html -> %e0/index.html
14:17:53	Debug: 	Record: %e0/ -> G:/vattenfall/Webmaterial/d2b0/%e0/index.html
14:17:53	Debug: 	relative link at %e0 build with G:/vattenfall/Webmaterial/d2b0/%e0/index.html and G:/vattenfall/Webmaterial/d2b0/index.html: %e0/index.html
14:17:53	Debug: 	robots.txt added at %e0
14:17:53	Debug: 	OK, NOTE: %e0/ -> G:/vattenfall/Webmaterial/d2b0/%e0/index.html
14:17:53	Debug: 	Wait get: %e0/robots.txt
14:17:58	Error: 	"Unable to get server's address" (-5) after 2 retries at link %e0/robots.txt (from primary/primary)
14:17:58	Debug: 	Wait get: %e0/
14:17:58	Warning: 	Retry after error -5 (Unable to get server's address) at link %e0/ (from primary/primary)
14:17:58	Debug: 	Wait get: %e0/
14:18:00	Warning: 	Retry after error -5 (Unable to get server's address) at link %e0/ (from primary/primary)
14:18:00	Debug: 	Wait get: %e0/
14:18:02	Error: 	"Unable to get server's address" (-5) after 2 retries at link %e0/ (from primary/primary)
14:18:02	Info: 	No data seems to have been transfered during this session! : restoring previous one!
14:18:02	Info: 	engine: end
14:18:02	Info: 	engine: free
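
In a debug log this long, the Error and Warning lines are easy to lose among the Debug entries. The following is only a minimal sketch, assuming Python is available and that each entry in hts-log.txt sits on one line as time, level, message (long lines are wrapped in this post; the file itself may differ). The function name entries and the sample text are made up for illustration.

```python
import re

# One log entry: "HH:MM:SS <Level>: <message>", fields separated by tabs/spaces.
ENTRY = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(\w+):\s+(.*)$")


def entries(log_text, levels=("Error", "Warning")):
    """Return (time, level, message) tuples for the requested levels."""
    found = []
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        # Lines without a leading timestamp (header, notes) are skipped.
        if match and match.group(2) in levels:
            found.append(match.groups())
    return found


# A few entries copied from the log in this post.
sample = (
    "14:17:53\tInfo: \tengine: init\n"
    "14:17:53\tError: \tCould not include URL list: "
    "file://D:\\Arkiv\\Kundprojekt\\Vattenfall\\HTTrackURL_d2.txt\n"
    "14:17:53\tDebug: \tlink detected in html: http://%e0\n"
    "14:18:02\tError: \t\"Unable to get server's address\" (-5) "
    "after 2 retries at link %e0/ (from primary/primary)\n"
)

for time, level, message in entries(sample):
    print(time, level, message)
```

Passing levels=("Debug",) instead shows the debug entries, for example the one where %e0 turns up as a link.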

Then I tried running the Windows version (with perfect 
results) and copied the command line it generated from 
hts-log.txt. It looks like this:

C:\Program\WinHTTrack\httrack -qwr2C2%P%sI0%I0c12H0f2#f -F "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" -%F "<!-- Mirrored from %s%s by HTTrack Website Copier/3.x [XR&CO'2002], %s -->" -%l "sv, en, *" -%L D:\Arkiv\Kundprojekt\Vattenfall\HTTrackURL_d2b0.txt -O "G:\vattenfall\Webmaterial\vfalld2b0","G:\vattenfall\Webmaterial\vfalld2b0" +*.css +*.js -ad.doubleclick.net/* -*.gif -*.jpg -*.png -*.tif -*.bmp -*.jpeg -*.mov -*.mpg -*.mpeg -*.avi -*.asf -*.mp3 -*.mp2 -*.rm -*.wav -*.vob -*.qt -*.vid -*.ac3 -*.wma -*.wmv -%A php3,php,php2,asp,jsp,pl,cfm=text/html

and is resulting in a log which, as pasted here, is 
identical to the log above (same launch time, 14:17:53, 
and same command line).
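
For a batch job, one way to sidestep quoting and line-wrapping problems is to pass the options as a list of separate arguments instead of one long string. The sketch below is only an illustration, assuming Python is available. build_args and run are made-up helpers, and the option letters are the ones that appear in the command lines in this post. Two choices are assumptions, not confirmed fixes: the URL list is given as a plain path (as in the WinHTTrack-generated line) and the external depth is written -%e0 with a leading dash like the other options.

```python
import subprocess

# Extensions to exclude, taken from the filter list in this post.
EXCLUDED = ["gif", "jpg", "png", "tif", "bmp", "jpeg", "mov", "mpg", "mpeg",
            "avi", "asf", "mp3", "mp2", "rm", "wav", "vob", "qt", "vid",
            "ac3", "wma", "wmv"]


def build_args(exe, url_list, out_dir, user_agent):
    """Assemble the httrack argument list; each list item is one argument."""
    filters = ["+*.css", "+*.js", "-ad.doubleclick.net/*"]
    filters += ["-*." + ext for ext in EXCLUDED]
    return [
        exe,
        "-%L", url_list,    # text file with the start URLs (plain path: assumption)
        "-O", out_dir,      # output directory for the mirror
        "-r2",              # max link depth 2
        "-%e0",             # external depth 0 (leading dash: assumption)
        "-c12",             # 12 connections
        "-I0", "-%I0",      # index options as in the WinHTTrack-generated line
        "-q",               # q as used in both command lines above
        "-F", user_agent,   # browser id
    ] + filters


def run(args):
    """Start httrack and return its exit code (not called in this sketch)."""
    return subprocess.run(args, check=False).returncode


args = build_args(
    r"C:\Program\WinHTTrack\httrack",
    r"D:\Arkiv\Kundprojekt\Vattenfall\HTTrackURL_d2.txt",
    r"G:\vattenfall\Webmaterial\d2b0",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)",
)
print(args)
```

Because every filter and the browser id are separate list items, no extra quoting is needed and no option can be split in the middle by a line break.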

The file referred to as the URL list looks like this:

http://www.planetark.org/searchresults.cfm?criteria=emission+trading&sortorder=rel&showweeks=-20
http://www.regeringen.se/search97cgi/s97_cgi?Action=Search&ResultCount=20&ResultTemplate=inetstd-rk.hts&Querymode=Internet&collection=Finansdepartementet&collection=Milj%F6departementet&collection=N%E4ringsdepartementet&QueryText=%22emission+trading%22&I3.x=11&I3.y=7
http://www.regeringen.se/search97cgi/s97_cgi?Action=Search&ResultCount=20&ResultTemplate=inetstd-rk.hts&Querymode=Internet&collection=Finansdepartementet&collection=Milj%F6departementet&collection=N%E4ringsdepartementet&QueryText=%22handel+med+utsl%E4ppsr%E4ttigheter%22&I3.x=13&I3.y=10
http://www4.stem.se/web/pressmapp.nsf/aktuellapressmedd?openview&count=999
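
Since the list file is read one URL per line, a wrapped or truncated line is easy to miss with addresses this long. A minimal sketch of a check, assuming Python is available; check_url_list is a hypothetical helper, and the middle sample line is a deliberately broken fragment for illustration.

```python
from urllib.parse import urlsplit


def check_url_list(lines):
    """Return (line_number, text) for each non-empty line that is not an http(s) URL."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # blank lines are ignored
        parts = urlsplit(text)
        if parts.scheme not in ("http", "https") or not parts.netloc or " " in text:
            problems.append((number, text))
    return problems


sample = [
    "http://www.planetark.org/searchresults.cfm?criteria=emission+trading&sortorder=rel&showweeks=-20",
    "rk.hts&Querymode=Internet&collection=Finansdepartementet",  # wrapped fragment
    "http://www4.stem.se/web/pressmapp.nsf/aktuellapressmedd?openview&count=999",
]
print(check_url_list(sample))
```

To check the real file, the lines can be read with open(path).read().splitlines() and passed to the same function.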


I desperately need advice on how to download these URLs 
from the command line.

Thank you /Lars






 