HTTrack Website Copier
Free software offline browser - FORUM
Subject: Performance Issues
Author: Benjamin Fox
Date: 09/28/2006 10:38
 
Hi Guys,

We are about to decommission our external Lotus Notes servers as we are moving
to a different web rendering platform (IBM HTTP Server). However, we will still
utilise our internal Lotus Notes server to generate content.

We have an obvious need to spider our website as fast as we can. I have found
the options needed to get past the standard limits; however, spidering the
generated content still takes approx. 60 minutes.

Could someone have a look at the command line I'm using and check that it's
optimised to run as fast as possible?

Here's the status at the end of hts-log.txt when the spidering completed

HTTrack Website Copier/3.40-s mirror complete in 53 minutes 46 seconds : 2872
links scanned, 3289 files written (179412621 bytes overall), no files updated
[577326 bytes received at 178 bytes/sec]
(19 errors, 0 warnings, 0 messages)
I'm aware of the errors (they are just internal links that point to
non-existent files, i.e. 404s).
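As a rough sanity check on that summary, here is a minimal Python sketch that pulls the numbers out of the line and works out the rates. It is not part of HTTrack: the regex, the function name `summarise`, and the dictionary keys are made up for illustration, and the pattern only handles the "N minutes N seconds" wording seen in this one log.

```python
import re

# Summary line copied from the end of hts-log.txt quoted above.
SUMMARY = (
    "HTTrack Website Copier/3.40-s mirror complete in 53 minutes 46 seconds : 2872 "
    "links scanned, 3289 files written (179412621 bytes overall), no files updated "
    "[577326 bytes received at 178 bytes/sec]"
)

# Hypothetical pattern, written only against the wording of this one line.
PATTERN = re.compile(
    r"complete in (?P<min>\d+) minutes (?P<sec>\d+) seconds : "
    r"(?P<links>\d+) links scanned, (?P<files>\d+) files written "
    r"\((?P<overall>\d+) bytes overall\).*"
    r"\[(?P<received>\d+) bytes received at (?P<rate>\d+) bytes/sec\]"
)


def summarise(line):
    """Return elapsed time and derived rates for one summary line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognised summary line")
    values = {name: int(text) for name, text in match.groupdict().items()}
    elapsed = values["min"] * 60 + values["sec"]
    return {
        "elapsed_s": elapsed,
        "overall_bytes_per_s": values["overall"] / elapsed,
        "received_bytes_per_s": values["received"] / elapsed,
        "files_per_s": values["files"] / elapsed,
        "logged_bytes_per_s": values["rate"],
    }


stats = summarise(SUMMARY)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

The "bytes received" figure divided by the elapsed time lands on the logged rate, whereas the much larger "bytes overall" figure gives a very different rate. Which of the two reflects the real network transfer is not something this log line settles on its own.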

Here's the command line that was used (note: the host/website names have been
changed for security reasons):

"C:\Program Files\WinHTTrack\httrack.exe" <http://HOST/internet/notesDB.nsf> -O
"C:\Spider\WebsiteTest,C:\Spider\Logs" --depth=9999 -i%%P -N
%%[ContentType::::pub]/%%n.%%t --fast-engine --max-rate=1000000
--connection-per-second=0 --sockets=4 --mirror --verbose --quiet --footer=""
--cache=2 --disable-security-limits --near --updatehack --urlhack --robots=0
--check-type=1 -%%A nsf=text/html -%%I0p7DaK0H0%%kf2%%f#f -%%l "en, en, *"
"+HOST/internet/notesDB.nsf"

Here's the log from the doit.log

<http://HOST/internet/notesDB.nsf> -O "C:\\Spider\\WebsiteTest,C:\\Spider\\Logs"
-r9999 -i%P -N %[ContentType::::pub]/%n.%t -#X -A1000000 -%c0 -c4 -w -v -q -%F
"" -C2 -%! -n -%s -%u -s0 -u1 -%A nsf=text/html -%I0p7DaK0H0%kf2%f#f -%l "en,
en, *" +HOST/internet/notesDB.nsf
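Reading the two command lines side by side, each long option in the batch file lines up with a short one in doit.log. The Python sketch below just records that pairing so the two forms can be compared. The pairs are matched by position between the two listings above, not taken from an official HTTrack table, and `LONG_TO_SHORT` and `to_short` are names invented for this example.

```python
# Long option (as typed in the batch file) -> short form (as recorded in doit.log).
# Assumption: pairs are matched purely by their position in the two listings.
LONG_TO_SHORT = {
    "--depth=9999": "-r9999",
    "--fast-engine": "-#X",
    "--max-rate=1000000": "-A1000000",
    "--connection-per-second=0": "-%c0",
    "--sockets=4": "-c4",
    "--mirror": "-w",
    "--verbose": "-v",
    "--quiet": "-q",
    '--footer=""': '-%F ""',
    "--cache=2": "-C2",
    "--disable-security-limits": "-%!",
    "--near": "-n",
    "--updatehack": "-%s",
    "--urlhack": "-%u",
    "--robots=0": "-s0",
    "--check-type=1": "-u1",
}


def to_short(args):
    """Translate known long options; leave anything else untouched."""
    return [LONG_TO_SHORT.get(arg, arg) for arg in args]


example = ["--sockets=4", "--mirror", "-N", "--robots=0"]
print(to_short(example))
```

Options that are already in short form (such as -N or -i%P) pass through unchanged, which is also how they appear in doit.log.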

Obviously there are many contributing factors: the web server's CPU
availability and memory, or a mis-configuration in the command line. Out of
interest, a lot of our generated content is PDFs; I think about 50% of the
overall MB transferred are PDFs.
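To put rough numbers on two of those factors, the short Python sketch below compares the observed rate with the --max-rate ceiling and estimates the PDF volume. The assumptions are: "bytes overall" from the log is the amount actually transferred, --max-rate is in bytes per second, and the 50% PDF share is only the guess stated above. The variable names are made up for the example.

```python
# Figures taken from the log summary and the command line quoted in this post.
OVERALL_BYTES = 179_412_621      # "bytes overall" from hts-log.txt
ELAPSED_S = 53 * 60 + 46         # "53 minutes 46 seconds"
MAX_RATE = 1_000_000             # --max-rate value, assumed to be bytes/sec
PDF_SHARE = 0.5                  # the "I think 50%" guess, not a measured value

observed_rate = OVERALL_BYTES / ELAPSED_S   # average bytes/sec over the run
cap_used = observed_rate / MAX_RATE         # fraction of the rate ceiling used
pdf_megabytes = OVERALL_BYTES * PDF_SHARE / 1_000_000

print(f"observed rate : {observed_rate:,.0f} bytes/sec")
print(f"share of cap  : {cap_used:.1%}")
print(f"PDF estimate  : {pdf_megabytes:.1f} MB")
```

If those assumptions hold, the run stays well below the rate ceiling, which would point towards the server's page generation or the socket count as the limit, not the --max-rate setting.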

Quick question... Is it recommended, in this particular scenario, to install
httrack on the actual Domino server itself? That way we'd just be hitting
localhost. Good idea / bad idea?

Another quick question: I'm using the --mirror option. In future we'll always
be doing updates, so is it better to use --update or to leave the default
--mirror configuration?

Xavier, I must say that this is an absolute gem of a product; you've done
yourself very proud. For us it suits our needs perfectly... Would love to
tweak it a bit! :-) If I'm ever in France I'll buy you a few beers or a few
bottles of red!! Maybe I'll bring across some nice Aussie reds! :-)

Thanks all!
Bennie.
 