HTTrack Website Copier
Free software offline browser - FORUM
Subject: HTTrack grabbing huge amount of memory - Linux
Author: Krys
Date: 04/14/2013 15:44
 
Hi,

I am running HTTrack version 3.46.
One run consumed so much memory that I had to kill it, because nothing else
could be done on our machine.

****************************************************
The question is: how did this memory get so inflated?
****************************************************

  PID PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5232 20   0 7690m 6.6g  608 S  0.0 21.2   0:10.39 httrack
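
To put the figures above in perspective, here is a small Python sketch that converts the `top` columns to bytes and estimates the machine's total RAM from RES and %MEM. It assumes the default procps `top` display, where a bare number is KiB and the suffixes m/g scale by 1024; the function names are my own, purely for illustration.

```python
# Assumption: default procps `top` units. A bare number is KiB and the
# suffixes k/m/g/t are powers of 1024. Other `top` builds may differ.
_UNITS = {"": 1024, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_top_size(field: str) -> int:
    """Convert a VIRT/RES/SHR field such as '6.6g' or '608' to bytes."""
    field = field.strip().lower()
    suffix = field[-1] if field[-1].isalpha() else ""
    number = field[:-1] if suffix else field
    return round(float(number) * _UNITS[suffix])


def summarize(top_line: str) -> dict:
    """Summarize one process line laid out as in the header above:
    PID PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    cols = top_line.split()
    res_bytes = parse_top_size(cols[4])
    mem_percent = float(cols[8])
    return {
        "pid": int(cols[0]),
        "command": cols[10],
        "virt_bytes": parse_top_size(cols[3]),
        "res_bytes": res_bytes,
        # RES is %MEM percent of physical RAM, so total = RES / (%MEM / 100).
        "total_ram_estimate_bytes": res_bytes / (mem_percent / 100.0),
    }


line = " 5232 20   0 7690m 6.6g  608 S  0.0 21.2   0:10.39 httrack"
info = summarize(line)
print(f"PID {info['pid']} ({info['command']}): "
      f"RES = {info['res_bytes'] / 1024**3:.1f} GiB, "
      f"estimated total RAM = {info['total_ram_estimate_bytes'] / 1024**3:.1f} GiB")
```

Since `top` rounds both RES and %MEM, the total is only approximate: 6.6g at 21.2% works out to roughly 31 GiB of physical memory.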

The log is attached at the end.
The run would not have done much, since the site now redirects and I don't
allow straying. I understand that part; no problem there.

The problem is the memory consumed. I am trying to run several httrack
processes at the same time; the other processes were consuming around 3g of
memory. PID 5232 was the winner, though!
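
Independently of any HTTrack option, one generic safeguard when running several mirrors side by side is an OS-level cap on each process. The Python sketch below is only an illustration under stated assumptions: it is Linux/Unix only, `run_capped` is a hypothetical helper name, and the limit applies to virtual address space (the VIRT column), not to RES. How httrack itself reacts when an allocation is refused is not something I have verified.

```python
import resource
import subprocess


def run_capped(argv, max_bytes, **kwargs):
    """Run argv as a child process whose address space is capped at max_bytes.

    The cap is set with RLIMIT_AS in the child just before exec, so the
    parent and any sibling processes are not affected. Once the cap is
    reached, further allocations in the child fail instead of growing
    until the whole machine is starved.
    """
    def set_limit():
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(argv, preexec_fn=set_limit, **kwargs)


# Hypothetical usage (not executed here): cap one mirror at 4 GiB.
# The httrack arguments are shortened from the command line in the log.
#
# run_capped(
#     ["httrack", "http://business.financialpost.com/", "-O",
#      "/home/krysb/httrack/round_robin/business.financialpost.com"],
#     max_bytes=4 * 1024**3,
# )
```

A cap like this does not explain why the memory grew, but it turns a runaway process into a failed run rather than an unusable server.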

Do I need to be worried that a string of -iC2 options gets added at the end
of the command line when starting the update?

The aim was to extract as many HTML and text files as possible, which don't
necessarily have any extension; also XML files, since they can contain RSS
feeds with a lot of links. I wanted to exclude everything else.

As I understand from the manual, using MIME types in options is slower than
simple name filters. Is this correct?

Is the -#L1,000,000,000 an issue here?
I do want to go through a lot of links. Some of the sites I am interested in
are pretty big; -#L1,000,000 was limiting for one of them
("www.bbc.co.uk/news/"), and I want as complete a set as possible. I
deliberately overshot so as not to have to do this again because, if I am
correct, I cannot change options when running "-update", and I don't want to
have to start from scratch again.

If the link limit is the problem, is there some other way to run httrack that
will still cover a very large site without disabling my server?

I am not sure from the manual and Fred's guide how to use the -cN option to my
advantage here. Could it help?

I will be very grateful for your help.
Let me know if you need any more information.

Krys



Here is the entire log file for 5232:

HTTrack3.46+libhtsjava.so.2 launched on Sat, 13 Apr 2013 06:40:13 at
<http://business.financialpost.com/> -*?print=* -*?page=* -*.mp3 -*.mp4 -*.wav
-*.avi -*.dvi -*.mpg -*.mpeg -*.mov -*.bmp -*.css -*.sxml -*.xlsx -*.xls
-*.doc -*.tar -*.zip -*.swf -*.stm -*.js -*.gif -*.jpg -*.jpeg -*.png -*.pdf
(/usr/local/bin/httrack <http://business.financialpost.com/> -X0 -A100000
-#L1000000000 -z -v -O
/home/krysb/httrack/round_robin/business.financialpost.com -*?print=*
-*?page=* -*.mp3 -*.mp4 -*.wav -*.avi -*.dvi -*.mpg -*.mpeg -*.mov -*.bmp
-*.css -*.sxml -*.xlsx -*.xls -*.doc -*.tar -*.zip -*.swf -*.stm -*.js -*.gif
-*.jpg -*.jpeg -*.png -*.pdf -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2
-iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2
-iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 -iC2 )
Information, Warnings and Errors reported for this mirror:
note:   the hts-log.txt file, and hts-cache folder, may contain sensitive
information,
        such as username/password authentication for websites mirrored in this
project
        do not share these files/folders if you want these information to
remain private

Mirror launched on Sat, 13 Apr 2013 06:40:13 by HTTrack Website
Copier/3.46+libhtsjava.so.2 [XR&CO'2010]
mirroring <http://business.financialpost.com/> -*?print=* -*?page=* -*.mp3
-*.mp4 -*.wav -*.avi -*.dvi -*.mpg -*.mpeg -*.mov -*.bmp -*.css -*.sxml
-*.xlsx -*.xls -*.doc -*.tar -*.zip -*.swf -*.stm -*.js -*.gif -*.jpg -*.jpeg
-*.png -*.pdf with the wizard help..
06:40:13        Info:   engine: init
07:20:54        Debug:  Cache: enabled=2, base=hts-cache/, ro=0
07:20:54        Debug:  Cache: rename hts-cache/new.zip -> hts-cache/old.zip
(0x7f971a5923b4 0x7f971a5b03b4)
07:20:54        Debug:  Cache: successfully renamed
07:20:54        Debug:  Cache: size 1537
07:20:54        Debug:  Cache index loaded: 2 entries loaded
07:20:55        Info:   engine: start
07:20:55        Info:   engine: check-html: primary/primary
07:20:55        Info:   engine: preprocess-html: primary/primary
07:20:55        Info:   engine: save-name: local name:
business.financialpost.com/index.html ->
business.financialpost.com/index.html

Exit requested to engine (signal 15)

End of log file for 5232.
 