HTTrack Website Copier
Free software offline browser - FORUM
Subject: Very slow performance
Author: Oleg
Date: 07/28/2017 09:08
 
Hello!

My crawl rate is no more than 2-3 requests per second, but I expected it to be
many times higher.

It's not a problem with the site: it works fast, responds fast, and has no
IP/cookie/... limits or anything else.

It's not a problem with the machine running the crawl: there are plenty of
CPU, memory, and network resources.

It's not a problem with the filters.

How can I find the bottleneck? What should I grep the logs for?
There are nearly 80k log messages per 24-hour period like "Waiting for type to
be known: ... .html". Is that OK?
Why does the crawler need to know that .html is text/html? Can this be the
bottleneck?
Thank you.
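
To put the "Waiting for type to be known" messages in proportion to the crawl rate, a rough sketch like the one below could count them and turn the count into a per-second average. The function names are made up for illustration, and the log path in the usage note is an assumption (HTTrack normally writes hts-log.txt into the output directory, but check where yours ends up).

```shell
#!/bin/sh
# Hypothetical helpers, not part of HTTrack.

# Count log lines that contain the message quoted in the post.
# $1 = path to a plain-text log file (assumed; adjust to your setup).
count_type_waits() {
    # -F: match a fixed string, -c: print the number of matching lines
    grep -c -F "Waiting for type to be known" "$1"
}

# Average number of messages per second.
# $1 = message count, $2 = length of the period in seconds.
rate_per_second() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n / s }'
}
```

For example, `count_type_waits /data/site/hts-log.txt` would give the count for that (assumed) log file, and `rate_per_second 80000 86400` prints 0.93, i.e. the 80k messages per 24 hours amount to a bit under one message per second against the 2-3 requests per second observed.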

httrack "http://site/" \
-O "/data/site" \
-r50 \
-A1000000 \
-%c50 \
-c10 \
-T30 \
-R5 \
-K4 \
-n \
-N "%h%p/%n%q_%M.%t" \
-s0 \
-F "Mozilla/5.0" \
-%F "" \
-%l "ru" \
-q \
-z \
-Z \
-v \
--debug-headers \
--disable-security-limits \
"some filters here"
 