HTTrack Website Copier
Free software offline browser - FORUM
Subject: Crawler still slow (~3-15kbps)
Author: ITracker
Date: 01/21/2014 17:28
 
I am running the command-line version of HTTrack with the following parameters:

httrack --skeleton --disable-security-limits http://www.website.com \
  -P proxy1.ext.com:8080 -O ./tmp \
  "-* +mime:text/* -mime:text/css +*.php +*.asp" \
  -v -%c20 --max-rate 200000000 -N "%h%p%N"

I *only* want to pull the pages from a website so they can be parsed later;
no media, PDFs, CSS, etc. This seems to do that, but it is still horribly
slow. I have tried it both with and without the proxy, with similar
results. It might be because the site is lots of small files rather than
one large file that could download at a constant rate, but shouldn't the
-%c20 option take care of that?
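
In case the forum mangles the quoting above, here is how I read my own
filter string, with every rule carrying an explicit + or - sign (the
leading -* and the + on the first mime rule are my best reconstruction of
what the forum ate):

  -*               exclude everything by default
  +mime:text/*     then re-include anything served with a text MIME type
  -mime:text/css   except stylesheets
  +*.php           and keep dynamic pages by extension too
  +*.asp
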
The custom naming bit at the end is my attempt at a flat file structure
where each file is named after its web address (e.g.
http://www.website.com/dir/crawled_html_page.html). That doesn't seem to
work right either, but it's a secondary issue at the moment.
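
For comparison, the example structure string given in the -N documentation is

  -N "%h%p/%n%q.%t"

so my guess is that my %h%p%N variant is missing the / between the path and
the file name, though that is just speculation on my part.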

Thanks for any help!
 