I have the following parameters to run the command line version of HTTrack:
    httrack --skeleton --disable-security-limits http://www.website.com \
        -P proxy1.ext.com:8080 -O ./tmp \
        "-* +mime:text/* -mime:text/css +*.php +*.asp" \
        -v -%c20 --max-rate 200000000 -N %h%p%N
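For reference, here is my reading of what each option does (from the httrack docs; corrections welcome if I have misread any of them):

    # what I believe each option does, per the httrack docs:
    #   --skeleton                  shortcut: make a mirror, but fetch html files only
    #   --disable-security-limits   lift the built-in bandwidth/connection safety caps
    #   -P proxy1.ext.com:8080      route all requests through this proxy
    #   -O ./tmp                    write the mirror under ./tmp
    #   "-* +mime:text/* ..."       filters: exclude everything, then re-include
    #                               text/* mime types (minus text/css) plus *.php, *.asp
    #   -v                          verbose output to the terminal
    #   -%c20                       allow at most 20 new connections per second
    #   --max-rate 200000000        transfer-rate cap in bytes/second (~200 MB/s)
    #   -N %h%p%N                   custom save structure: host + path + filename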
I *only* want to get the textual content from the website, to be parsed later: no media, PDFs, CSS, etc. This command seems to do that, but it is still horribly slow. I have tried it both with and without the proxy, with similar results. My guess is that the bottleneck is the sheer number of small files, rather than one large file it could download at a constant rate, but shouldn't the -%c20 take care of that?
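From what I can tell from the docs, -%c only rate-limits how fast *new* connections are opened; the number of simultaneous sockets is a separate knob, -c (default 8, if I read the help right). With lots of tiny files, per-request latency probably dominates, so raising the concurrency may matter more than the connection rate. This is what I was planning to try next (the -c16 value is just my guess at a reasonable bump):

    # untested variant: raise the simultaneous-connection count with -c
    httrack --skeleton --disable-security-limits http://www.website.com \
        -P proxy1.ext.com:8080 -O ./tmp \
        "-* +mime:text/* -mime:text/css +*.php +*.asp" \
        -v -c16 -%c20 --max-rate 200000000 -N %h%p%N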
The custom naming bit at the end is my attempt at a flat file structure where each file is named after its web address (e.g. http://www.website.com/dir/crawled_html_page.html). That doesn't seem to work right either, but it's a secondary issue at the moment.
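In case it matters, my understanding of the -N variables is that %h is the host, %p is the path (with its slashes intact), and %N is the filename including extension, so %h%p%N still ends up recreating the directory tree. A flatter variant I was going to test (untested, and the underscore is just my guess at a legal literal in the structure string):

    # untested: flat naming, avoiding %p and its embedded slashes
    # %h = host, %n = filename without extension, %t = file extension
    httrack --skeleton --disable-security-limits http://www.website.com \
        -P proxy1.ext.com:8080 -O ./tmp \
        "-* +mime:text/* -mime:text/css +*.php +*.asp" \
        -v -%c20 --max-rate 200000000 -N "%h_%n.%t"

Though I suspect identically named pages in different directories would then collide, which may be why the docs also offer %q/%M (query-string/URL MD5s) as disambiguators.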
Thanks for any help!