> Loop for all the urls:
> httrack -I -w -D -r5 -M1000000 -F <useragent>
> -A100000 -G10000000 -m50000 -O myweb/<url> <url>
> The result is that some pages download fine
> while others do not.
While every site is potentially different, some settings usually work.
The -r5 does not mirror one page; it mirrors many. If you want one page, the
supporting files are at level two, so use -r2.
Add the near flag -n so all supporting files, wherever they are hosted, are
captured.
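As a sketch of the two points above (not a drop-in command; `<url>` stays a
placeholder), a single-page capture might look like:

```shell
# Sketch: grab one page plus its supporting files.
# -r2 : recursion depth 2 (the page itself, plus the files it references)
# -n  : "get near" files -- fetch supporting files even when hosted elsewhere
httrack "<url>" -O "myweb/<url>" -r2 -n
```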
Add -%P so links inside JavaScript are found.
On some sites you might want to override robots.txt with -s0.
-I, -M, and -A are unnecessary, and the -G probably is too.
-m capped at only 50k? Lots of HTML files are bloated with inline JavaScript
now, making them bigger than 100k:
07/02/2010 01:17 PM 121,545 index-11.html
07/02/2010 01:17 PM 114,532 index-14.html
07/02/2010 04:52 PM 114,433 index-17.html
07/02/2010 08:27 PM 110,231 index14.html
I always run with -x so you know where the mirror ends.
I never use an HTTrack browser ID.
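Putting the suggestions above together, a trimmed command might look like the
following. This is a sketch under the assumptions discussed, not a definitive
recipe; `<url>` remains a placeholder and the -m value is an assumed example:

```shell
# Sketch combining the advice in this reply:
# -r2       : one page plus its supporting files (level two)
# -n        : capture "near" supporting files wherever they are hosted
# -%P       : extended parsing, so links inside JavaScript are found
# -s0       : ignore robots.txt (use judgment per site)
# -x        : replace external links with error pages, so you can see
#             where the mirror ends
# -m1000000 : raise the file-size cap well above 100k pages (assumed value)
httrack "<url>" -w -O "myweb/<url>" -r2 -n -%P -s0 -x -m1000000
```

The -I, -M, -A, and -G options from the original command are dropped, per the
notes above.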
<http://www.httrack.com/html/fcguide.html>