> I used the following command to crawl the webpage.
> I need only the HTML and the images used on the initial
> page, but the command below crawls all the links
> on the page. Please advise me on this.
>
> httrack en.wikipedia.org/wiki/Ringtone -O
> \\home\\test\\httrack-3.43.1\\data\\websec-1.9.0\\1\\8
> -q -Q -N 20081217100027\\%n.%t -o0 -X0 -T30
> -R1 -I0 -%F "" -F "Mozilla/5.0 Firefox/3.0.3" -%h
> -* +*.jpg +*.jpeg +*.css +*.js +*.gif +*.bmp +*.tif*
> +*.png +*.swf -*.exe -*.pdf -*.doc -*.zip
Per httrack.com/html/fcguide.html, -%h doesn't take an argument.
If you're running under Windows you probably need quotes, not apostrophes, and
if you're using a .bat file all percent signs need to be doubled.
Your output directory should not contain hts-cache; httrack creates that
directory itself.
Finally, the default action is to mirror (-w or --mirror). You want to just get
the files (-g or --get-files), or set the maximum depth to 1 (-r1 or --depth=1).
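Putting the advice together, a minimal sketch of a corrected command might look like the following. The output path is a placeholder, and the exact filter set is an assumption based on the original invocation; check the option reference for your httrack version before relying on it.

```shell
# Fetch the initial page with depth limited to 1 so linked pages
# are not crawled, while still allowing the images/CSS/JS the page
# references. Output directory and user-agent are placeholders.
httrack "https://en.wikipedia.org/wiki/Ringtone" \
  -O /home/test/mirror \
  -r1 \
  -F "Mozilla/5.0 Firefox/3.0.3" \
  "-*" "+*.jpg" "+*.jpeg" "+*.css" "+*.js" "+*.gif" "+*.png"
```

Quoting each filter pattern keeps the shell from expanding the wildcards; on Windows, use double quotes and double the percent signs inside a .bat file.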