| > "c:\Utilities\WinHTTrack\httrack.exe" -O1
> K:\Results\Crawl-0911-00949
> www.Some10DomainsMissingHere.com -F"Mozilla 11.0,
> Linux" -H3 -%A asp=text/html -s0 -g -r2 -%c5 -c4
> -ic1 -%I -d -R2 -z -Z +*.html +*.htm +*.php +*.php3
> +*.php4 +*.php5 -*.aif -*.avi -*.bmp -*.css -*.exe -*.fla -*.flv
> -*.gif -*.ico -*.iff -*.jpeg -*.jpg -*.js -*.m3u
> -*.m4a -*.m4v -*.mid -*.mov -*.mp3 -*.mp4 -*.mpa
> -*.mpeg -*.mpg -*.ogg -*.ogv -*.pcx -*.png -*.ppt
> -*.ps -*.pdf -*.swf -*.tif -*.tiff -*.webm -*.wma
> -*.wmv -.zip --index
<http://www.httrack.com/html/fcguide.html>
1) Drop the filters; to get HTML only, use -p1 (but what if the HTML page has no
extension?), or use --get.
2) If you only want the one page, use --get-files. Why are you getting the
second-level HTML files (-r2)?
3) Drop the assume (-%A); that's only for broken sites.
4) You could also put the URLs in a text file and use --list textFileUrl, as in the example below.
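
For example, pulling 1), 2) and 4) together, something along these lines should grab just the listed pages, with no filters and no recursion (K:\urls.txt is only an illustrative name; the output path and user-agent are copied from your command, so double-check the options against the guide above before running):

  "c:\Utilities\WinHTTrack\httrack.exe" --get --list K:\urls.txt -O1 K:\Results\Crawl-0911-00949 -F"Mozilla 11.0, Linux"

Here --get tells httrack to fetch only the URLs it is given rather than mirroring, so -r2 and the long +/- extension filter list are no longer needed, and --list K:\urls.txt reads those URLs from the text file, one per line.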