Hi,
I'm new to HTTrack and am trying to mirror corporate sites. My goal is to
mirror all the text from a website and also download text-related files such
as .doc, .pdf, and .ppt, but no images, movies, etc.
After trying a few sites, I have found that the downloads are often incomplete.
I'm having particular problems with Flash-based sites, which seem to
create endless loops. I'm wondering:
1) are the scan rules I'm using (listed below) correct, or is there a better
way to accomplish my goal? and
2) how can I avoid the endless looping problem?
+www.example.com/*.html
+www.example.com/*.php
+www.example.com/*.asp
-*.gif -*.jpg -*.png -*.tif -*.bmp -*.mov -*.mpg -*.mpeg -*.avi -*.asf -*.mp3
+*.txt +*.doc +*.docx +*.xls +*.xlsx +*.ppt +*.pdf
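To show how I expect these rules to behave, here is a rough Python sketch. It assumes that HTTrack applies the rules in order and lets the last matching rule decide (that is my reading of the filter documentation, so please correct me if it is wrong). It also uses Python's fnmatch as a stand-in for HTTrack's wildcard matching, which is only an approximation. The function name is_allowed and the test URLs are made up for illustration.

```python
from fnmatch import fnmatchcase

# Scan rules copied from above, in the order they are listed.
RULES = """
+www.example.com/*.html
+www.example.com/*.php
+www.example.com/*.asp
-*.gif -*.jpg -*.png -*.tif -*.bmp -*.mov -*.mpg -*.mpeg -*.avi -*.asf -*.mp3
+*.txt +*.doc +*.docx +*.xls +*.xlsx +*.ppt +*.pdf
""".split()


def is_allowed(url, rules=RULES):
    """Return True/False from the last matching rule, or None if no rule matches.

    Assumption: later rules override earlier ones. fnmatch's '*' is only an
    approximation of HTTrack's own wildcard syntax.
    """
    verdict = None
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            verdict = (sign == "+")
    return verdict


if __name__ == "__main__":
    for url in (
        "www.example.com/about.html",
        "www.example.com/img/logo.gif",
        "cdn.example.net/files/report.pdf",
        "www.example.com/products/",
    ):
        print(url, is_allowed(url))
```

Under these assumptions, a URL such as www.example.com/products/ matches none of my rules, so whether it gets fetched depends on HTTrack's defaults rather than on anything I wrote. That may be related to the incomplete downloads.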
Any input would be greatly appreciated! Thanks so much in advance.
...