> If you add +*.htm +*.html to the filters, be sure you have
> mirror level depth limits set, otherwise you're going to
> download the whole internet.
Exactly - use instead:
-* +www.foo.com/*.htm +www.foo.com/*.html
+www.bar.com/*.htm +www.bar.com/*.html
... and so on: for each site you want to crawl, add
+<site>/*.htm +<site>/*.html
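
If you have more than a couple of sites, typing these rules by hand gets tedious. Here is a rough Python sketch that builds the same filter list; `build_filters` and its parameters are hypothetical names made up for illustration, not part of any mirroring tool.

```python
def build_filters(sites, extensions=("htm", "html")):
    """Return an exclude-everything rule followed by per-site include rules."""
    filters = ["-*"]  # start by excluding everything
    for site in sites:
        for ext in extensions:
            # re-include only pages of this type that live on this site
            filters.append(f"+{site}/*.{ext}")
    return filters


if __name__ == "__main__":
    # Prints the rules on one line, ready to paste into the filter box
    print(" ".join(build_filters(["www.foo.com", "www.bar.com"])))
```

Because every include rule is prefixed with a host name, the crawl stays on the listed sites even if a page links elsewhere.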