> If you add +*.htm +*.html to the filters be sure you have
> mirror level depth limits set, otherwise you're going to
> download the whole internet.
Exactly. Use this instead, which excludes everything first and then allows only pages on the sites you name:
-* +www.foo.com/*.htm +www.foo.com/*.html 
+www.bar.com/*.htm +www.bar.com/*.html
... and so on: for each site you want to crawl, add
+<site>/*.htm +<site>/*.html
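
If the list of sites gets long, typing the rules by hand is error-prone. Here is a minimal Python sketch that builds the same filter string from a list of hostnames. The function name build_filters and its parameters are my own for illustration, not part of any tool; it only assembles the text, which you would then paste into the filters box.

```python
def build_filters(sites, extensions=("htm", "html")):
    """Build a filter string: exclude everything, then allow pages per site.

    `sites` is a list of hostnames (hypothetical examples below);
    `extensions` are the page types to allow on each site.
    """
    rules = ["-*"]  # start by excluding everything
    for site in sites:
        for ext in extensions:
            # allow only this page type on this specific site
            rules.append(f"+{site}/*.{ext}")
    return " ".join(rules)


if __name__ == "__main__":
    print(build_filters(["www.foo.com", "www.bar.com"]))
```

For the two example sites this prints the same rule set as above, on a single line.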