> I am interested in downloading the articles in the
> Washington Post (http://www.washingtonpost.com/wp-dyn/print/)
> each day for a period during which I'll be away from home.
You may use these scan rules (filters):
-* +*articles*.html
But you might prefer these instead, if you want to strictly
limit the mirror to the washingtonpost articles but keep the
necessary scripts and stylesheets, too:
-* +*.css +*.js
+*www.washingtonpost.com/*articles*.html
(Adding +*.png +*.gif +*.jpg might also be a good idea.)
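To see what these rules accept before starting a long mirror, here is a rough Python sketch. It is not HTTrack's own matcher: it assumes that a plain * matches any run of characters and that, when several rules match a link, the last one wins. The helper names (parse_rules, is_allowed) and the sample links are made up for illustration.

```python
from fnmatch import fnmatchcase


def parse_rules(text):
    """Split a scan-rule string into (allow, pattern) pairs."""
    rules = []
    for token in text.split():
        if token[0] not in "+-":
            raise ValueError(f"rule must start with + or -: {token!r}")
        rules.append((token[0] == "+", token[1:]))
    return rules


def is_allowed(url, rules):
    """Return True/False from the last matching rule, or None if none match.

    Assumption: later rules override earlier ones, and '*' matches any
    run of characters. This only approximates HTTrack's wildcard syntax.
    """
    verdict = None
    for allow, pattern in rules:
        if fnmatchcase(url, pattern):
            verdict = allow
    return verdict


# The stricter rule set from the reply above.
RULES = parse_rules(
    "-* +*.css +*.js +*www.washingtonpost.com/*articles*.html"
)

if __name__ == "__main__":
    # Hypothetical links, only to exercise the rules.
    for link in (
        "www.washingtonpost.com/wp-dyn/articles/A1.html",
        "www.example.com/articles/x.html",
        "www.example.com/site.css",
    ):
        print(link, "->", is_allowed(link, RULES))
```

With the first rule being -*, every link is excluded unless a later + rule matches it, which is why the order of the rules matters.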
Note that the current wp site has robots.txt rules
that will not allow you to crawl all articles unless
you set the proper option in HTTrack:
Set Options / Spider / Spider: no robots.txt rules
But in this case select at most 2 simultaneous
connections, especially if you have a "fast pipe", to
avoid overloading the server's bandwidth!
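If you run the command-line httrack instead of the GUI (for example from a daily scheduled job while you are away), the same settings could be assembled roughly as below. This is only a sketch: it assumes the httrack options -sN (robots.txt handling, 0 = never follow) and -cN (number of simultaneous connections) behave as listed in httrack --help. The function name and the output directory "wp-mirror" are made up for illustration.

```python
import shlex


def build_httrack_command(start_url, output_dir, scan_rules,
                          connections=2, ignore_robots=True):
    """Build an argv list for httrack; nothing is executed here."""
    if connections < 1:
        raise ValueError("connections must be at least 1")
    cmd = ["httrack", start_url, "-O", output_dir]
    if ignore_robots:
        cmd.append("-s0")            # assumed: do not follow robots.txt rules
    cmd.append(f"-c{connections}")   # assumed: cap simultaneous connections
    cmd.extend(scan_rules)           # +/- filters are passed as plain arguments
    return cmd


if __name__ == "__main__":
    cmd = build_httrack_command(
        "http://www.washingtonpost.com/wp-dyn/print/",
        "wp-mirror",  # hypothetical output directory
        ["-*", "+*.css", "+*.js",
         "+*www.washingtonpost.com/*articles*.html"],
    )
    # Print a shell-quoted version so the * characters are not expanded.
    print(shlex.join(cmd))
```

The list form can be handed to subprocess.run as is; the printed form is quoted so that the shell does not expand the wildcards in the filters.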