Re: Downloading Washington Post content - HTTrack Website Copier Forum

Subject: Re: Downloading Washington Post content

Author: Xavier Roche

Date: 06/11/2001 21:15

> I am interested in downloading the articles in the 
> Washington Post (http://www.washingtonpost.com/wp-dyn/print/)
> each day for a period during which I'll be away from home.

You may use these scan rules (filters):
-* +*articles*.html

But you might prefer:
-* +*.css +*.js +*www.washingtonpost.com/*articles*.html

if you want to strictly limit the crawl to the washingtonpost 
articles but still keep the necessary scripts and 
stylesheets. (Adding +*.png +*.gif +*.jpg might also be 
a good idea.)
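
To see why the second rule set is stricter, here is a rough Python sketch of how such rules are evaluated. It is not HTTrack's actual matcher: it assumes that the last matching rule wins, approximates the wildcards with Python's fnmatch, and falls back to "accept" when no rule matches. The function name and the sample URLs are made up for illustration.

```python
from fnmatch import fnmatchcase

def allowed(url, rules, default=True):
    """Return True if url is accepted by the rules.

    Assumption: rules are scanned in order and the last rule that
    matches decides. '+' accepts, '-' rejects. 'default' is used
    only when no rule matches at all.
    """
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            verdict = (sign == "+")
    return verdict

# The two rule sets from the post above.
loose = ["-*", "+*articles*.html"]
strict = ["-*", "+*.css", "+*.js",
          "+*www.washingtonpost.com/*articles*.html"]

# Hypothetical URLs, only for illustration.
wp_article = "www.washingtonpost.com/wp-dyn/articles/A1-2001Jun11.html"
other_article = "www.example.com/articles/story.html"
stylesheet = "www.example.com/style.css"

for url in (wp_article, other_article, stylesheet):
    print(url, allowed(url, loose), allowed(url, strict))
```

With the loose rules, any URL containing "articles" and ending in .html gets through, whatever the host. The strict rules only let through such pages on www.washingtonpost.com, plus stylesheets and scripts from anywhere.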

Note that the current wp site has robots.txt rules 
that will not allow you to crawl all articles unless 
you tell HTTrack to ignore them:
Set options/Spider/Spider: no robots.txt rules

But in this case select at most 2 simultaneous 
connections, especially if you have a "fast pipe", to 
avoid any server bandwidth overload!
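
If you use the command-line version instead of the GUI, the same settings could be sketched like this. As far as I know, -sN controls robots.txt handling (0 = never follow) and -cN sets the number of simultaneous connections; please check them against your HTTrack version. The helper function and the output directory name are made up for illustration.

```python
import shlex

def build_httrack_command(start_url, out_dir, rules,
                          connections=2, obey_robots=False):
    """Build an httrack argument list (hypothetical helper).

    Assumptions: -O sets the output path, -s0 disables robots.txt
    handling, -s2 always follows it, -cN limits the number of
    simultaneous connections.
    """
    cmd = ["httrack", start_url, "-O", out_dir]
    cmd += rules
    cmd.append("-s2" if obey_robots else "-s0")
    cmd.append(f"-c{connections}")
    return cmd

rules = ["-*", "+*.css", "+*.js",
         "+*www.washingtonpost.com/*articles*.html"]

cmd = build_httrack_command(
    "http://www.washingtonpost.com/wp-dyn/print/",
    "./washingtonpost",  # hypothetical output directory
    rules,
)

# Print a shell-quoted command line so it can be reviewed before running.
print(shlex.join(cmd))
```

Quoting matters here: without it the shell would try to expand the * in each rule. To actually start the crawl, pass the list to subprocess.run(cmd, check=True).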

