HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Downloading Washington Post content
Author: Xavier Roche
Date: 06/11/2001 21:15
 
> I am interested in downloading the articles in the 
> Washington Post 
> (http://www.washingtonpost.com/wp-dyn/print/) each day 
> for a period during which I'll be away from home.

You may use:
-* +*articles*.html

But you might prefer this instead:
-* +*.css +*.js +*www.washingtonpost.com/*articles*.html

if you want to strictly limit the mirror to Washington Post 
articles while keeping the scripts and stylesheets the pages 
need. (Adding +*.png +*.gif +*.jpg might also be a good idea.)
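
To see what the second rule set would let through, here is a rough Python sketch. It assumes that filters are checked in order and that the last matching rule decides, and it approximates HTTrack's wildcard matching with Python's fnmatch, which is not HTTrack's actual matcher. The function name and the sample URLs are made up for illustration.

```python
from fnmatch import fnmatchcase

# Scan rules from the post, in the order they are written.
RULES = ["-*", "+*.css", "+*.js", "+*www.washingtonpost.com/*articles*.html"]


def is_allowed(url, rules, default=False):
    """Approximate filter check: the last rule that matches wins.

    Assumption: '*' matches any run of characters, as in fnmatch.
    HTTrack's real matcher has extra syntax this sketch ignores.
    """
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            verdict = (sign == "+")
    return verdict


# Hypothetical URLs, only to exercise the rules.
samples = [
    "www.washingtonpost.com/wp-dyn/articles/A1234.html",
    "www.washingtonpost.com/style/main.css",
    "www.example.com/articles/other.html",
]
for url in samples:
    print(url, "->", "keep" if is_allowed(url, RULES) else "skip")
```

The leading -* rejects everything first, and each + rule then re-admits one family of links, which is why the order of the rules matters.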

Note that the current washingtonpost.com site has robots.txt 
rules that will keep HTTrack from crawling all articles 
unless you turn them off in the options:
Set options/Spider/Spider: no robots.txt rules

But in this case select at most 2 simultaneous 
connections, especially if you have a "fast pipe", to 
avoid overloading the server!
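
For those using the command-line version rather than the GUI, the same settings could be assembled roughly as below. This is only a sketch: it assumes -O sets the output path, -cN the number of simultaneous connections, and -s0 disables robots.txt handling, so check these against httrack --help for your version. The output directory and the helper name are made up.

```python
import shlex


def build_httrack_command(start_url, output_dir, filters,
                          connections=2, ignore_robots=True):
    """Build an httrack argument list (hypothetical helper, does not run it)."""
    cmd = ["httrack", start_url, "-O", output_dir]
    # Assumed flag: -cN limits the number of simultaneous connections.
    cmd.append(f"-c{connections}")
    if ignore_robots:
        # Assumed flag: -s0 means "never follow robots.txt rules".
        cmd.append("-s0")
    # Scan rules are passed as plain arguments after the options.
    cmd.extend(filters)
    return cmd


cmd = build_httrack_command(
    "http://www.washingtonpost.com/wp-dyn/print/",
    "./wp-mirror",  # hypothetical output directory
    ["-*", "+*.css", "+*.js", "+*www.washingtonpost.com/*articles*.html"],
)
# shlex.join quotes the wildcard patterns so the shell does not expand them.
print(shlex.join(cmd))
```

Keeping the connection count at 2 here mirrors the advice above about not overloading the server once robots.txt is ignored.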

 

