HTTrack Website Copier
Free software offline browser - FORUM
Subject: Downloading Washington Post content
Author: Stu Borman
Date: 06/11/2001 17:07
 
Dear HTTrack Forum members:

I am interested in downloading the articles in the 
Washington Post (http://www.washingtonpost.com/wp-dyn/print/) 
each day for a period during which I'll be away from home. 
I wanted to set up the downloads in advance and then ask a 
family member to run them and e-mail the results to me 
while I'm away. I'm in the habit of 
reading the Post each day, and I didn't want to have 
to catch up with scads of issues when I return.

Each daily version of the Post has several sections, 
and each section has a home page. On Mondays, for 
example, there are basically four sections I'm 
interested in -- the main (A) section, Metro, Style, 
and Business -- and a home page, more or less, for 
each of these sections.

Can I list each of the four home pages in WinHTTrack 
and create a pair of scan rules that would first tell the 
program not to download any linked pages (-*.*) and then 
make an exception for HTML pages with the word "articles" 
in the URL (+*articles*.html)? (All the Washington Post 
articles have URLs of the form 
<http://www.washingtonpost.com/wp-dyn/articles/A49148-2001Jun10.html>.)
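To make clear what I expect these two rules to do, here is a rough Python sketch of the matching logic as I understand it. It is only a model, not HTTrack's actual code: it assumes that rules are checked in order and that the last rule that matches wins, and it uses Python's fnmatch wildcards as a stand-in for HTTrack's own pattern syntax. The function name is_allowed and the banner URL are made up for illustration.

```python
from fnmatch import fnmatchcase

# The two scan rules, in the order I would enter them in WinHTTrack.
RULES = ["-*.*", "+*articles*.html"]


def is_allowed(url, rules=RULES, default=True):
    """Return True if the URL would be kept under the given rules.

    Assumption: every rule is tested in order and the last one that
    matches decides the outcome. fnmatch-style wildcards only
    approximate HTTrack's real pattern syntax.
    """
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            verdict = (sign == "+")
    return verdict


SAMPLES = [
    # Article URL of the form quoted above.
    "www.washingtonpost.com/wp-dyn/articles/A49148-2001Jun10.html",
    # The section index page I would start from.
    "www.washingtonpost.com/wp-dyn/print/",
    # Hypothetical banner-ad URL, invented for this example.
    "www.washingtonpost.com/ads/banner.gif",
]

for url in SAMPLES:
    print("keep" if is_allowed(url) else "skip", url)
```

Under these assumptions only the article URL is kept. The section page is skipped too, because the host name itself contains a dot and therefore matches *.*.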

The Washington Post section home pages contain links 
to a lot of other extraneous things -- like classified 
ads, banner ads, other section pages I'm not 
interested in (or are appropriate only for other days 
of the week), subscription information, etc. -- and I 
would need to eliminate all those items from the 
download to make the duration and size of the download 
reasonable.

So my idea is to ask HTTrack to download nothing 
whatsoever, except for HTML URLs containing the 
term "articles".

Would this work? I'm not sure if it would because it's 
not given as an example in the help files or user 
manual. Also, would these scan rules prevent HTTrack 
from downloading the section home pages themselves, on 
which the article links are found (because of the <-*.*> 
command)?

I've tried to use several offline browsers over the 
years to do this exact job, and although I'm an 
extremely experienced computer user, I have never been 
successful in setting up one of the offline browsers 
to do what I wanted. I seem to be incapable of setting 
up any of these programs properly. That's why I'm 
asking for assistance on this.

Thanks for any advice you can provide.

Regards, Stu Borman
 



