HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: DL Entire Forum Thread? (Plus, Robots.txt Issues)
Author: Leto
Date: 04/03/2007 01:11
 
> -> I am looking to download a very large message
> board thread (830 pages!), as opposed to the entire
> website, and have hit a snag.

Be very careful when downloading "sections" of a forum. Every page links to
the rest of the board, so by default HTTrack will generally follow those links
and mirror the entire forum.

I would advise you to:

1. Set the start URL to the first page of the thread you want

2. Analyse the URL structure of the forum to produce a list of rules that will
allow HTTrack to download only what you need

3. Define your filters (scan rules), for example:

-* +*.jpg +*.gif +*.png +*.css +*.js
+example.com/forum/attachment.php*
+example.com/forum/view.php?*threadID=12345*

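The rules above first exclude everything (-*) and then re-include only URLs
that match one of the + patterns: common image, stylesheet and script files,
forum attachments, and pages of the thread whose threadID is 12345.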
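
If you want to sanity-check a rule list before starting a long mirror, a rough
Python sketch like the one below may help. It is not HTTrack's own matcher,
only an approximation under a few assumptions: "*" matches any run of
characters, every other character is literal, rules are compared against the
URL without its scheme, and the last matching rule wins. The helper names
(rule_to_regex, is_allowed) are made up for this illustration.

```python
import re

# The scan rules from the example above, one rule per entry.
RULES = [
    "-*", "+*.jpg", "+*.gif", "+*.png", "+*.css", "+*.js",
    "+example.com/forum/attachment.php*",
    "+example.com/forum/view.php?*threadID=12345*",
]

def rule_to_regex(pattern):
    # Escape every character, then turn the escaped "*" back into
    # "match anything". Assumption: "*" is the only wildcard used.
    return re.compile(re.escape(pattern).replace(r"\*", ".*"))

def is_allowed(url, rules=RULES):
    # Assumption: rules are compared against the URL without "http://".
    target = re.sub(r"^[a-z]+://", "", url)
    allowed = False  # sketch default: nothing passes unless a rule accepts it
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if rule_to_regex(pattern).fullmatch(target):
            # Assumption: a later matching rule overrides earlier ones.
            allowed = (sign == "+")
    return allowed

if __name__ == "__main__":
    for url in [
        "http://example.com/forum/view.php?threadID=12345&page=2",
        "http://example.com/forum/view.php?threadID=999",
        "http://example.com/forum/index.php",
        "http://example.com/images/logo.png",
    ]:
        print(is_allowed(url), url)
```

Under these assumptions the trailing * after threadID=12345 would also accept
a thread numbered 123456, so it is worth checking how your forum writes its
URLs before relying on a pattern like this.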
 

