HTTrack Website Copier
Free software offline browser - FORUM
Subject: Wayback archive solution?
Author: Ectoplasm
Date: 11/15/2004 14:25
 
Judging from previous posts on this subject there seems to be
no solution; I tried the suggestions, but they didn't work.
Now I've found out why:

The Wayback Machine website archiver (www.archive.org) uses
a special script. Pages on the archive site contain links to
the original domain (for example <http://www.abc.com>). The
script that runs on Wayback pages converts these links into
something like
<http://web.archive.org/web/yyyymmdd/http://abc.com>, so that
the request for deeper pages still goes to Wayback Machine.

Because HTTrack does not (seem to) run the script (but
browsers do), it cannot find any of the deeper pages: the
first page unfortunately refers to the original domain
(www.abc.com), which does not exist anymore.

I see two solutions: 1) HTTrack must run the script, or 2)
HTTrack should directly rewrite the absolute links to the
original domain that it finds, so that they include the
Wayback prefix (a string replace on encountered links).
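
To make option 2 concrete, here is a rough sketch in Python of
the kind of string replace I mean. It is not anything HTTrack
does itself; the function name rewrite_links, the timestamp
value and the regular expression are my own assumptions, and it
only handles quoted href/src attributes.

```python
import re

# Prefix used by the Wayback Machine, as seen in the archived URLs.
WAYBACK_PREFIX = "http://web.archive.org/web/"


def rewrite_links(html, original_host, timestamp):
    """Prefix absolute links to original_host with the Wayback URL.

    Hypothetical helper: only quoted href/src attributes are handled,
    and links that already start with the Wayback prefix are left alone
    because they do not begin with the original host.
    """
    pattern = re.compile(
        r'(?P<attr>\b(?:href|src)\s*=\s*["\'])'
        r'(?P<url>https?://(?:www\.)?' + re.escape(original_host) +
        r'(?=[/:?#"\'])[^"\']*)',
        re.IGNORECASE,
    )

    def add_prefix(match):
        # Keep the attribute and opening quote, then insert the prefix.
        return (match.group("attr") + WAYBACK_PREFIX + timestamp + "/"
                + match.group("url"))

    return pattern.sub(add_prefix, html)


page = '<a href="http://www.abc.com/page.html">deeper page</a>'
print(rewrite_links(page, "abc.com", "20041115"))
```

With the made-up timestamp 20041115, the link in the example
would point at the archived copy instead of the dead original
domain.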

Is HTTrack capable of either one of these or is there maybe
a different solution?
 