HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Downloading a 'page' only
Author: Xavier Roche
Date: 11/28/2003 19:47
 
> Here's a thing Heritrix does that HTTrack doesn't seem
> to do: it will attempt to download all the parts that make
> up a page, considering them all part of the same object:
> all frame contents, pictures, embedded objects, etc. that are
> required to redisplay the page.

Hmm, this might be an idea for a future release: something
like a more powerful "near" option.

> Would it be hard to do? Can
> plugins get the info required to pick the right pages (i.e.
> context of links)?
This would require some coding, as the current "wizard"
(which decides which links have to be downloaded) does not
even know the name of the upstream tag that generated the link.
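To illustrate what "knowing the upstream tag" would mean, here is a minimal Python sketch (not HTTrack's code, which is written in C) of a link extractor that records, for every link, the tag and attribute that produced it. The `EMBEDDED` and `NAVIGATIONAL` tables and the `extract_links` helper are hypothetical assumptions chosen for the example; a real crawler would need a much longer list and would also inspect attributes such as `rel` on `<link>`.

```python
from html.parser import HTMLParser

# Hypothetical classification (an assumption, not HTTrack's list):
# (tag, attribute) pairs whose target is needed to redisplay the page...
EMBEDDED = {
    ("img", "src"),
    ("frame", "src"),
    ("iframe", "src"),
    ("embed", "src"),
    ("script", "src"),
    ("link", "href"),  # e.g. stylesheets; a real tool would also check rel=
}
# ...and pairs that merely point to another document.
NAVIGATIONAL = {("a", "href"), ("area", "href")}


class TaggedLinkExtractor(HTMLParser):
    """Collect every link together with the tag and attribute that produced it."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (url, tag, attr, kind) tuples

    def handle_starttag(self, tag, attrs):
        # html.parser passes lowercased tag and attribute names.
        for attr, value in attrs:
            if value is None:  # boolean attribute, no URL
                continue
            key = (tag, attr)
            if key in EMBEDDED:
                self.links.append((value, tag, attr, "embedded"))
            elif key in NAVIGATIONAL:
                self.links.append((value, tag, attr, "navigational"))


def extract_links(html):
    """Return (url, tag, attr, kind) for each recognised link in `html`."""
    parser = TaggedLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links('<a href="foo.gif">x</a><img src="foo.gif">')` yields one `"navigational"` entry from the `a` tag and one `"embedded"` entry from the `img` tag, both for `foo.gif`, which is exactly the ambiguous case discussed next.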

This would cause another problem: how to handle identical
links in a href and img src attributes? The a href link won't
be downloaded (it is not an "embedded" file), but the img src
link will be, so what should be done in this case:

<a href="foo.gif">
<img src="foo.gif">

The first link will be rewritten as an absolute link, but not
the second. This might cause annoying side effects.
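One possible way around this, offered only as a sketch and not as what HTTrack does, is to decide per URL rather than per link: if any reference to a URL on the page is embedded, download it and rewrite every reference to it (including the a href) to the local copy; otherwise leave it as an absolute link. The `plan_page` function and its return values are hypothetical; it takes `(url, tag, attr, kind)` tuples where `kind` is `"embedded"` or `"navigational"`.

```python
from collections import defaultdict


def plan_page(links):
    """Hypothetical per-URL policy for links found on one page.

    `links` is a list of (url, tag, attr, kind) tuples. A URL is downloaded
    if at least one reference to it is embedded; all references to a
    downloaded URL are then rewritten as relative links to the local copy,
    so <a href="foo.gif"> and <img src="foo.gif"> stay consistent.
    """
    kinds = defaultdict(set)
    for url, _tag, _attr, kind in links:
        kinds[url].add(kind)

    plan = {}
    for url, seen in kinds.items():
        if "embedded" in seen:
            plan[url] = ("download", "relative")
        else:
            plan[url] = ("skip", "absolute")
    return plan
```

With the `foo.gif` example above, both references point to the same downloaded file, so neither is rewritten as absolute. The trade-off is that a plain a href to an image now pulls that image into the mirror whenever the page also embeds it.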

 







Created with FORUM 2.0.11