HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Merging mirrored websites
Author: Mikec
Date: 10/05/2006 18:10
 
This is a follow-up to an older thread.

I am using WinHTTrack for research.  Initially I was annoyed by the fact that
it seemed fairly difficult to retrieve all the content referred to by a
website (particularly in the case of links to documents residing on other
sites) without getting a plethora of "junk" links. (If someone has a tutorial
on how to do this, it would be appreciated.)

However, I discovered that by allowing HTTrack to "roam" across the web, I
could often uncover "nuggets" of good information that otherwise would have
escaped me.

I'm moving more toward the idea of a single "project" containing all of the
mirrors for each topic.  However, I've experienced the following
shortcomings:

1. There is a limit on either the number of URLs or the total byte count used
by them.

2. Even when using the "*Continue interrupted download" approach to "adding
on" to a project, HTTrack appears to download files which it already has,
adding significantly to the transfer time.

3. HTTrack does not respect "read only" attributes.  It would be excellent to
be able to "prevent" HTTrack from downloading anything more from "www.xyz.org"
by marking that directory in the project as read-only.

4. It would help "usability" if HTTrack could break the site list into
"groups".  One project I've built has over 5,500 directories at the top
level of the project directory.  This causes substantial delays and memory
issues.

5. HTTrack appears to have an in-memory "route map" which it uses to
navigate.  This data structure can easily exceed available physical and
virtual memory.  A segmented approach to the data structure for large
downloads would help performance and stability significantly.
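
A possible stopgap for point 3, offered only as a rough sketch: a small
script could look for top-level site directories that have been marked
read-only and turn each one into an exclusion scan rule of the form
"-www.xyz.org/*", to be pasted into the project's scan rules. The function
names below are made up for illustration, and the sketch assumes that each
top-level directory is named after its host and that "hts-cache" is the only
non-site directory to skip.

```python
import stat
from pathlib import Path
from typing import List

# Assumption: directories in the project folder that are not mirrored sites.
NON_SITE_DIRS = {"hts-cache"}


def is_read_only(path: Path) -> bool:
    """Return True when no write permission bit is set on the path."""
    mode = path.stat().st_mode
    return not (mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def exclusion_rules(project_dir: str) -> List[str]:
    """Build one '-host/*' exclusion rule per read-only site directory.

    Hypothetical helper: it only inspects the folder layout and does not
    call HTTrack itself.
    """
    rules = []
    for entry in sorted(Path(project_dir).iterdir()):
        if not entry.is_dir() or entry.name in NON_SITE_DIRS:
            continue
        if is_read_only(entry):
            rules.append(f"-{entry.name}/*")
    return rules
```

Calling exclusion_rules() on the project folder would return a list of rules,
one per read-only site directory, which could then be added to the scan rules
before continuing the download. Whether this actually stops all re-fetching
for those hosts would need to be checked against a real project.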

I'm not really a novice with the tool, but I'm not well educated on it either. 
Any help, suggestions or alternative ways of accomplishing this would be
appreciated.

Mike C
 