HTTrack Website Copier
Free software offline browser - FORUM
Subject: Combining 'site' and 'all' by programming?
Author: Tapio
Date: 03/06/2004 14:43
 
In HTTrack, choosing 'Download all sites in pages
(multiple mirror)' lets you mirror a page
(www.site.com/dir/page.html) and all sites, internal or
external, that it links to directly. Would it be
difficult to program HTTrack to offer a 'Download all pages
in site' mode as well, in order to get the whole site and
all pages, internal or external, that it links to directly?

   Also, to quickly comment on your previous answer (much
   appreciated) to a similar question:

>That is, mirror a site and mirror all
>external links (this can potentially be a HUGE mirror!)?
>This isn't possible unless you use filters (scan rules).

   Yes, sometimes it CAN be huge, but for a vast number of
   scientific sites, for instance, it isn't. Instead there
   is just a huge number of links, further down in the site
   and out to small sites (listing these in scan rules is
   too complex and time-consuming).

>This is a bit complex: for example, by mirroring the site
>with external depth=1, then listing the external references
>in hts-cache/new.txt (excluding the current site, and sorting
>external host addresses + path using command-line
>scripting), and then using "continue an interrupted mirror",
>removing the external depth BUT including the list of
>sites in the list.

   Yes, that is too complex, but many thanks for looking
   into the subject.
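
   For anyone who wants to try the quoted workaround anyway,
   the "list external references" step could be scripted
   roughly as below. This is only a sketch, not an HTTrack
   feature. It assumes that hts-cache/new.txt is
   tab-separated and that the first field on each line
   starting with http:// or https:// is the downloaded URL;
   please check this against your own new.txt. The function
   name external_urls and the own_host parameter are made up
   for illustration.

```python
from urllib.parse import urlsplit


def external_urls(lines, own_host):
    """Collect external URLs from new.txt-style lines.

    Assumption: fields are tab-separated and the first field that
    starts with http:// or https:// is the URL of the fetched file.
    Returns unique URLs not on own_host, sorted by host, then path.
    """
    own_host = own_host.lower()
    found = set()
    for line in lines:
        for field in line.rstrip("\r\n").split("\t"):
            if field.startswith(("http://", "https://")):
                host = (urlsplit(field).hostname or "").lower()
                # Exclude the site that was mirrored in the first place.
                if host and host != own_host:
                    found.add(field)
                break  # only look at the first URL-like field per line

    def sort_key(url):
        parts = urlsplit(url)
        return ((parts.hostname or "").lower(), parts.path, url)

    return sorted(found, key=sort_key)
```

   You would call it with the opened log file, for example
   external_urls(open("hts-cache/new.txt"), "www.site.com"),
   write the result to a text file with one URL per line, and
   then hand that file to HTTrack as the list of addresses
   when continuing the mirror without the external depth.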
 