HTTrack Website Copier
Free software offline browser - FORUM
Subject: Depth of Scan Rules - Include Links
Author: Neil
Date: 01/26/2021 13:28
 
I am attempting to download <https://playground.arduino.cc> (not all of it, but
I will use this as the example to keep the site folder structure short). I
have the default limits on depth (infinite for the example site and zero for
external sites), but I know there are links within the chosen site to
www.arduino.cc/en/Reference/, which I also wish to grab. Given this, I add the
following scan rules (plus some others that are not relevant to this discussion):

+*www.arduino.cc/en/Reference/*
-*/Es/*
-*/Bulgarian/*
-*/Catala/*
-*/Deutsch/*
-*/French/*
-*/Italiano/*
-*/Portugues/*
-*/Russian/*
-*/Chinese/*
-*/Francais/*

Most of these rules are there to exclude the foreign-language copies of the
mirrored pages.
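
To make the rule set concrete, here is a rough Python sketch of how I assume
the rules are evaluated. It is only a model, not HTTrack's own code: it
assumes that `*` matches any run of characters, that matching is
case-sensitive, and that when several rules match a URL the last one in the
list decides (which is how I read the filter documentation). The function
name, the `default` argument and the test URLs are made up for illustration.

```python
from fnmatch import fnmatchcase

# The scan rules from this post, in the order they were entered.
RULES = [
    "+*www.arduino.cc/en/Reference/*",
    "-*/Es/*",
    "-*/Bulgarian/*",
    "-*/Catala/*",
    "-*/Deutsch/*",
    "-*/French/*",
    "-*/Italiano/*",
    "-*/Portugues/*",
    "-*/Russian/*",
    "-*/Chinese/*",
    "-*/Francais/*",
]


def allowed_by_rules(url, rules=RULES, default=False):
    """Return the verdict of the last matching rule, or `default` if none match.

    Assumption: later rules override earlier ones. `default` stands in for
    whatever the crawler would do with a URL that no rule mentions.
    """
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            verdict = (sign == "+")
    return verdict


if __name__ == "__main__":
    for url in (
        "www.arduino.cc/en/Reference/ABC",     # matches only the + rule
        "www.arduino.cc/en/Reference/Es/ABC",  # hypothetical path: the later -*/Es/* rule wins
        "www.arduino.cc/en/Reference",         # no trailing slash: the + rule does not match
    ):
        print(url, allowed_by_rules(url))
```

Under these assumptions the include rule only matches URLs that contain
"/en/Reference/" with the trailing slash, so a link written without it would
fall back to the default behaviour for external sites.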

So HTTrack arrives at the folder <https://playground.arduino.cc/XYZ>, which
contains a link to www.arduino.cc/en/Reference/ABC. According to the scan rules,
HTTrack should mirror this new folder. This folder in turn contains links to
other pages, such as www.arduino.cc/en/Reference/GHI.

My question is: once HTTrack gets to the page www.arduino.cc/en/Reference/ABC,
what mirroring depth applies? (my answer: potentially infinite)

Referring to each page by its final folder name, would HTTrack mirror ABC and
then go on to mirror GHI? (my answer: yes)

What if GHI contained more links to other folders at
www.arduino.cc/en/Reference, would these also be mirrored (we are now at depth
3)? (my answer: yes)

Finally, if ABC (or GHI) contained a link back to www.arduino.cc/en/Reference,
would all of www.arduino.cc/en/Reference (and its associated folders and
sub-folders) be mirrored? This would be equivalent to including
www.arduino.cc/en/Reference in the list of Web Addresses (on page 2 of
WinHTTrack). (my answer: yes)
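
To spell out what my answers above would imply, here is a toy Python model of
the behaviour I am assuming. It does not show how HTTrack actually works; it
is simply a breadth-first crawl in which any link accepted by the filter is
followed, with no depth limit unless one is given. The link graph, the JKL
and Main/Products pages, and the `accepted` and `crawl` helpers are all
invented for the example.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "playground.arduino.cc/XYZ": ["www.arduino.cc/en/Reference/ABC"],
    "www.arduino.cc/en/Reference/ABC": ["www.arduino.cc/en/Reference/GHI"],
    "www.arduino.cc/en/Reference/GHI": [
        "www.arduino.cc/en/Reference/",     # link back to the Reference index
        "www.arduino.cc/en/Main/Products",  # outside the included folder
    ],
    "www.arduino.cc/en/Reference/": ["www.arduino.cc/en/Reference/JKL"],
    "www.arduino.cc/en/Reference/JKL": [],
}


def accepted(url):
    """Stand-in for the filter: the start site plus the included folder."""
    return url.startswith(
        ("playground.arduino.cc/", "www.arduino.cc/en/Reference/")
    )


def crawl(start, links, max_depth=None):
    """Breadth-first crawl; max_depth=None models an unlimited depth."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not follow links from pages at the depth limit
        for target in links.get(url, []):
            if target not in seen and accepted(target):
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


if __name__ == "__main__":
    for page in crawl("playground.arduino.cc/XYZ", LINKS):
        print(page)
```

In this model the crawl reaches ABC, then GHI, then the Reference index and
everything linked from it, while Main/Products is skipped. Whether HTTrack
behaves the same way is exactly what I am asking.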


Thanks.
 