| I am attempting to download <https://playground.arduino.cc> (not all of it, but
to keep the site folder structure short I will use this as the example). I
have the default Limits on depth (infinite for the example site and zero for
external sites), but I know there are links within the chosen site to
www.arduino.cc/en/Reference/ which I also wish to grab. Given this, I add the
following scan rules (plus some others that are not relevant to this discussion):
+*www.arduino.cc/en/Reference/*
-*/Es/*
-*/Bulgarian/*
-*/Catala/*
-*/Deutsch/*
-*/French/*
-*/Italiano/*
-*/Portugues/*
-*/Russian/*
-*/Chinese/*
-*/Francais/*
The majority of these rules are there to exclude foreign-language copies of
all mirrored links.
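To make the intent of these rules concrete, here is a minimal Python sketch of how I picture them being applied to a URL. It is only a model, not HTTrack's actual code: it assumes that `*` matches any run of characters (approximated with Python's `fnmatchcase`), that matching is case-sensitive, and that when several rules match, the one declared last wins. The helper name `decide` is made up for illustration.

```python
from fnmatch import fnmatchcase

# Scan rules from the post, in the order they were entered.
RULES = [
    "+*www.arduino.cc/en/Reference/*",
    "-*/Es/*",
    "-*/Bulgarian/*",
    "-*/Catala/*",
    "-*/Deutsch/*",
    "-*/French/*",
    "-*/Italiano/*",
    "-*/Portugues/*",
    "-*/Russian/*",
    "-*/Chinese/*",
    "-*/Francais/*",
]


def decide(url, rules=RULES):
    """Return True (include), False (exclude) or None (no rule matched).

    Assumption: every rule is checked in order and the last match wins.
    """
    verdict = None
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            verdict = (sign == "+")
    return verdict


print(decide("https://www.arduino.cc/en/Reference/ABC"))         # True
print(decide("https://playground.arduino.cc/Deutsch/HomePage"))  # False
print(decide("https://playground.arduino.cc/XYZ"))               # None
```

Under this model, a URL that no rule matches (`None`) falls back to the normal depth and external-site limits. Note also that the bare URL `https://www.arduino.cc/en/Reference` (no trailing slash) does not match the `+` pattern in this sketch, because the pattern requires the `/` after `Reference`; whether HTTrack treats it the same way is something I have not verified.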
So HTTrack arrives at the folder <https://playground.arduino.cc/XYZ>, which
contains a link to www.arduino.cc/en/Reference/ABC. According to the scan
rules, HTTrack should mirror this new folder. This folder in turn contains
links to other pages, such as www.arduino.cc/en/Reference/GHI.
My question is: Once HTTrack gets to the page www.arduino.cc/en/Reference/ABC,
what mirroring depth applies? (my answer: potentially infinite)
Shortening the site to the final folder, would HTTrack mirror ABC and then go
on to mirror GHI? (my answer: Yes)
What if GHI contained more links to other folders at
www.arduino.cc/en/Reference, would these also be mirrored (we are now at depth
3)? (my answer: yes)
Finally, if ABC (or GHI) contained a link back to www.arduino.cc/en/Reference,
would all of www.arduino.cc/en/Reference (and its associated folders and
sub-folders) be mirrored? This would be equivalent to including
www.arduino.cc/en/Reference in the list of Web Addresses (on page 2 of
WinHTTrack). (my answer: yes)
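The behaviour I am expecting can be written down as a toy crawl. This is a sketch of my assumption only, not a description of what HTTrack really does: it treats any URL accepted by the `+*www.arduino.cc/en/Reference/*` rule as if it had unlimited depth. The link graph is invented for illustration (the pages `JKL`, `MNO` and `Main/Software` are hypothetical), and the link back to the Reference index is written with a trailing slash.

```python
from collections import deque

# Hypothetical link graph built from the pages named in the post.
LINKS = {
    "https://playground.arduino.cc/XYZ": [
        "https://www.arduino.cc/en/Reference/ABC",
    ],
    "https://www.arduino.cc/en/Reference/ABC": [
        "https://www.arduino.cc/en/Reference/GHI",
    ],
    "https://www.arduino.cc/en/Reference/GHI": [
        "https://www.arduino.cc/en/Reference/JKL",
        "https://www.arduino.cc/en/Reference/",  # link back to the index
    ],
    "https://www.arduino.cc/en/Reference/": [
        "https://www.arduino.cc/en/Reference/MNO",
        "https://www.arduino.cc/en/Main/Software",  # outside the + rule
    ],
}


def allowed(url):
    """Stand-in for the scan rules: the start site plus the Reference tree."""
    return ("playground.arduino.cc/" in url
            or "www.arduino.cc/en/Reference/" in url)


def crawl(start):
    """Breadth-first crawl with no depth limit on allowed URLs (assumed)."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in LINKS.get(page, []):
            if link not in seen and allowed(link):
                seen.add(link)
                queue.append(link)
    return seen


for url in sorted(crawl("https://playground.arduino.cc/XYZ")):
    print(url)
```

In this model ABC, GHI, JKL, the Reference index and MNO are all mirrored, while `Main/Software` is not. My question is whether HTTrack really behaves this way, or whether the external depth limit of zero cuts the chain off earlier.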
Thanks.
| |