| I am trying to download a very large website with many external links. The
original site I'm mirroring is www.jesus-is-savior.com. I can download the
default internal pages and any "near" external images and non-HTML files using
the options in the menu choices. I have tried repeatedly to mirror only the
specific "absolute" external URLs that are linked from each of the internal
site's pages.
Instead, I am getting many files that I don't want from each of the external
sites that those URLs point to. A random example would be the URL
<http://www.infowars.com/articles/ps/china_organs_regime_admits_to_organ_harvesting_prisoners.htm>
which is linked from a page on the internal site
<http://www.jesus-is-savior.com/Disturbing%20Truths/organ_harvesting.htm>.
Mirroring it also catches information, files, images, audio, video, etc. from
the various external site pages, at different depths, that the crawl passes
through to reach that URL.
I am using the default settings and options for HTTrack, and have tried
incrementally increasing the external depth (1, 2, 3, etc.) with no
success.
What I am requesting is some assistance or suggestions on which filters I need
so that I can mirror all files in the internal site, but only the specific URL
pages (and all other non-HTML/image files on those exact pages) for each of
the external links.
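To make the rule concrete, here is a rough Python sketch of the fetch decision I am after. It is not HTTrack code and not HTTrack filter syntax. The function names, the `direct_external_pages` set, and the extension-based guess at what counts as HTML are all assumptions made up for illustration.

```python
from urllib.parse import urlparse

# Host of the site being mirrored (taken from the post above).
INTERNAL_HOST = "www.jesus-is-savior.com"


def is_internal(url):
    """True if the URL lives on the site being mirrored."""
    return urlparse(url).hostname == INTERNAL_HOST


def looks_like_html(url):
    """Crude guess based on the path only (an assumption, not a real content check)."""
    path = urlparse(url).path.lower()
    return path == "" or path.endswith("/") or path.endswith((".htm", ".html"))


def should_fetch(url, referrer, direct_external_pages):
    """Decide whether to download `url`, found as a link on page `referrer`.

    `direct_external_pages` is a set that this function fills with the
    external URLs that were linked directly from internal pages.
    """
    if is_internal(url):
        return True                      # mirror everything on the internal site
    if referrer is None:
        return False                     # never start a crawl on an external site
    if is_internal(referrer):
        direct_external_pages.add(url)   # the exact external URL an internal page links to
        return True
    # The link was found on an external page: only take non-HTML files
    # (images, audio, ...) that sit on one of the exact external pages above.
    return referrer in direct_external_pages and not looks_like_html(url)
```

With this rule, the infowars.com article from my example would be fetched along with the images on that one page, but no other HTML page on infowars.com would be followed. The extension check would misclassify dynamic pages (for example `.php` URLs), so it only illustrates the intent.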
I appreciate your help with this question.
David Custer | |