HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Can't copy complete website from entry page
Author: Charles
Date: 04/24/2012 01:14
 
Some kind of link obfuscation is being used in the menu on the pages. For me
HTTrack was not able to follow the menu links, so when starting at the
default.aspx link, HTTrack was only able to mirror the default.aspx page and
the home.aspx page, and then it stopped mirroring.

Maybe someone knows a setting in HTTrack to copy the menu links correctly. I
don't.

To get around the menu link obfuscation, I did a test save of the website this
way.

Put all of these links in the Web Addresses (URL) box. This way you copy every
page that the menu links to, without going through the menu itself:

<http://johnjayplatko.com/default.aspx>
<http://johnjayplatko.com/home.aspx>
<http://johnjayplatko.com/Instruments.aspx>
<http://johnjayplatko.com/shop.aspx>
<http://johnjayplatko.com/aboutus.aspx>
<http://johnjayplatko.com/Students.aspx>
<http://johnjayplatko.com/buildingasteelguitar1.aspx>
<http://johnjayplatko.com/acoustics.aspx>
<http://johnjayplatko.com/FEA.aspx>
<http://johnjayplatko.com/MusicInstruction.aspx>
<http://johnjayplatko.com/tools.aspx>
<http://johnjayplatko.com/links.aspx>
<http://johnjayplatko.com/ForumReviews.aspx>
<http://johnjayplatko.com/books.aspx>
<http://johnjayplatko.com/contactus.aspx>


In Set Options > Scan Rules add the rule - +*johnjayplatko.com/*


In Set Options go to - Browser ID - and in - Browser "Identity" - select the
first entry on the list which is MSIE 6.0.


In Set Options go to the - Links - tab and check the - Get non-HTML files
related to a link - box.


In Set Options > Spider, in the - Spider: - selection, select - no robots.txt
rules.
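
For anyone using the command-line httrack instead of the GUI, the settings
above can be sketched roughly as follows. This is only a sketch: the output
directory name and the exact user-agent string are my assumptions (the GUI
entry for MSIE 6.0 may send slightly different text). The flags -O (output
path), -F (user-agent), -n (get non-HTML files "near" a link) and -s0 (never
follow robots.txt) are from the httrack manual, and on the command line a scan
rule carries a leading +.

```python
import shlex

BASE_URL = "http://johnjayplatko.com/"

# Page names taken from the list of links in the post.
PAGES = [
    "default.aspx", "home.aspx", "Instruments.aspx", "shop.aspx",
    "aboutus.aspx", "Students.aspx", "buildingasteelguitar1.aspx",
    "acoustics.aspx", "FEA.aspx", "MusicInstruction.aspx", "tools.aspx",
    "links.aspx", "ForumReviews.aspx", "books.aspx", "contactus.aspx",
]

# Assumption: a typical MSIE 6.0 identity string; the exact text that
# WinHTTrack uses for that list entry may differ.
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"


def build_httrack_command(pages, output_dir="mirror"):
    """Return an argv list for httrack mirroring the given pages.

    output_dir is a hypothetical folder name, not something from the post.
    """
    urls = [BASE_URL + page for page in pages]
    return [
        "httrack", *urls,
        "-O", output_dir,            # where the mirror is written
        "+*johnjayplatko.com/*",     # scan rule: allow everything on the site
        "-F", USER_AGENT,            # browser identity
        "-n",                        # get non-HTML files related to a link
        "-s0",                       # do not follow robots.txt rules
    ]


if __name__ == "__main__":
    # Print the command so it can be checked before running it by hand.
    print(shlex.join(build_httrack_command(PAGES)))
```

The function only builds the command; it can be passed to subprocess.run() once
the printed line looks right.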


After the site was mirrored I used a text replace editor to fix the links in
the menu on the web pages so that the links pointed to the locally saved web
pages.
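
The same kind of fix can also be scripted. Below is a minimal sketch, not the
procedure I actually used. It assumes the links to fix appear as plain absolute
URLs ending in .aspx and that the pages were saved locally with an .html
extension. Both are assumptions; the obfuscated menu markup on this site will
likely need a different search pattern. The function names are made up for the
example.

```python
import re
from pathlib import Path

SITE_PREFIX = "http://johnjayplatko.com/"


def localize_links(html, site_prefix=SITE_PREFIX):
    """Rewrite absolute .aspx links on the site to relative .html names."""
    pattern = re.compile(
        re.escape(site_prefix) + r"([A-Za-z0-9_]+)\.aspx", re.IGNORECASE
    )
    return pattern.sub(lambda m: m.group(1) + ".html", html)


def localize_mirror(root):
    """Apply localize_links to every .html file under root.

    Returns the number of files that were changed. latin-1 is used so that
    every byte value survives the read/write round trip.
    """
    changed = 0
    for page in Path(root).rglob("*.html"):
        original = page.read_text(encoding="latin-1")
        updated = localize_links(original)
        if updated != original:
            page.write_text(updated, encoding="latin-1")
            changed += 1
    return changed
```

Work on a copy of the mirror folder first, since the files are rewritten in
place.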

Below is a link to the text replace editor I used. I have used it for years,
but it has a steep learning curve.

<http://www.ecobyte.com/replacetext/>


Also, HTTrack was not able to save the wav files that are on some of the pages,
so these had to be downloaded manually, and the links pointing to them had to
be changed to point to the locally saved audio files.

I will say that the hardest bit for me was altering the audio file links. I
personally had not encountered audio links like that and it took me some time
to figure out the correct way to get the links working.

Below is a section of one of the audio file links:

Original unchanged link:

SRC\x5cx3d\x5cx22http\x5cx3a\x5cx2f\x5cx2fjohnjayplatko.com\x5cx2fdocuments\x5cx2floudk.wav\


The above link section changed so that it points to the locally saved wav
file:

SRC="Documents\x5cx2floudk.wav\
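
To see what such a link actually says, the \xNN escapes can be decoded. In the
link section above the escapes look doubled (\x5c is itself a backslash, so
\x5cx3d becomes \x3d, which becomes =). Here is a small sketch that decodes
repeatedly until nothing changes. The function name is made up, and the sample
leaves off the trailing backslash because the link section shown is cut off
there.

```python
import re

# Matches one \xNN escape, where NN is two hex digits.
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def decode_hex_escapes(text, max_rounds=5):
    """Decode \\xNN escapes, repeating to undo doubled escaping."""
    for _ in range(max_rounds):
        decoded = _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
        if decoded == text:
            break  # nothing left to decode
        text = decoded
    return text


sample = (
    r"SRC\x5cx3d\x5cx22http\x5cx3a\x5cx2f\x5cx2fjohnjayplatko.com"
    r"\x5cx2fdocuments\x5cx2floudk.wav"
)
print(decode_hex_escapes(sample))
# prints: SRC="http://johnjayplatko.com/documents/loudk.wav
```

That makes it easier to find the wav file address for the manual download and
to see which part of the link has to be replaced with the local path.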

 