Hello. I've searched for this issue and found a few similar threads, but no
clear solution. I have tried a few things myself, which I will try to explain.
Please forgive me, I'm not a techie, and the 12 hours I spent yesterday trying
to do this was my first time using WinHTTrack, so my explanation/terminology
might be inaccurate.
I expected to get an offline copy of the full website that would let me
browse/navigate through the entire site, including all of its internal pages,
and possibly follow external links if I had an active internet connection at
the time (I'm less confident about the external links).
When I ran HTTrack, the index.html file opened the homepage, but any links or
other pages I clicked on just returned an error. The file path for these pages
was different from the Base Path I had entered in the GUI. I realised that the
URLs for these pages do not have a trailing slash, and HTTrack therefore seems
to treat them as files rather than HTML pages.
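For example (as far as I can tell), a link like
https://waltoninstitute.ie/about/staff/kevin-doolin (no trailing slash) seemed
to be treated as a file with no extension, whereas
https://waltoninstitute.ie/about/staff/kevin-doolin/ (with the trailing slash)
was treated as a page and saved as its own .html file.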
I made various attempts using both the GUI and the command line, where I tried
to force all formats to be treated as text/html using the * wildcard.
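From the command line, the kind of thing I was attempting looked roughly like
this (I may well be misusing the option):

httrack "https://waltoninstitute.ie/" -O "C:\My Web Sites\Walton Website 09302023" -%A *=text/html

i.e. trying to tell HTTrack to assume that everything it downloads is
text/html.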
For example, this page (https://waltoninstitute.ie/about/staff?filter=all)
lists all the staff, and when you click each staff member you are taken to
their profile. The only way I managed to get a copy of each profile was by
extracting a list of all the hyperlinks from that page (by inspecting the
source code) and entering those as the list of URLs for HTTrack, manually
adding a trailing slash to each one. This produced an index with separate
.html files for each of those pages, but again no single index.html file that
would let me navigate the whole site.
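If it helps, I believe the command-line equivalent of what I did would be
something like the following, with the 74 staff URLs saved in a text file (the
filename here is just made up for illustration):

httrack -%L "C:\My Web Sites\staff-urls.txt" -O "C:\My Web Sites\Walton Website 09302023"

where -%L is, as far as I understand it, the option for supplying an
additional list of URLs from a file.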
I have tried so many options that I'm not sure which settings to include with
this post. I set the mirroring depth to different levels, and the external
depth as well. Below is my last attempt, where I entered the list of 74 URLs,
one for each staff profile page. For pasting here, the list has been reduced
to a single URL instead of the 74 I had listed:
HTTrack3.49-2+htsswf+htsjava launched on Sun, 01 Oct 2023 01:14:45 at
<https://waltoninstitute.ie/about/staff/kevin-doolin/> +*.png +*.gif +*.jpg
+*.jpeg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar
(winhttrack -qwr4%e2C2%Ps0u1%s%uN0%I0p3BaK0H0%kf2A25000%f#f -F "Mozilla/4.5
(compatible; MSIE 4.01; Windows 98)" -%F -%l "en, *"
<https://waltoninstitute.ie/about/staff/kevin-doolin/> -O1 "C:\My Web
Sites\Walton Website 09302023" +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js
-ad.doubleclick.net/* -mime:application/foobar -%A *=text/html )
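For comparison, my understanding from the HTTrack documentation is that a
basic whole-site mirror would look something more like this (please correct me
if I have it wrong):

httrack "https://waltoninstitute.ie/" -O "C:\My Web Sites\Walton Website 09302023" "+*.waltoninstitute.ie/*" -v

i.e. start from the homepage, save into that output folder, and allow any page
on waltoninstitute.ie. Is that roughly the right approach, or is there an
option I'm missing for the pages without trailing slashes?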
What I'm actually trying to do is get an offline copy, as a snapshot record of
this website as it currently stands: <https://waltoninstitute.ie/>. I'm not
concerned with mirroring external websites, but it would be great if the links
to external sites worked when I have an internet connection.
Apologies if that is a load of gibberish!