HTTrack Website Copier
Free software offline browser - FORUM
Subject: URLs with trailing slash being treated as files
Author: Bonzo
Date: 10/01/2023 12:10
 
Hello. I've searched for this issue and found a few similar threads, but no
clear solution. I have tried a few things myself, which I will try to explain.
Please forgive me, I'm not a techie, and the 12 hours I spent yesterday trying
to do this were my first time using WinHTTrack, so my explanation/terminology
might be inaccurate.

I expected to get an offline copy of a full website, which would allow me to
browse/navigate through the entire website, including all of its internal
pages, and possibly external links if I had an active internet connection at
the time (I'm less confident about the external links).

When I tried to run HTTrack, the index.html file would navigate to the
homepage, but any links or other pages I clicked on would just return an
error. The file paths for these pages were different from the Base Path that
I had entered in the GUI. I realised that the URLs for these pages did not
have a trailing slash, and therefore HTTrack seems to treat them as files, not
HTML pages.

I made various attempts using the GUI and also the command line, where I
tried to force all file types to be treated as text/html by using * (the
-%A *=text/html option in the command further down).

For example, this page (https://waltoninstitute.ie/about/staff?filter=all)
lists all the staff. When you click each staff member you get taken to their
profile. The only way I managed to get a copy of their profiles was by
extracting a list of all hyperlinks from the page (by inspecting the source
code) and entering those as the list of URLs for HTTrack, manually adding a
trailing slash to each one. This resulted in an index with separate .html
files for each of those pages. But again, there was no single index.html file
that would navigate the whole site.
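
For anyone who wants to repeat that manual step without editing each link by
hand, here is a rough sketch in Python (standard library only). It is not part
of HTTrack and is only one way to do it. The names LinkCollector,
with_trailing_slash and profile_urls are made up for this illustration, the
sample markup is hypothetical (the real page source will differ), and it
assumes the profile links all sit under /about/staff/.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def with_trailing_slash(url):
    """Return url with a trailing slash on its path (query/fragment dropped)."""
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def profile_urls(html, page_url, prefix):
    """List unique absolute links under prefix, each with a trailing slash."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        # Resolve relative links against the page they were found on.
        absolute = with_trailing_slash(urljoin(page_url, href))
        if absolute.startswith(prefix) and absolute != prefix and absolute not in found:
            found.append(absolute)
    return found


if __name__ == "__main__":
    # Hypothetical sample markup; paste the saved page source here instead.
    sample = """
    <a href="/about/staff/kevin-doolin">Kevin Doolin</a>
    <a href="/about/staff/kevin-doolin/">Kevin Doolin (duplicate)</a>
    <a href="https://example.com/elsewhere">External</a>
    """
    page = "https://waltoninstitute.ie/about/staff?filter=all"
    for url in profile_urls(sample, page, "https://waltoninstitute.ie/about/staff/"):
        print(url)
```

The printed lines can then be pasted in as the list of URLs for HTTrack. This
only automates the workaround; it does not explain why the links without a
trailing slash fail in the mirror.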

I have tried so many options that I'm not sure which settings to include with
this post. I set the mirroring depth to different levels, and the external
depth also. This was my last attempt, where I entered the list of 74 URLs, one
for each of the staff profile pages. For pasting it here, this has been
reduced to include just 1 URL, instead of the 74 I had listed:

HTTrack3.49-2+htsswf+htsjava launched on Sun, 01 Oct 2023 01:14:45 at
<https://waltoninstitute.ie/about/staff/kevin-doolin/> +*.png +*.gif +*.jpg
+*.jpeg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar

(winhttrack -qwr4%e2C2%Ps0u1%s%uN0%I0p3BaK0H0%kf2A25000%f#f -F "Mozilla/4.5
(compatible; MSIE 4.01; Windows 98)" -%F  -%l "en, *"
<https://waltoninstitute.ie/about/staff/kevin-doolin/> -O1 "C:\My Web
Sites\Walton Website 09302023" +*.png +*.gif +*.jpg +*.jpeg +*.css +*.js
-ad.doubleclick.net/* -mime:application/foobar -%A *=text/html )


What I'm actually trying to do is get an offline copy, as a snapshot record of
this website as it is currently: <https://waltoninstitute.ie/>. I'm not
concerned with external websites, but it would be great if the links to
external sites worked when you had an internet connection.

Apologies if that is a load of gibberish! 
 