HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Fuzzy Logic
Author: Tj
Date: 07/19/2002 19:11
 
========== Cut-n-Paste
URIs are case sensitive in httrack (not hostnames); 
therefore index.html and INDEX.HTML will be saved as two 
different files by httrack. Besides, httrack considers that 
index.html and INDEX.HTML are the same local resource 
names, and therefore will save them with different names.
========== End Cut-n-Paste

...this makes sense, but it assumes that there are links
pointing to both of them; HTTrack would then find both as
it processed the page (assuming they both exist).


========== Cut-n-Paste
you should not have 404 errors, as httrack isn't 
converting names into lowercase when doing the requests
========== End Cut-n-Paste

...exactly, this is what I was referring to.  If a link 
says www.website.com/index.htm, but in fact on the host the 
file name is INDEX.HTM, the browser (as well as HTTrack) 
can't find it (at least for the site that originally 
prompted my first post).  This is not a bug/problem with 
HTTrack, but this does cause the download to fail.  The 
problem is that the author of the page created a bad link 
(or assumed the filename would be lowercase, not upper).

So what I was proposing was that if HTTrack gets a 404, it 
tries converting the name to all uppercase, then lowercase, 
then titlecase, and also tries both .htm and .html (trying 
all these different combinations should probably be 
optional, as I am pretty sure it could slow things down). 
This could be very useful, as you could still get a page 
even though the links are bad, and when it is saved, the 
links would probably be corrected by HTTrack.
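
The fallback proposed above could be sketched roughly as 
follows. This is only an illustration in Python, not 
HTTrack code: candidate_names is a made-up helper, and 
"titlecase" is assumed here to mean a capital first letter 
with the rest lowercase.

```python
import posixpath

def candidate_names(path):
    """List alternative spellings of the file name to retry after a 404.

    Hypothetical helper for illustration; nothing here is part of HTTrack.
    """
    directory, name = posixpath.split(path)
    stem, ext = posixpath.splitext(name)

    # Extensions to try: the original, plus its .htm/.html counterpart.
    exts = [ext]
    if ext.lower() == ".htm":
        exts.append(ext + ("L" if ext.isupper() else "l"))
    elif ext.lower() == ".html":
        exts.append(ext[:-1])

    seen = {name}          # never retry the spelling that already failed
    candidates = []
    for e in exts:
        n = stem + e
        # Order follows the proposal: uppercase, lowercase, titlecase.
        for variant in (n.upper(), n.lower(), n.capitalize(), n):
            if variant not in seen:
                seen.add(variant)
                candidates.append(posixpath.join(directory, variant))
    return candidates

if __name__ == "__main__":
    for url_path in candidate_names("/index.htm"):
        print(url_path)
```

Each candidate costs one extra request per broken link, 
which is why the retry would best be an optional setting.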

I wish I had the website that I found this problem on... 
but I got so frustrated trying to go through and fix the 
links manually I think I deleted the link (I know I deleted 
the local copy of the stuff I was working on).  I will 
continue to search for it, and post it if I find it again.

Regards,
Tj
 