HTTrack Website Copier
Free software offline browser - FORUM
Subject: MSDN not copying - seems like URL Case-Sensitivity
Author: Brett Peirce
Date: 06/19/2015 17:09
 
I am trying to rip a portion of the documentation on MSDN, starting from a few
specific pages and limiting how far up the site hierarchy the crawl goes.
Microsoft being what it is, the site does not seem to care about the case of
most characters in links; thus:

<https://msdn.microsoft.com/en-us/library/xxxxxx.aspx>
   *is equivalent to*
<https://msdn.microsoft.com/EN-US/library/xxxxxx.aspx>.

It seems like all the links to 'EN-US' resources are being ignored (not
ripped), while all the links to 'en-us' resources are being saved/rewritten,
making me think HTTrack is somehow ignoring them because of the case (?)

What I imagine is happening: it creates a local 'en-us' directory for the
copied content, writes the linked files to it (whether they were linked via
'en-us' or 'EN-US'), then checks each URL against the local directory name
while rewriting links. When the URL contains 'en-us', it sees the 'en-us'
directory and that checks out. When it gets to a URL with 'EN-US', I'm guessing
it doesn't see a local resource that matches that capitalization, and leaves
the URL alone(?) - that's all I can imagine :-(
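
To make the guess above concrete, here is a minimal sketch of the behaviour I
am picturing. This is NOT HTTrack's code; the function names (local_key,
check_links) and the ignore_case switch are made up purely for illustration.
It only shows how an exact-case lookup would miss the 'EN-US' link, while a
lookup that lowercases both sides would match it.

```python
from urllib.parse import urlsplit


def local_key(url, ignore_case):
    """Hypothetical lookup key for a URL: host + path, optionally lowercased."""
    parts = urlsplit(url)
    key = parts.netloc + parts.path
    return key.lower() if ignore_case else key


def check_links(links, saved_urls, ignore_case):
    """Map each link to True if it matches a saved URL, False otherwise."""
    saved = {local_key(u, ignore_case) for u in saved_urls}
    return {link: local_key(link, ignore_case) in saved for link in links}


# One page saved under the lowercase form of the URL.
saved_urls = ["https://msdn.microsoft.com/en-us/library/xxxxxx.aspx"]
links = [
    "https://msdn.microsoft.com/en-us/library/xxxxxx.aspx",
    "https://msdn.microsoft.com/EN-US/library/xxxxxx.aspx",
]

exact = check_links(links, saved_urls, ignore_case=False)
folded = check_links(links, saved_urls, ignore_case=True)
```

With the exact-case comparison only the 'en-us' link is recognised as saved;
with the lowercased comparison both are, which is the behaviour I would expect
on a site that treats the two spellings as the same page.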

The release notes seem to mention an intent to handle URLs with different
cases in release 3.33 (I'm using 3.48-21). Could it have gotten broken? Is this
not what it was supposed to handle?
Has anyone encountered this kind of thing? Is there a way to work around it?
 