HTTrack Website Copier
Free software offline browser - FORUM
Subject: Redirected link is broken
Author: Alain Desilets
Date: 02/09/2012 21:10
I am trying to crawl the first 2 levels of a site with the following command:

httrack <> -O C:\wbtwrite\prealigner_data\site_mirrors\ -v -r 2 --update -I0 -s2 +*lang* +*.js -*.jpg -*.jpeg -*.gif -*.mov -*.mp3 -*.zip -*.wav -*.mpg -*.mpeg -*.tiff

You can try it yourself; it takes at most 5 minutes to complete.

It mostly works, except for the interlanguage links. For example, if you load
the home-accueil/text-eng.html file from the local mirror, you will see a link
Français (link to the French version) in the upper left corner. Clicking on
it takes you to:


i.e., it takes you outside of the mirror and onto the original server. As its
name suggests, it is a script that automatically redirects to the French
counterpart of the English page it was referred from (or the other way around
if the referrer was a French page).

The funny thing is that the French page for home-accueil/text-eng.html is
indeed in the local mirror (it's called home-accueil/text-fra.html). So
obviously, HTTrack was able to follow the Français link and save its
redirected content to disk. It just didn't rewrite the link in the mirrored
home-accueil/text-eng.html to point to the mirrored French file instead of
the page on the original server.

I'm puzzled by this, because I tried to reproduce the problem by creating a
3-page website locally on my computer. That site uses the same kind of
redirect approach. But when I crawl it, the interlanguage links in the
mirror are fine.
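For anyone wanting to build a similar test site, the referrer-based switch can be approximated roughly as below. I don't know how the real script is written; this sketch simply assumes the language is encoded as -eng. or -fra. in the file name, as in the text-eng.html and text-fra.html pages mentioned above. The function name language_counterpart is made up for illustration.

```python
from typing import Optional

# Assumed naming convention, modelled on text-eng.html / text-fra.html.
_SWAP = {"-eng.": "-fra.", "-fra.": "-eng."}


def language_counterpart(referrer_path: str) -> Optional[str]:
    """Return the other-language path, or None if no language marker is found."""
    for src, dst in _SWAP.items():
        if src in referrer_path:
            # Swap only the last occurrence, i.e. the one in the file name.
            head, _, tail = referrer_path.rpartition(src)
            return head + dst + tail
    return None


print(language_counterpart("home-accueil/text-eng.html"))
```

A test server would read the Referer header, pass its path through a function like this, and answer with a redirect to the result.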

Note that on the server, robots.txt prohibits access to the cgi-bin
directory where the script resides. But I am using the -s2 option, so it
shouldn't matter, should it? And for good measure, I added a +*lang* filter
to force processing of any file whose name contains lang.
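Whatever -s2 actually does (the meaning of the levels is worth double-checking against the HTTrack option list), one can at least confirm that the robots.txt rule covers the script's URL. Here is a small sketch with Python's standard urllib.robotparser; the robots.txt content and the example.org URLs are placeholders, not the real site's.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Tell whether robots_txt permits the given agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Placeholder rules resembling the situation described: cgi-bin is off limits.
rules = "User-agent: *\nDisallow: /cgi-bin/\n"
print(allowed_by_robots(rules, "http://example.org/cgi-bin/placeholder"))
print(allowed_by_robots(rules, "http://example.org/home-accueil/text-eng.html"))
```

If the script's URL comes back as disallowed, the robots setting would be a natural first suspect.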

Any idea what might be the matter?

Alain Désilets
