HTTrack Website Copier
Free software offline browser - FORUM
Subject: Help w URL redirects
Author: Bill Burnham
Date: 02/14/2007 20:47
 
Hi All,

First let me say congrats to Xavier and the rest of the HTTrack team, you all
have done a great job with the program.

My question revolves around the issue of HTML redirects.  I have a list of
URLs that I am downloading where I just want to get the homepage of each site
and that's it.  Theoretically, all I need to do is set my crawl depth to 1,
but a few of the sites I am crawling redirect from their homepage URL to
another URL, which means that instead of getting the home page I typically
get just a brief snippet of code with a redirect to the real homepage.

For example, if you crawl <http://www.microsoft.com> at depth level 1 you will
get a short piece of code that basically redirects your browser to
<http://www.microsoft.com/en/us/default.aspx> (at least in my case).
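
For anyone who wants to reproduce that outside HTTrack, a rough way to see
whether a homepage answers with a redirect is the standard-library Python
sketch below.  It is nothing HTTrack-specific, and it only catches HTTP-level
redirects, so a meta-refresh snippet inside the returned HTML would still
slip past it:

    # Ask for the page without following redirects, so the hop itself is visible.
    import http.client

    conn = http.client.HTTPConnection("www.microsoft.com")
    conn.request("GET", "/")
    resp = conn.getresponse()
    if 300 <= resp.status < 400:
        print("redirects to:", resp.getheader("Location"))
    else:
        print("served directly at depth 1, status", resp.status)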


I could solve this problem by just setting the crawl depth to 2, but that
means that I will get a ton of extra pages/links for those sites that actually
return the home page at depth level 1 (such as the New York Times).

I know that I could change the URLs in my crawl list to reflect the correct
"2nd level" home page URL, but these URLs are prone to change (hence the
redirects), so I'd rather have HTTrack automatically resolve the correct
address the way IE or Firefox does.
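
The best workaround I can think of so far is to pre-resolve the crawl list
myself before each run and hand the resolved list to HTTrack, along the lines
of the rough Python sketch below (the file names are placeholders, and again
it only follows HTTP-level redirects, not meta-refresh snippets):

    # Resolve each crawl-list URL to its final address, the way a browser
    # would, and write out a new list that can be fed to HTTrack instead.
    import urllib.request

    with open("urls.txt") as src, open("resolved.txt", "w") as dst:
        for line in src:
            url = line.strip()
            if not url:
                continue
            # urlopen follows the whole redirect chain; geturl() is the final URL.
            final_url = urllib.request.urlopen(url, timeout=10).geturl()
            dst.write(final_url + "\n")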

So my question is: is there a way to keep the crawl depth at 1 (so that I only
get a copy of the home page), but in the case of redirects have HTTrack follow
them to the actual homepage?  I suspect the answer is "no, you have to put the
actual home page URL in your crawl list," but I thought I would ask anyway :-)

Either way, the program is great and congrats to all who contributed,

Bill
 