Hello,
I have exactly the same problem as the one described in the thread below:
<http://forum.httrack.com/readmsg/15848/index.html>
My question is about HTML redirects. I have a list of URLs that I am
downloading, and for each site I just want the homepage and nothing else.
In theory, all I need to do is set my crawl depth to 1, but a few of the
sites I am crawling redirect from their homepage URL to another URL. Instead
of the homepage, I typically get just a brief snippet of code with a
redirect to the real homepage.
For example, if you crawl <http://www.microsoft.com> at depth level 1, you
will get a short piece of code that basically redirects your browser to
<http://www.microsoft.com/en/us/default.aspx> (at least in my case).
I could solve this problem by just setting the crawl depth to 2, but that
means I will get a ton of extra pages/links for the sites that actually
return the homepage at depth level 1 (such as the New York Times).
I know that I could change the URLs in my crawl list to reflect the correct
"2nd level" homepage URL, but these URLs are prone to change (hence the use
of redirects), so I'd rather have HTTrack automatically resolve the
correct address the way IE or Firefox does.
So my question is: is there a way to keep the crawl depth at 1 (so that I
only get a copy of the homepage) but, in the case of redirects, have HTTrack
follow any redirects to the actual homepage?
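To make the desired behaviour concrete, here is a minimal sketch of a
possible workaround done outside HTTrack: resolve each URL to its final
homepage first, then crawl the resolved URLs at depth 1. This is only an
illustration under assumptions. The helper names (meta_refresh_target,
resolve_homepage) are hypothetical, it only handles HTTP redirects and
<meta http-equiv="refresh"> tags (not JavaScript redirects), and it says
nothing about how HTTrack itself treats redirects.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class MetaRefreshParser(HTMLParser):
    """Records the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/home".
        _, _, rest = attr.get("content", "").partition(";")
        key, sep, value = rest.strip().partition("=")
        if sep and key.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def meta_refresh_target(html, base_url):
    """Return the absolute meta-refresh target in html, or None if absent."""
    parser = MetaRefreshParser()
    parser.feed(html)
    if parser.target is None:
        return None
    return urljoin(base_url, parser.target)


def resolve_homepage(url, max_hops=5):
    """Follow HTTP and meta-refresh redirects (hypothetical helper).

    Requires network access; max_hops guards against redirect loops.
    """
    for _ in range(max_hops):
        with urlopen(url, timeout=30) as response:
            url = response.geturl()  # URL after any HTTP 3xx redirects
            html = response.read().decode("utf-8", errors="replace")
        next_url = meta_refresh_target(html, url)
        if next_url is None or next_url == url:
            return url
        url = next_url
    return url
```

The resolved URLs could then replace the originals in the crawl list, so
that depth 1 fetches the real homepage. Since the redirect targets change
over time, the resolution step would have to be rerun before each crawl,
which is why a built-in HTTrack option would still be preferable.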
Thank you!