HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Redirects [whoops]
Author: Xavier Roche
Date: 09/24/2001 16:20
 
> Xavier, I need to ask about redirects again. If I
> have an A HREF link in my 'index.htm' page which
> points to 'redirect.asp', and redirect.asp has
> Response.Redirect <http://another.site.com/>,
> why is a page called 'redirect.html' created (along
> with index.htm), containing a Meta refresh to the new
> URL? I am sure that HTTrack used to simply rewrite
> the link in index.htm from 'redirect.asp'
> to <http://another.site.com/>.
> It is not a big problem, but it creates unnecessary
> files.

This is a limitation of the HTTrack engine, a consequence
of its link stack system. To summarize, there are two ways
of spidering a website:

1. First method

    while links.available do the following:
        while links.in.html.page do the following:
            push link onto the links stack if necessary
            write its location into the final html file
        done
    done
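A minimal Python sketch of the first method, assuming a simulated site held in a dictionary instead of real HTTP requests. Each link is pushed onto a stack and its local file name is written into the page immediately, before the link itself has been fetched. The names `SITE`, `local_name` and `crawl_stack_first` are hypothetical and are not HTTrack's internals; the processing order is only illustrative.

```python
# Hypothetical simulated site: URL -> links found on that page.
SITE = {
    "http://example.com/index.htm": ["http://example.com/a.htm", "http://example.com/b.htm"],
    "http://example.com/a.htm": ["http://example.com/b.htm"],
    "http://example.com/b.htm": [],
}

def local_name(url: str) -> str:
    """Hypothetical mapping from a URL to the local file it is saved as."""
    return url.rsplit("/", 1)[-1]

def crawl_stack_first(start: str) -> dict[str, list[str]]:
    """Method 1: stack links, write their local location without fetching them."""
    stack = [start]
    seen = {start}
    written: dict[str, list[str]] = {}
    while stack:                                  # while links.available
        url = stack.pop()
        locations = []
        for link in SITE[url]:                    # while links.in.html.page
            if link not in seen:                  # push link if necessary
                seen.add(link)
                stack.append(link)
            locations.append(local_name(link))    # location written, no fetch yet
        written[local_name(url)] = locations
    return written

result = crawl_stack_first("http://example.com/index.htm")
```

Because the location is decided before the fetch, the page can be rewritten as soon as it is parsed, which is what makes this method cheap in memory and CPU.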

2. Second method

    while links.available do the following:
        while links.in.html.page do the following:
            fetch link if necessary and wait for the response
            write its location into the final html file
        done
    done
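The second method can be sketched the same way, again assuming a hypothetical in-memory site where a string value stands for a redirect (like `Response.Redirect`) and a list value for a normal page. `fetch` is a hypothetical blocking request that follows redirects; because the crawler waits for it, the final URL is known when the location is written.

```python
# Hypothetical simulated site: a str value is a redirect target,
# a list value is the links found on a normal page.
SITE = {
    "http://example.com/index.htm": ["http://example.com/redirect.asp"],
    "http://example.com/redirect.asp": "http://another.site.com/",
    "http://another.site.com/": [],
}

def fetch(url: str, max_hops: int = 5) -> str:
    """Simulated blocking GET that follows redirects and returns the final URL."""
    for _ in range(max_hops):
        target = SITE.get(url)
        if not isinstance(target, str):
            return url          # not a redirect: this is the final location
        url = target            # redirect: follow the new location
    raise RuntimeError("too many redirects")

def crawl_fetch_first(start: str) -> dict[str, list[str]]:
    """Method 2: fetch each link and wait, then write its final location."""
    pending = [start]
    seen = {start}
    written: dict[str, list[str]] = {}
    while pending:                          # while links.available
        url = pending.pop()
        locations = []
        for link in SITE[url]:              # while links.in.html.page
            final = fetch(link)             # fetch link and wait
            if final not in seen and final in SITE:
                seen.add(final)
                pending.append(final)
            locations.append(final)         # the final location is written
        written[url] = locations
    return written

result = crawl_fetch_first("http://example.com/index.htm")
```

Each link costs a full round trip before the page can be finished, which is why this approach is slower and holds more pages in memory.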

While the first method is much more efficient and
requires less memory/CPU, the second is better suited
to redirects: you can write the final location with
method #2, but not with method #1, because the redirect
is only received when you actually try to fetch the page.

I assume that you used the 'assume' option :) Without
this option, the engine would have checked the type of
the .asp file and, on encountering the redirect, directly
rewritten the link to the correct URL (that should be the
correct behaviour, barring any bug).
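The effect of the 'assume' option can be modelled as a choice between the two methods for each link. In this sketch, `assumed` is a hypothetical table of extensions whose type is taken for granted (it does not reproduce the option's real syntax), and `probe` is a hypothetical test request that returns the redirect target or `None`. An assumed extension skips the test, so a redirect stays invisible; otherwise the test reveals it and the final URL is written.

```python
from typing import Callable, Optional

def local_name(url: str) -> str:
    """Hypothetical mapping: save the page under its name with an .html extension."""
    return url.rsplit("/", 1)[-1].rsplit(".", 1)[0] + ".html"

def link_location(url: str,
                  probe: Callable[[str], Optional[str]],
                  assumed: dict[str, str]) -> str:
    """Return what is written into the parent page for `url`."""
    ext = url.rsplit(".", 1)[-1].lower()
    if ext not in assumed:
        # Type unknown: test the link first, which exposes any redirect.
        target = probe(url)
        if target is not None:
            return target
    # Type assumed (or no redirect): write the local file name directly.
    return local_name(url)

def fake_probe(u: str) -> Optional[str]:
    """Hypothetical probe: only redirect.asp redirects."""
    return "http://another.site.com/" if u.endswith("redirect.asp") else None
```

For example, `link_location("http://example.com/redirect.asp", fake_probe, {"asp": "text/html"})` gives `"redirect.html"`, while passing an empty table gives `"http://another.site.com/"`.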

 


All articles

Subject                   Author   Date
Redirects [whoops]                 09/24/2001 07:55
Re: Redirects [whoops]             09/24/2001 16:20
Re: Redirects [whoops]             09/25/2001 01:10
Re: Redirects [whoops]             09/25/2001 18:33


Created with FORUM 2.0.11