> Xavier, I need to ask about redirects again. If I
> have an A HREF link in my 'index.htm' page which
> points to 'redirect.asp', and redirect.asp has
> Response.Redirect <http://another.site.com/>
> Why is a page called 'redirect.html' created (along
> with index.htm), containing a Meta refresh to the new
> URL? I am sure that HTTrack used to simply rewrite
> the link in index.htm from 'redirect.asp'
> to <http://another.site.com/>
> It is not a big problem, but it creates unnecessary
> files.
This is a limitation of the HTTrack engine, a consequence
of the stack system. To summarize, there are two ways
of spidering a website:
1. First method
   while links.available do the following:
      while links.in.html.page do the following:
         add link if necessary on the links stack
         write location on final html file
      done
   done
2. Second method
   while links.available do the following:
      while links.in.html.page do the following:
         get link if necessary and wait for this link
         write location on final html file
      done
   done
While the first method is much more efficient and
requires less memory/CPU usage, the second is
interesting in the case of redirects: with method #2
you can still write the final location, but not with
method #1, because the redirect message is only sent
when you are actually trying to get the page.
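
To make the difference concrete, here is a minimal Python sketch of the two loops. It is not HTTrack's actual code: the in-memory SITE table, fetch(), crawl_deferred() and crawl_blocking() are hypothetical names made up for illustration, and the "network" is just a dictionary mirroring the index.htm / redirect.asp case from the question.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> ("html", [links]) or ("redirect", target).
SITE = {
    "index.htm": ("html", ["redirect.asp"]),
    "redirect.asp": ("redirect", "http://another.site.com/"),
    "http://another.site.com/": ("html", []),
}


def fetch(url):
    """Stand-in for an HTTP request; returns (kind, payload) for the URL."""
    return SITE[url]


def crawl_deferred(start):
    """Method 1: push each link on the stack and write its location at once."""
    written = {}  # saved page -> link locations written into it
    queue, seen = deque([start]), {start}
    while queue:
        url = queue.popleft()
        kind, payload = fetch(url)
        if kind == "redirect":
            # Discovered too late: the referring page is already written,
            # so a stub page pointing at the new location is saved instead.
            written[url] = ["meta-refresh -> " + payload]
            continue
        written[url] = []
        for link in payload:
            if link not in seen:
                seen.add(link)
                queue.append(link)
            written[url].append(link)  # location written before the fetch
    return written


def crawl_blocking(start):
    """Method 2: get each link and wait for it, then write the final location."""
    written = {}
    queue, seen = deque([start]), {start}
    while queue:
        url = queue.popleft()
        kind, payload = fetch(url)
        if kind != "html":
            continue
        written[url] = []
        for link in payload:
            final = link
            # Resolve redirects now (assumes no redirect loops in SITE).
            while fetch(final)[0] == "redirect":
                final = fetch(final)[1]
            if final not in seen:
                seen.add(final)
                queue.append(final)
            written[url].append(final)
    return written


if __name__ == "__main__":
    print(crawl_deferred("index.htm"))
    print(crawl_blocking("index.htm"))
```

In this sketch, crawl_deferred leaves 'redirect.asp' as the link in index.htm and produces an extra stub entry for it, while crawl_blocking writes <http://another.site.com/> directly into index.htm, at the cost of waiting on every link before the page can be written.
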
I assume that you used the 'assume' option :)
Without this option, the engine would have checked
the .asp file type and, when encountering the redirect,
directly rewritten the link to the correct URL (this
should be the behaviour, unless there is a bug).