HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: infinite loop at downloading a webpage
Author: Jan Janssen
Date: 08/15/2003 12:14
 
Hi Xavier,

Thanks for your quick answer.

> I can't reach lsbu.marketing.agilent.com ; 

I forgot to mention that this is an intranet site...

> - a bad link somewhere (like /foo) generates a 'false' 404
> page (a regular 200 page with an error message inside)

Hmm, in IIS I changed the Custom Errors property for 
HTTP Error 404 from type 'file' to 'url' and pointed it to one 
of my ASP pages, '/error/eshow.asp?error_id=16&'. This is a 
page that has the layout (including graphics) of the site, 
and it works without any problem. I'm not sure whether 
this creates a 200 page or still keeps the 404.
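
One way I could check this is to request a URL that certainly does not exist and look at the status code that comes back. The following is only a minimal sketch using Python's standard urllib; the function names are my own, and note that urllib follows redirects silently, so a 302 to an error page that answers 200 would be reported as 200 here.

```python
import urllib.error
import urllib.request


def status_of(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code the server answers with for `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises an exception for 4xx/5xx answers,
        # but the status code is still available on it.
        return err.code


def looks_like_soft_404(url_of_missing_page: str) -> bool:
    """True if a URL that should not exist is answered with 200 instead of 404."""
    return status_of(url_of_missing_page) == 200
```

Calling `looks_like_soft_404()` with a made-up path on the site (for example a random file name under the site root) would tell me whether the custom error page is delivered as a 'false' 404.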

> - inside this broken page, another link such as
> img src='graphics/'

Inside which broken page? The eshow.asp looks ok to me. 

> .. and so on

Hm, I can't see this behaviour on my page.
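
To understand the scenario you describe, I tried to sketch how a relative link like 'graphics/' would resolve if every non-existing URL came back as a regular 200 page containing that same link. This is only an illustration with Python's `urljoin`; the host name `intranet.example` and the function name are hypothetical, and it says nothing about how HTTrack resolves links internally.

```python
from typing import List
from urllib.parse import urljoin


def follow_relative_link(start_url: str, relative_link: str, steps: int) -> List[str]:
    """Resolve the same relative link repeatedly, as a crawler would if each
    resulting URL returned a 200 page that contains the link again."""
    urls = [start_url]
    for _ in range(steps):
        # Each new page becomes the base URL for the next resolution.
        urls.append(urljoin(urls[-1], relative_link))
    return urls


# Hypothetical bad link /foo whose 'error' page contains img src='graphics/'.
for url in follow_relative_link("http://intranet.example/foo", "graphics/", 3):
    print(url)
```

Each step appends one more `graphics/` segment, so the chain of URLs never ends as long as the server keeps answering 200.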

 
> To fix that permanently:
> 
> Ensure your server correctly responds to errors ; that is:
> - either with a '404' error page (this is the only good 
> solution - see RFC2616)

But what should the 404 page include: only text 
without links? (I couldn't find anything concerning this at 
<http://www.faqs.org/rfcs/rfc2616.html>.)

> - redirects to a central '404' error page (using a 302 http
> error code with a redirection)

Any idea how I can do this with IIS?

> But sending a regular (200 error code) page in case of 
> error is really dangerous.

Thanks for the advice. If I'm doing this at the moment, I will 
try to fix it.

> As a temporary fix:
> 
> Add the following scan rule (set options / scan rules)
> -*/graphics/graphics/*
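
To see which URLs a rule like this would exclude, here is a rough sketch that uses Python's `fnmatch` as a stand-in for the `*` wildcard. HTTrack has its own filter matcher, so this is only an approximation of the idea; the function name and the file name `logo.gif` are made up for illustration.

```python
from fnmatch import fnmatchcase

# Pattern part of the scan rule, without the leading '-' that marks an exclusion.
EXCLUDE_PATTERN = "*/graphics/graphics/*"


def is_excluded(url_without_scheme: str, pattern: str = EXCLUDE_PATTERN) -> bool:
    """Approximate check: does the URL fall under the exclusion pattern?"""
    return fnmatchcase(url_without_scheme, pattern)


# A nested (looping) path is excluded, a normal graphics path is kept.
print(is_excluded("lsbu.marketing.agilent.com/graphics/graphics/logo.gif"))
print(is_excluded("lsbu.marketing.agilent.com/graphics/logo.gif"))
```

The point of the rule is that a legitimate path contains 'graphics/' only once, so anything with the segment doubled can be skipped without losing real files.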

Argh, why couldn't I think of something like that :-( 
Anyhow, given all these downloaded graphics directories, 
don't you think that HTTrack also downloads the 
corresponding HTML documents that include the links to the 
graphics? Wouldn't those show broken images? After 
your idea about the 404 page I looked into the 
downloaded /error/ folder, and guess what I found: 4112 
files that all show the content of my '/error/eshow.asp?error_id=16&' file.
It seems that it is downloading the 404 
page again and again. Any idea why HTTrack is doing that? 
Do you think it would help if I created a plain 
eshow.HTM page instead of the eshow.asp with a parameter? 

How does HTTrack handle dynamic pages? If I refer twice to 
a page like this: eshow.asp?errorid=16, does it download the 
page twice, or is it smart enough to see that this is the 
same page? E.g. the first time the page is downloaded as 
eshow6c8d.html. Does it point the second link to this 
file, or does it download the page again under a new name, 
e.g. eshow6d6c.html? Do I have a chance to control this 
(preferences etc.)?

best regards

Jan

 