HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Handling soft error pages
Author: Xavier Roche
Date: 03/26/2010 13:03
 
> Is there a way to handle soft 404 error pages in
> httrack? 
> I am crawling a website which does not return 404
> for missing pages. Instead it returns a 200 OK with
> a page which says -- file not found. 

No. When the error page is served through a redirect, you may manually
replace it after the crawl, but there is no simple way to tell that these pages
are actually 404 errors, because by default httrack processes HTTP headers
synchronously and HTTP bodies asynchronously.

This means that a 404 error detected in the headers can immediately be turned
into an external link in the resulting page, but this is not possible when the
error information is only inside the body.
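Since httrack cannot flag these pages itself, one option is to locate them in the mirror once the crawl has finished and handle them by hand. Below is a minimal sketch, assuming the mirror is a local directory of .html/.htm files and that every soft error page contains a known phrase (here the "file not found" text from the question). The function name and marker are hypothetical helpers, not part of httrack. Only the page body is inspected, because these pages arrive with 200 OK and the status line gives no hint.

```python
from pathlib import Path

# Hypothetical marker: adjust to whatever the target site's
# "not found" page actually says.
SOFT_404_MARKER = "file not found"


def find_soft_404_pages(mirror_root, marker=SOFT_404_MARKER):
    """Return mirrored HTML files whose body contains `marker`.

    The check is a case-insensitive substring match on the file
    contents; HTTP status is not available (it was 200 OK anyway).
    """
    marker = marker.lower()
    hits = []
    # Walk the whole mirror tree, in a stable order.
    for path in sorted(Path(mirror_root).rglob("*")):
        if path.is_file() and path.suffix.lower() in (".html", ".htm"):
            # errors="replace" keeps odd encodings from aborting the scan.
            text = path.read_text(encoding="utf-8", errors="replace")
            if marker in text.lower():
                hits.append(path)
    return hits
```

For example, `find_soft_404_pages("/path/to/mirror")` returns the candidate files, which you can then review and replace or delete. A plain substring test is crude: pick a phrase that only appears on the error page, or real pages that happen to mention it will be reported too.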
 
Reply Create subthread


All articles

Subject                          Author         Date
Handling soft error pages                       03/26/2010 12:02
Re: Handling soft error pages    Xavier Roche   03/26/2010 13:03

Created with FORUM 2.0.11