> Is there a way to handle soft 404 error pages in
> httrack?
> I am crawling a website which does not return 404
> for missing pages. Instead it returns a 200 OK with
> a page which says -- file not found.
No. When the error page is served through a redirect, you may replace it
manually after the crawl, but there is no simple way to tell that these pages
are actually 404 errors, because by default httrack processes HTTP headers
synchronously and the HTTP body asynchronously.
This means that a 404 error detected in the headers immediately produces an
external link in the resulting page, but this is not possible when the
information is only inside the body.
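
For the manual cleanup after the crawl, something like the following might
help. It is only a rough sketch and not an httrack feature: it walks the
mirror directory and lists saved HTML files whose body contains a marker
phrase. The function name find_soft_404s and the default marker
"file not found" (taken from the question) are assumptions; adjust the marker
to whatever text the site's error page actually shows.

```python
import os

def find_soft_404s(mirror_dir, markers=("file not found",)):
    """Return sorted paths of saved HTML files containing any marker phrase.

    Hypothetical helper: the marker list is an assumption about what the
    site's error page says, so tune it before trusting the results.
    """
    hits = []
    for root, _dirs, files in os.walk(mirror_dir):
        for name in files:
            # Only inspect HTML pages; skip images, CSS, and other assets.
            if not name.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(root, name)
            # errors="replace" keeps the scan going on odd encodings.
            with open(path, "r", encoding="utf-8", errors="replace") as fh:
                text = fh.read().lower()
            if any(marker.lower() in text for marker in markers):
                hits.append(path)
    return sorted(hits)
```

Calling find_soft_404s on the mirror directory returns the list of suspect
files, which you can then review, replace, or delete by hand. A plain
substring match can also flag legitimate pages that happen to mention the
phrase, so check the list before removing anything.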