> When I download my website (it's a dynamic website with
> an ASP/DB, very large: around 200,000 files, 7 GB, spread
> over different servers), I run into one big problem.
> The graphics directory seems to be downloaded in a loop,
> forever.
>
> G:\lsm_offline_130803b\LSM\lsbu.marketing.agilent.com\graphics\graphics\graphics\graphics\graphics\graphics\graphics\grap
> Any idea what I am doing wrong, or how I can solve the problem?
I can't reach lsbu.marketing.agilent.com, but I suppose
that you have the following case:
- a bad link somewhere (like /foo) generates a "false" 404
  page (a regular 200 page with an error message inside)
- this broken page contains a relative link, such as
  img src="graphics/"
- following that link returns another "false" 404 page, for
  /foo/graphics
- that page again links to graphics/, which causes httrack to
  follow foo/graphics/graphics
.. and so on
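
The loop described above can be reproduced with a short Python sketch. It is only an illustration: the host example.com, the starting URL and the helper name simulate_crawl are made up, and it assumes every page answers 200 and contains the same relative link graphics/.

```python
from urllib.parse import urljoin


def simulate_crawl(start_url, relative_link, max_depth):
    """Return the URLs a crawler would visit if every page it fetched
    (including the "false" 404 pages) contained the same relative link."""
    visited = []
    url = start_url
    for _ in range(max_depth):
        visited.append(url)
        # A relative link is resolved against the page that contains it,
        # so each "false" 404 page pushes the crawler one level deeper.
        url = urljoin(url, relative_link)
    return visited


# Hypothetical start URL; the trailing slash makes /foo/ the base directory.
urls = simulate_crawl("http://example.com/foo/", "graphics/", 4)
for u in urls:
    print(u)
```

Without the max_depth limit the list would grow forever, which is what the mirror does on disk.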
To fix that permanently:
Ensure your server responds correctly to errors; that is:
- either with a real "404" error page (this is the only good
  solution - see RFC 2616)
- or with a redirect to a central "404" error page (using a
  302 HTTP status code with a redirection)
But sending a regular page (200 status code) in case of
error is really dangerous, because a crawler cannot tell it
apart from real content.
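
As a rough illustration of the first option, here is a minimal Python sketch of a server that answers unknown paths with a real 404 status. The site in question runs ASP, so this is not its actual code; KNOWN_PAGES, Handler and make_server are hypothetical names, and the page content is invented.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content: only "/" exists.
KNOWN_PAGES = {"/": b"<html><body>Home</body></html>"}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = KNOWN_PAGES.get(self.path)
        if body is None:
            # Send a real 404 status line. The body can still be a
            # friendly error page, but crawlers will stop here.
            self.send_error(404, "Not Found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the sketch quiet; a real server would log requests.
        pass


def make_server(port=0):
    """Create a server on the loopback interface (port 0 = any free port)."""
    return HTTPServer(("127.0.0.1", port), Handler)
```

Calling make_server(8080).serve_forever() would serve it locally; a request for /foo/graphics/ then gets a 404 status instead of a 200 page.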
As a temporary fix:
Add the following scan rule (Set options / Scan rules):
-*/graphics/graphics/*
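
To get a feel for which addresses such a rule would exclude, the wildcard can be approximated in Python with fnmatch. This is only an approximation: httrack's own filter matching may differ in details, and the file name logo.gif and the helper is_excluded are made up for the example.

```python
from fnmatch import fnmatchcase

# The scan rule above without its leading "-" (the minus means "exclude").
EXCLUDE_PATTERN = "*/graphics/graphics/*"


def is_excluded(address: str) -> bool:
    """Approximate the exclusion rule with shell-style wildcards.
    Assumption: "*" matches any run of characters, including "/"."""
    return fnmatchcase(address, EXCLUDE_PATTERN)


# A normal file in the graphics directory is kept...
print(is_excluded("lsbu.marketing.agilent.com/graphics/logo.gif"))
# ...while anything below a repeated graphics/graphics/ is skipped.
print(is_excluded("lsbu.marketing.agilent.com/graphics/graphics/graphics/logo.gif"))
```

Note that the rule would also skip a legitimate graphics/graphics/ directory if the site really had one, which is one more reason to treat it as a temporary fix.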