> The problem is (this is a guess) that the web-site
> owner actually forbids me to crawl the page that
> fast by returning a 404 error. I've changed the
> number of connections and the speed, but that only
> removes some of the errors...
Reducing the speed can help, but reducing the connections/sec to one is the
biggie (also maybe flow control->persistent).
Also try spider->force http/1.0.
It is also possible that the site checks the referring page (the Referer
header) in the request. If so, there's not much you can do about it.
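To illustrate both points, here is a minimal, hypothetical sketch (not from the tool being discussed) of fetching pages over a single connection at a limited rate, while supplying a Referer header in case the server rejects requests without one. The URLs, referer value, and delay are placeholders.

```python
import time
import urllib.request

def build_request(url, referer):
    """Build a GET request that carries a Referer header,
    mimicking a browser that arrived via a link on `referer`."""
    return urllib.request.Request(url, headers={"Referer": referer})

def polite_fetch(urls, referer, delay=1.0):
    """Fetch URLs one at a time, sleeping `delay` seconds between
    requests (i.e. at most one connection per second by default)."""
    results = {}
    for url in urls:
        req = build_request(url, referer)
        try:
            with urllib.request.urlopen(req) as resp:
                results[url] = resp.status
        except urllib.error.HTTPError as e:
            # A hostile server may answer 404 instead of 403
            # when it decides to refuse the crawler.
            results[url] = e.code
        time.sleep(delay)  # throttle: one request per `delay` seconds
    return results
```

This only demonstrates the idea; whether it helps depends entirely on what the server is actually checking.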
> It would be great If I could specify the number of
> retries on those 404 (fatal not-found) errors...
There is no such setting.
> not, can you point me to the correct file in the
I've never looked at the sources.