> I'm doing regular downloads of www.jp.dk, and I notice a
> disturbing effect: When a page has moved, the 302 headers
> may get downloaded many times without the page itself ever
> appearing.
This is a big issue that I have wanted to fix for a while - I
have finished coding the fix, but it still needs some testing :)
The problem is simple: files such as /foo require an
additional test, and the results of these tests are not
immediately taken into account. In a "redirect loop",
intermediate states are not saved, leading to new requests
if the link is seen again.
I added a cache for all these states, and test requests
should not be done twice anymore - please give your feedback
about this new release.
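To illustrate the idea of caching intermediate redirect states, here is a rough sketch in Python. It is not HTTrack's actual code: the names resolve, fetch and cache are hypothetical, and fetch stands for whatever performs the test request and returns the status code plus the Location header.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical response summary: (status code, Location header or None)
Response = Tuple[int, Optional[str]]


def resolve(url: str,
            fetch: Callable[[str], Response],
            cache: Dict[str, Response],
            max_hops: int = 10) -> Tuple[str, int]:
    """Follow redirects from url, remembering every intermediate answer."""
    seen = set()
    for _ in range(max_hops):
        if url in seen:
            # Redirect loop: stop instead of requesting the same URL again
            break
        seen.add(url)
        if url not in cache:
            # Only send a test request for URLs we have never tested
            cache[url] = fetch(url)
        status, location = cache[url]
        if status not in (301, 302) or location is None:
            return url, status
        url = location
    # Loop or hop limit reached: report the last known state (0 if unknown)
    return url, cache.get(url, (0, None))[0]
```

Because the cache is shared between calls, a link such as /foo that is seen a second time is answered from the stored 302 state, and no new test request is sent.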
> 4722 <http://www2.jp.dk/info> 302
This shouldn't happen anymore - I hope.
> Shouldn't it be recorded somehow that the redirects have
> been followed?
Yep :)
(See beta-2 currently available)
> P.S. I find it amusing that the first line says essentially
> 'HTTrack launched at <site>'. Sounds like a cruise missile
> or something:)
Well, as long as it doesn't crash, everything is fine :)