HTTrack Website Copier
Free software offline browser - FORUM
Subject: Redirects getting redownloaded many times
Author: Lars Clausen
Date: 04/07/2004 10:40
I'm doing regular downloads of, and I notice a
disturbing effect:  When a page has moved, the 302 headers
may get downloaded many times without the page itself ever
appearing.  Here's the tail end of a count of how many times
specific URLs were downloaded, together with the response code:

   4 <> 301
   5 <> 301
  48 <> 301
  48 <> 301
  78 <> 301
4679 <> 302
4722 <> 302
4722 <> 302

One URL redirects to another, which in turn redirects to a third,
which is never downloaded or even mentioned in the log.  However,
it seems that HTTrack doesn't figure out this dead end and tries
the download again every time it encounters it.

Another URL redirects to a target for which the log only says
engine: save-name: local name: ->

These entries make up about half of the downloaded pages. 
Shouldn't it be recorded somehow that the redirects have
been followed?
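
To illustrate what I mean by recording followed redirects, here is a rough Python sketch. It is not HTTrack code and does not reflect how the engine works internally; the RedirectCache class, the fetch callback, and the max_hops limit are all names I made up for the example. The idea is only that each hop of a redirect chain gets fetched once, and that a chain ending in a dead end or a loop is remembered as such.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical fetch result: (HTTP status, Location header or None).
FetchResult = Tuple[int, Optional[str]]

REDIRECT_CODES = (301, 302, 303, 307, 308)


class RedirectCache:
    """Remember where each redirecting URL ends up (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], FetchResult], max_hops: int = 10):
        self.fetch = fetch          # assumed callback doing the real request
        self.max_hops = max_hops    # assumed limit on chain length
        # url -> final URL, or None when the chain is a dead end or a loop
        self.resolved: Dict[str, Optional[str]] = {}

    def resolve(self, url: str) -> Optional[str]:
        if url in self.resolved:
            return self.resolved[url]      # already followed, no new request

        chain: List[str] = []
        seen: Set[str] = set()
        current = url
        final: Optional[str] = None

        while len(chain) <= self.max_hops:
            if current in self.resolved:   # joins a chain we already know
                final = self.resolved[current]
                break
            if current in seen:            # redirect loop: treat as dead end
                break
            seen.add(current)
            chain.append(current)
            status, location = self.fetch(current)
            if status in REDIRECT_CODES and location:
                current = location
                continue
            final = current if status == 200 else None
            break

        # Record the outcome for every hop so none of them is fetched again.
        for hop in chain:
            self.resolved[hop] = final
        return final
```

With something like this, a second encounter of the same moved page would be answered from the table instead of triggering another request for the 302 header.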
Crawler setup:
HTTrack3.31-noV6-nossl launched on Wed, 07 Apr 2004 04:30:00
(httrack -%W receive-header=httrack-arc:get_header -%W
transfer-status=httrack-arc:dump_chunk -F "HTTrack 3.30.102
(non-archiving test version, see" -B -c10 -i -C2 -n -z
-a -A100000 -#L10000000 )


P.S. I find it amusing that the first line says essentially
'HTTrack launched at <site>'.  Sounds like a cruise missile
or something:)
