HTTrack Website Copier
Free software offline browser - FORUM
Subject: circular 302 redirection
Author: Jeff Shaffer
Date: 06/19/2004 04:03
 
Hi Xavier,

When I run the following httrack command:

httrack -%W receive-header=callback.so:process -%W
check-html=callback.so:process_file
<http://falltotown.com/eqfma.php?p0bfa838e7786c2cbceicdfh.10020130bfa838e7786c2c08oberlicaol.com0bfa838e7786c2cchrisaol>
-%e3 -r3 -B -T -e -F "Mozilla/4.0 (compatible; MSIE 6.0;
Windows NT 5.1)" -E600 -Z

I am getting a number of timeouts like this:

More than 600 seconds passed.. giving up

I have done a little digging around and used the callbacks
to examine the headers and html during the execution (in
addition to the -Z option).

It appears that httrack gets into a circular loop when it
encounters 302 headers that have the same URI, but with
http:// vs. https://. Here's a description of the scenario I
see:

1) One of the links when discovered by httrack contains the
following 302 Location forwarding:

Location:
<http://about.mailblocks.com/?src=cj&AID=10284656&PID=1233254>

2) The above URI when later downloaded by httrack contains
the following link:
 <a href="/register.aspx">Sign Up Now</a>

3) When <http://about.mailblocks.com/register.aspx> is
downloaded by httrack, the following is in the header:

HTTP/1.1 302 Found
Server: Microsoft-IIS/5.0
Date: Fri, 18 Jun 2004 21:52:28 GMT
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Location: https://www.mailblocks.com/register.aspx?
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 158

Note that this 302 header has a Location forward to
<https://www.mailblocks.com/register.aspx?>, which is
essentially the same URI, but with https:// instead of http://
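
To make the comparison concrete, here is a small sketch that splits the requested URL and the Location value into components and lists which ones differ. It uses only Python's standard urllib.parse; the helper name differing_parts is made up for illustration and has nothing to do with httrack's internals. With the values as quoted above, the host prefix (about. vs. www.) differs as well as the scheme.

```python
from urllib.parse import urlsplit

def differing_parts(a, b):
    """Return the names of the URL components that differ between a and b."""
    pa, pb = urlsplit(a), urlsplit(b)
    names = ("scheme", "netloc", "path", "query", "fragment")
    return [n for n in names if getattr(pa, n) != getattr(pb, n)]

# URLs taken from the headers quoted in this post.
requested = "http://about.mailblocks.com/register.aspx"
redirect = "https://www.mailblocks.com/register.aspx?"

# The trailing "?" yields an empty query, so path and query compare equal.
print(differing_parts(requested, redirect))
```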

I haven't had time to look at the code, but I suspect that
when you store the URIs internally, you are normalizing them
and losing the distinction between http:// and https://,
which causes an endless loop. I could be way off on this, but
from looking at the externals, it appears that this is what
is causing httrack to time out in this case.
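
The suspected failure mode can be modelled with a toy redirect follower. This is only a sketch of the hypothesis, not httrack's actual code: the REDIRECTS table, the example.test host, and the key and follow functions are all invented for illustration. It assumes a crawler that stores each URL under a normalized key and reuses the stored entry whenever the key matches.

```python
from urllib.parse import urlsplit

# Hypothetical server behaviour: the http:// URL redirects to its https:// twin,
# and the https:// URL answers directly (no entry in the table).
REDIRECTS = {
    "http://example.test/register.aspx": "https://example.test/register.aspx",
}

def key(url, keep_scheme):
    """Normalized lookup key; optionally drops the scheme (the suspected bug)."""
    p = urlsplit(url)
    parts = (p.netloc.lower(), p.path, p.query)
    return (p.scheme.lower(),) + parts if keep_scheme else parts

def follow(start, keep_scheme, max_hops=10):
    """Follow redirects; return the hop count, or None if max_hops is reached."""
    known = {}  # normalized key -> first URL stored under that key
    url = start
    for hops in range(max_hops):
        # Reuse the stored entry when the key is already known.
        url = known.setdefault(key(url, keep_scheme), url)
        target = REDIRECTS.get(url)
        if target is None:
            return hops  # reached a URL that does not redirect
        url = target
    return None

start = "http://example.test/register.aspx"
print(follow(start, keep_scheme=False))  # scheme dropped: never terminates on its own
print(follow(start, keep_scheme=True))   # scheme kept: the https:// URL is fetched
```

In this model, dropping the scheme makes the https:// target collide with the stored http:// entry, so the same 302 is fetched over and over until the hop limit. Keeping the scheme in the key lets the https:// URL be treated as a new resource.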

If you know of an easy fix, that would be great. Otherwise,
if you can point me to the code area I should start with, I
will attempt to find a fix myself.

HTTrack works great - just having issues with 302 headers...

Regards,
Jeff
 