Hi Xavier,
When I run the following httrack command:
httrack -%W receive-header=callback.so:process -%W
check-html=callback.so:process_file
<http://falltotown.com/eqfma.php?p0bfa838e7786c2cbceicdfh.10020130bfa838e7786c2c08oberlicaol.com0bfa838e7786c2cchrisaol>
-%e3 -r3 -B -T -e -F "Mozilla/4.0 (compatible; MSIE 6.0;
Windows NT 5.1)" -E600 -Z
I am getting a number of timeouts like this:
More than 600 seconds passed.. giving up
I have done a little digging around and used the callbacks
to examine the headers and html during the execution (in
addition to the -Z option).
It appears that httrack gets into a redirect loop when it
encounters 302 headers that have the same URI, but with
http:// vs. https://. Here's a description of the scenario I
see:
1) One of the links when discovered by httrack contains the
following 302 Location forwarding:
Location:
<http://about.mailblocks.com/?src=cj&AID=10284656&PID=1233254>
2) The above URI when later downloaded by httrack contains
the following link:
<a href="/register.aspx">Sign Up Now</a>
3) When <http://about.mailblocks.com/register.aspx> is
downloaded by httrack, the following is in the header:
HTTP/1.1 302 Found
Server: Microsoft-IIS/5.0
Date: Fri, 18 Jun 2004 21:52:28 GMT
X-Powered-By: ASP.NET
X-AspNet-Version: 1.1.4322
Location: <https://www.mailblocks.com/register.aspx?>
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 158
Note that this 302 header has a Location forward to
<https://www.mailblocks.com/register.aspx?>, which is
essentially the same URI, but with https:// instead of http://.
I haven't had time to look at the code, but I suspect that
when you store the URIs internally, you are normalizing them
and losing the distinction between http:// and https://,
which causes an endless loop. I could be way off on this, but
from looking at the externals, it appears that this is what
is causing httrack to time out in this case.
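To make the suspicion concrete, here is a minimal Python sketch of the
failure mode I have in mind. It is not HTTrack's code: the redirect
table, the example.com URLs, and the function names store and follow
are all made up for illustration. It only shows how dropping the
scheme when remembering a URL could turn an http-to-https 302 into an
endless loop.

```python
from urllib.parse import urlsplit

# Hypothetical redirect table standing in for the server's 302 responses.
REDIRECTS = {
    "http://example.com/register.aspx": "https://example.com/register.aspx",
}

def store(url, keep_scheme):
    """Return the form in which a crawler might remember a URL."""
    parts = urlsplit(url)
    # Lossy variant (assumption): every stored URL falls back to http.
    scheme = parts.scheme if keep_scheme else "http"
    return f"{scheme}://{parts.netloc}{parts.path}"

def follow(start, keep_scheme, max_hops=10):
    """Follow redirects; return (final_url, hops), or (None, max_hops)
    if the hop limit is reached without settling on a final URL."""
    url = store(start, keep_scheme)
    for hops in range(max_hops):
        target = REDIRECTS.get(url)
        if target is None:
            return url, hops
        url = store(target, keep_scheme)
    return None, max_hops

print(follow("http://example.com/register.aspx", keep_scheme=True))
print(follow("http://example.com/register.aspx", keep_scheme=False))
```

With the scheme kept, the walk settles on the https URL after one hop.
With the scheme dropped, the stored URL is the http one again after
every redirect, so only the hop limit ends the walk.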
If you know of an easy fix, that would be great. Otherwise,
if you can point me to the code area I should start with, I
will attempt to find a fix myself.
HTTrack works great - just having issues with 302 headers...
Regards,
Jeff