HTTrack Website Copier
Free software offline browser - FORUM
Subject: HTTrack 3.34-ALPHA-2 released, testers NEEDED!
Author: Xavier Roche
Date: 05/08/2005 14:48
 
"Testing links" fixes (at last!)
--------------------------------

The 3.34-A2 release brings a major engine upgrade. It attempts to solve the
infamous slow "testing links.." checks, reported since 1998, which cause
various problems:
- multiple requests for files such as <http://www.example.com/foo> (without the
ending /)
- multiple requests for files having an unknown type, such as
<http://www.example.com/foo.bar>
- bogus HEAD replies from bogus servers, leading to bogus local naming (a GIF
stored as HTML, for example)

The reason for these costly checks was a 1998 "design choice" in the parser
code, which required the engine to know the MIME type of a remote URL before
actually patching the links embedded in the HTML structure (a .php link can be
named locally as .gif or .html depending on its type). Unknown links (.php,
.asp.. but also URLs without any extension) generated an additional remote
request, slowing down the engine and causing errors with some servers.
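To illustrate why the type must be known first, here is a minimal sketch of naming a local file from a URL plus its MIME type. The MIME_EXT table and the local_name() helper are hypothetical illustrations, not HTTrack's actual naming code: the same remote foo.php becomes foo.gif or foo.html depending on the Content-Type the server returns.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical, minimal MIME-to-extension table (not HTTrack's real table).
MIME_EXT = {
    "text/html": ".html",
    "image/gif": ".gif",
    "image/png": ".png",
    "image/jpeg": ".jpg",
}


def local_name(url: str, mime: str) -> str:
    """Pick a local file name for `url`, using its MIME type for the extension."""
    path = urlparse(url).path
    # Strip a trailing '/' so ".../foo/" and ".../foo" share the base "foo".
    base = posixpath.basename(path.rstrip("/")) or "index"
    stem, _remote_ext = posixpath.splitext(base)
    wanted = MIME_EXT.get(mime)
    if wanted is None:
        return base  # unknown type: keep the remote name unchanged
    return stem + wanted


if __name__ == "__main__":
    # Same URL, two possible local names: the type decides.
    print(local_name("http://www.example.com/foo.php", "image/gif"))
    print(local_name("http://www.example.com/foo.php", "text/html"))
```

Until the type is known, the parser cannot write the patched link, which is exactly what forced the old extra "testing links.." request.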

After several hours of extensive design, the workaround was to partially
redesign the URL scanner, so that it can now "wait for the header reply of an
unknown link", the link itself being immediately put in background download.
This should speed up the whole download process in many cases, even if the
engine still has to wait for the remote headers to be ready.
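The "wait for the header reply" idea can be sketched as a two-pass rewrite: first schedule one background header fetch per distinct link, then patch each link, blocking only until that particular link's type is known. This is a hedged illustration under stated assumptions, not the engine's real code: fetch_content_type() is a hypothetical stand-in that reads from a fake table instead of contacting a server, links are only matched in href="..." attributes, and a thread pool replaces HTTrack's own "backing" system.

```python
import posixpath
import re
import threading
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

# Hypothetical stand-in for remote Content-Type replies (no network access).
FAKE_HEADERS = {
    "http://www.example.com/foo.php": "text/html",
    "http://www.example.com/pic.cgi": "image/gif",
}
EXT = {"text/html": ".html", "image/gif": ".gif"}

calls = []  # records every simulated header request
calls_lock = threading.Lock()


def fetch_content_type(url):
    """Simulated header request: logs the call and returns a MIME type."""
    with calls_lock:
        calls.append(url)
    return FAKE_HEADERS.get(url, "text/html")


def local_name(url, mime):
    """Local file name built from the URL's base name and its MIME type."""
    base = posixpath.basename(urlparse(url).path.rstrip("/")) or "index"
    ext = EXT.get(mime)
    return posixpath.splitext(base)[0] + ext if ext else base


HREF = re.compile(r'href="([^"]+)"')


def rewrite_links(html, pool):
    # Pass 1: start one background header fetch per distinct link.
    pending = {}
    for url in HREF.findall(html):
        if url not in pending:
            pending[url] = pool.submit(fetch_content_type, url)

    # Pass 2: patch links, waiting only for the reply each link needs.
    def patch(match):
        url = match.group(1)
        return 'href="%s"' % local_name(url, pending[url].result())

    return HREF.sub(patch, html)


if __name__ == "__main__":
    page = ('<a href="http://www.example.com/foo.php">a</a> '
            '<a href="http://www.example.com/foo.php">b</a> '
            '<a href="http://www.example.com/pic.cgi">c</a>')
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(rewrite_links(page, pool))
```

Run as a script, this prints `<a href="foo.html">a</a> <a href="foo.html">b</a> <a href="pic.gif">c</a>`, and only two header requests are made for three links, since the repeated foo.php link reuses the pending reply instead of triggering a new check.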

Besides, the "backing" system (the background download routines) has been
partially cleaned up to support a large number of "pending links". This should
further speed up the download process for sites with specific structures (with
many small image files, for example).

These changes required a lot of code rewriting, especially in the very deep
engine routines. Therefore, expect bugs to appear in the -A2 release, such
as:
- bugs when reaching malformed URLs, such as <http://www.example.com/foo>
(redirecting to <http://www.example.com/foo/>): this minor problem should be
fixed in the -A3 release
- bugs when updating a site (broken links or links with bogus names)

If you experience "clear" broken links (that is, not caused by javascript,
java or flash links), or files ending with ".delayed", please report them,
with a complete, self-contained example that allows the bug to be reproduced
(and any other information you deem necessary!)

Thanks for testing this release, and for your feedback!

Changelog:
3.34-ALPHA-2
+ New: new experimental parser that no longer needs link testing ('testing
link type..')
+ New: improved background download to handle large sites
+ New: '--assume foo/bar.cgi=text/html' is now possible
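A rule such as '--assume foo/bar.cgi=text/html' lets the engine skip the header wait entirely for matching links. The sketch below shows one way such rules could be parsed and applied. The "ext1,ext2=type;..." separator syntax follows the style of HTTrack's documented -%A example as we understand it, while parse_assume(), assumed_type() and the matching rules (path-style keys match the end of the URL path, other keys match the bare extension) are hypothetical assumptions, not the engine's actual behavior.

```python
import posixpath
from urllib.parse import urlparse


def parse_assume(spec):
    """Parse 'ext1,ext2=mime;path/x.cgi=mime' into a {key: mime} dict."""
    rules = {}
    for group in spec.split(";"):
        keys, sep, mime = group.partition("=")
        if not sep or not mime.strip():
            continue  # skip empty or malformed groups
        for key in keys.split(","):
            key = key.strip()
            if key:
                rules[key] = mime.strip()
    return rules


def assumed_type(url, rules):
    """Return the assumed MIME type for `url`, or None if no rule applies."""
    path = urlparse(url).path.lstrip("/")
    # Assumed rule: keys containing '/' match the end of the URL path.
    for key, mime in rules.items():
        if "/" in key and (path == key or path.endswith("/" + key)):
            return mime
    # Otherwise fall back to matching the bare extension (e.g. "cgi").
    ext = posixpath.splitext(path)[1].lstrip(".")
    return rules.get(ext) if ext else None


if __name__ == "__main__":
    rules = parse_assume("foo/bar.cgi=text/html")
    print(assumed_type("http://www.example.com/foo/bar.cgi", rules))
    print(assumed_type("http://www.example.com/other.cgi", rules))
```

When a rule matches, the type is known up front and the link can be named immediately; otherwise the engine falls back to waiting for the remote headers.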

Download:
<http://www.httrack.com/page/2/en/index.html#beta>
