Hi,
Some people (in this forum or via email) have
experienced performance problems when using huge URL
lists, or when scanning big HTML files or big
websites with many links (often of unknown types, like
asp pages).
If you are one of them, please read the text below,
thanks!
The upcoming version of HTTrack (3.03) has been
*greatly* optimized, especially in the scan routines. A
new option has also been included, which allows you to
set MIME types for "unknown" filetypes.
It would be great if several people could run some
tests with the beta-3.03 release of HTTrack, to measure
the speed improvements brought by these optimizations, but
also to ensure that the very deep changes made to
critical macros and functions in the engine do not
cause other problems.
Indeed, many optimizations have been made, and this is
a potential threat to the overall engine stability!
If you are interested:
---------------------
- contact me ASAP (roche@httrack.com)
- please test the beta 3.03 at
<http://www.httrack.com/beta.zip>
(replace the existing .exe in WinHTTrack program files
folder)
AND do not forget to send me any feedback, remarks,
impressions, bug reports, or any other problems which may
have occurred during your tests!
My preliminary tests for 3.03beta:
---------------------------------
(tested on a PIII@800/256MB)
1. Including 100,000 links using "URL list" parameter:
version 3.02 : = 11 minutes and 50 seconds
version 3.03 : < 1 second
2. Scanning a 15MB HTML file with 10,000 "html links":
version 3.02 : = 31 minutes and 10 seconds
version 3.03 : = 27 seconds
3. Besides, many people have experienced performance
problems when scanning/downloading many cgi-generated
pages, like "php3" or "asp" links.
This problem occurs because the engine has to test
each script to know the MIME type, before forming the
final destination filename.
However, in many cases, "php3" or "asp" pages are
always "text/html", and therefore testing these files
is just a waste of time.
A new option, called "assume", lets you "tell"
the engine that these cgi's always have the same
types.
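To make the idea concrete, here is a small Python sketch of the lookup logic described above. It is only an illustration and not the actual engine code: the names resolve_mime, extension_of and fake_probe are made up for this example, and fake_probe simply stands in for the network request the engine would otherwise need for each script.

```python
from urllib.parse import urlparse
import posixpath


def extension_of(url):
    """Return the lowercase file extension of the URL path, without the dot."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lstrip(".").lower()


def resolve_mime(url, assume, probe):
    """Use the assumed type when the extension is known, else probe the server.

    'assume' maps an extension to a MIME type; 'probe' is a hypothetical
    callable standing in for the slow per-link test of the remote file.
    """
    ext = extension_of(url)
    if ext in assume:
        return assume[ext]  # no network round trip needed
    return probe(url)       # slow path: one test per link


probed = []


def fake_probe(url):
    # Stand-in for the real test: record the call and pretend it is HTML.
    probed.append(url)
    return "text/html"


assume = {"php3": "text/html", "asp": "text/html"}
urls = [
    "http://www.foo.com/bar.asp",
    "http://www.foo.com/list.php3?page=2",
    "http://www.foo.com/cgi-bin/show",
]
types = [resolve_mime(u, assume, fake_probe) for u in urls]
```

In this sketch only the last link (which has no known extension) goes through the slow path; the "asp" and "php3" links are resolved from the table directly.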
The syntax is:
--assume filesystemtype=mimetype/mimesubtype[,filesystemtype=mimetype/mimesubtype[,...]]
Example:
httrack www.foo.com/bar.asp --assume php3=text/html,asp=text/html,sgif=image/gif,sjpg=image/jpeg
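If you keep your mappings in a script, the value for --assume can be built from a table instead of being typed by hand. The following Python sketch only follows the syntax given above; build_assume is a made-up helper name, not something shipped with HTTrack, and the command is only assembled here, not executed.

```python
def build_assume(mapping):
    """Join extension -> MIME type pairs into the comma-separated --assume value."""
    parts = []
    for ext, mime in mapping.items():
        if "/" not in mime:
            # The syntax expects mimetype/mimesubtype, e.g. text/html
            raise ValueError(f"not a type/subtype pair: {mime!r}")
        parts.append(f"{ext.lstrip('.')}={mime}")
    return ",".join(parts)


rules = {
    "php3": "text/html",
    "asp": "text/html",
    "sgif": "image/gif",
    "sjpg": "image/jpeg",
}

# Argument list matching the example above (built only, not run).
command = ["httrack", "www.foo.com/bar.asp", "--assume", build_assume(rules)]
```

Passing the arguments as a list avoids any trouble with the value being wrapped or split, which is easy to get wrong when the rule list grows long.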
This feature will speed up many mirrors, for sure:
3. Scanning a 15MB HTML file with 10,000 "PHP3 links":
version 3.02 : > a few hours (interrupted)
version 3.03 with --assume php3=text/html : = 19 seconds for the scan