HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Parsing HTML File (testing links)...ZzzzZzzzzzzzz
Author: Xavier Roche
Date: 04/27/2002 19:03
 
> I've been in the process of mirroring a page for over
> a day.  Granted it has a rather large forum, but this
> seems a bit ridiculous to me as I have 10 connections
> available and its only using one to test the links
> instead of testing and grabbing at the same time.
> Is there any way to speed this up?

You are probably seeing these performance problems 
because you are scanning/downloading many CGI-generated 
pages, such as "php3" or "asp" links, aren't you?
This problem occurs because the engine has to test 
each script to know its MIME type ("HTML" or "GIF", 
for example) before forming the final destination 
filename (.html or .gif on the local filesystem).

However, in many cases (such as yours), "php3" 
or "asp" links are always "text/html", so testing 
these files is just a waste of time, as you may have 
noticed. It slows down the whole process, because 
generally only one connection is used while testing.

An option in Options/MIME Types ("assume" for the 
commandline version) lets you tell the engine that 
these CGIs always have the same type, such as "HTML" 
("text/html").

Therefore, you may define something like (in 
Options/MIME Types):
php,php3,asp,cgi -> text/html

This should speed up the download process, and let the 
engine use simultaneous connections.
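
To make the idea concrete, here is a rough Python sketch of the decision described above. This is not HTTrack's actual code: the function name, the two tables and the example URLs are only assumptions for illustration. If the extension is listed in the "assume" table, the local extension is known immediately; otherwise the link would have to be tested first.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical table mirroring "php,php3,asp,cgi -> text/html".
ASSUMED_TYPES = {ext: "text/html" for ext in ("php", "php3", "asp", "cgi")}

# Tiny illustrative MIME type -> local extension map (not exhaustive).
LOCAL_EXTENSION = {"text/html": ".html", "image/gif": ".gif"}


def local_extension(url, assumed=ASSUMED_TYPES):
    """Return the local extension if it is known without any request.

    Returns None when the extension is not in the assumed table, which
    in this sketch means "the link must be tested to learn its type".
    """
    path = urlsplit(url).path                      # drops the ?query part
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    mime = assumed.get(ext)
    if mime is None:
        return None                                # unknown: needs a test
    return LOCAL_EXTENSION.get(mime)


if __name__ == "__main__":
    for link in ("http://www.foo.com/forum/view.php3?id=7",
                 "http://www.foo.com/get.xyz"):
        print(link, "->", local_extension(link))
```

Every link that gets an answer here is one test request saved, which is where the speed-up comes from.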

(The commandline syntax is:
--assume filesystemtype=mimetype/mimesubtype[,filesystemtype=mimetype/mimesubtype[,...]]
example:
httrack www.foo.com/bar.asp --assume php3,asp=text/html)
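
If it helps to see how such a value can be read, below is a small Python sketch that turns an "assume" string into an extension -> MIME type table. It is only one plausible reading of the syntax above (extensions listed before an "=" share the type that follows it); the function name is made up, and the engine's own parser may behave differently.

```python
def parse_assume(value):
    """Parse e.g. "php3,asp=text/html" into {"php3": ..., "asp": ...}.

    Assumption: comma-separated names without "=" are grouped with the
    next "name=type" entry and receive the same MIME type.
    """
    table = {}
    pending = []                       # extensions still waiting for a type
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        if "=" in token:
            ext, mime = token.split("=", 1)
            for name in pending + [ext.strip()]:
                table[name.lower()] = mime.strip()
            pending = []
        else:
            pending.append(token)
    if pending:
        # Trailing names with no "=type" after them are ambiguous.
        raise ValueError("no MIME type given for: " + ", ".join(pending))
    return table


if __name__ == "__main__":
    print(parse_assume("php3,asp=text/html"))
```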

 