Hi!
HTTrack is really great, but I often experience long
delays while it is scanning a single page.
I have found the FAQ article cited below, and I am hoping
for some background information.
Is this link checking done in parallel? Is there any
chance this part could be made faster?
Does HTTrack really access many pages twice? Or does it
read a PHP page completely whenever it finds a link to one
in the current page and store the content in its cache, so
that what appears to be idle time is in reality the
fetching of lots of pages?
Nevertheless, the way HTTrack works is different from
Teleport Pro (which calculates the links later), but it
has many advantages.
Norbert
Sometimes, links in pages are malformed. For example,
<a href="/foo"> instead of <a href="/foo/"> is a common
mistake. It forces the engine to make a supplemental
request to find the real /foo/ location.
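As a rough sketch (the host and path here are made up),
such a link typically costs one extra round trip before
the real page can be fetched:

  GET /foo HTTP/1.1
  Host: www.example.com
  -> 301 Moved Permanently, Location: http://www.example.com/foo/

  GET /foo/ HTTP/1.1
  Host: www.example.com
  -> 200 OK

Only after the redirect does the engine know the real
location to use for the local filename.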
Dynamic pages. Links with names ending in .php3, .asp,
or other extensions different from the regular .html
or .htm will require a supplemental request, too. HTTrack
has to "know" the type (called the "MIME type") of a file
before forming the destination filename. Files like foo.gif
are "known" to be images, and ".html" files are obviously
HTML pages, but ".php3" pages may be dynamically generated
HTML pages, images, data files...
If you KNOW that ALL ".php3" and ".asp" pages are in fact
HTML pages on a mirror, use the assume option:
--assume php3=text/html,asp=text/html
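For example, a complete command line might look like this
(the URL and output directory are only placeholders):

  httrack http://www.example.com/ -O ./mysite --assume php3=text/html,asp=text/html

With this assumption the engine treats every .php3 and
.asp link as an HTML page and no longer has to probe it
first.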
This option can also be used to change the type of a file:
the MIME type "application/x-MYTYPE" will always be mapped
to the "MYTYPE" extension. Therefore,
--assume dat=application/x-zip
will force the engine to rename all .dat files to .zip
files.
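A hypothetical invocation (again, the URL is just a
placeholder) would be:

  httrack http://www.example.com/data/ --assume dat=application/x-zip

so a link such as /data/archive.dat would be saved locally
as archive.zip.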