> indeed, it can take up to 100ms (more sometimes) to
> parse a big html page or a javascript file.
How "big"?
The parsing can be slowed down by background downloads, as httrack will wait
for the connection to be established before rewriting the URLs.
> scripts and flash). Thus, my callback function tells
> httrack not to download the link in most of the
> cases (this can explain parsing is quite long but I
> have some doubts anyway ... I traced the time wasted
> in XH_uninit function and it's quite huge compared
> to real parsing time ...)
XH_uninit should be called only once per project; it frees all blocks and
related memory segments, so it isn't surprising that it takes some time.
> is the new version of httrack faster in html and
> javascript parsing? what can be the expected gain?
Err, no, the parser is essentially unchanged.
But I suspect that the time spent is actually not CPU time, but rather I/O or
"sleep" time.
> If you think I do not use httrack in the good way
> (rejecting links with callback) and if parameters
> can do this as well and easier tell me please!
No, seems fine.
> Can I use a memory buffer instead and then avoid
> file creation which takes some time?
Err, you can try always reusing the same (empty) file, and use the
"postprocess-html" callback to fill in the data.
(See <http://www.httrack.com/html/plug.html>)
> Can I plug myself deeper, closed to parsing engine
> as I do not need http engine?
Use the "exclude all" filter (-*) to prevent httrack from grabbing anything,
and possibly from triggering any downloads. This should speed up the process.