HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: new parser performance ?
Author: Xavier Roche
Date: 01/11/2006 19:43
 
> indeed, it can take up to 100 ms (sometimes more) to
> parse a big HTML page or a JavaScript file.

How "big"?
The parsing can be slowed down by background downloads, as httrack waits for
the connection to be established before rewriting the URLs.

> scripts and Flash). Thus, my callback function tells
> httrack not to download the link in most cases
> (this could explain why parsing is quite long, but I
> have some doubts anyway ... I traced the time spent
> in the XH_uninit function and it is quite large
> compared to the real parsing time ...)

XH_uninit should be called only once per project; it frees all blocks and
related memory segments, so it isn't surprising that it takes some time.

> Is the new version of httrack faster at HTML and
> JavaScript parsing? What gain can be expected?

Err, no, the parser is essentially the same.

But I suspect that the time spent is actually not CPU time, but rather I/O or
"sleep" time.

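A quick way to check is to compare CPU time against wall-clock time around the
call that triggers the parse. Below is a minimal, self-contained C sketch; the
parse_page() stub is only a placeholder for your real call into the engine,
and here it just sleeps to simulate waiting on a connection:

  #include <stdio.h>
  #include <time.h>
  #include <sys/time.h>
  #include <unistd.h>

  /* Stub standing in for the real parsing call; it sleeps to
     simulate time spent waiting on connections rather than CPU. */
  static void parse_page(void) {
      usleep(100 * 1000);  /* 100 ms of "sleep" time */
  }

  int main(void) {
      struct timeval w0, w1;
      clock_t c0, c1;
      double wall_ms, cpu_ms;

      gettimeofday(&w0, NULL);
      c0 = clock();
      parse_page();
      c1 = clock();
      gettimeofday(&w1, NULL);

      wall_ms = (w1.tv_sec - w0.tv_sec) * 1000.0
              + (w1.tv_usec - w0.tv_usec) / 1000.0;
      cpu_ms = (double)(c1 - c0) * 1000.0 / CLOCKS_PER_SEC;

      /* A large gap between the two means the time is I/O or
         sleep time, not actual parsing work. */
      printf("wall: %.1f ms, cpu: %.1f ms\n", wall_ms, cpu_ms);
      return 0;
  }

If the wall-clock figure is much larger than the CPU figure, the parser is
mostly waiting, not computing.
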
> If you think I am not using httrack the right way
> (rejecting links with a callback), and if parameters
> can do this just as well and more easily, please tell me!

No, seems fine.

> Can I use a memory buffer instead, and avoid the
> file creation, which takes some time?

Err, you can try always using the same (empty) file, and filling in the data
with the "postprocess-html" callback.
(See <http://www.httrack.com/html/plug.html>)
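
As a rough outline only: the exact prototype and the registration mechanism
depend on your httrack release, so check plug.html and the callbacks example
shipped with the sources. The buffer-in/buffer-out shape below is an
assumption based on that documentation:

  /* Sketch of a "postprocess-html" handler: httrack hands over the
     rewritten HTML so you can capture or replace it before it is
     saved, letting the on-disk file stay a dummy. The prototype
     shape is assumed; adjust to the one declared in your release. */
  #include <stddef.h>

  static int my_postprocess(char **html, int *len,
                            const char *url_address,
                            const char *url_file) {
      (void) url_address;  /* unused in this sketch */
      (void) url_file;
      if (html != NULL && *html != NULL && len != NULL) {
          /* Copy *html (length *len) into your own memory buffer
             here instead of relying on the file written to disk. */
      }
      return 1;  /* non-zero: let the engine carry on */
  }

Depending on the version, a handler like this is hooked up either with the
--wrapper command-line option or through the library's plug entry point.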

> Can I plug myself in deeper, closer to the parsing
> engine, as I do not need the HTTP engine?

Use the "exclude all" filter (-*) to prevent httrack from taking anything, and
possibly triggering any download. This should speed up the process.
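
For instance (www.example.com is just a placeholder here, and the filter is
quoted so the shell does not expand it):

  httrack http://www.example.com/ -O /tmp/mirror "-*"

With this, the engine still parses pages, but should not queue any of the
discovered links for download.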
 