new parser performance ? - HTTrack Website Copier Forum

Subject: new parser performance ?

Author: GuiOm

Date: 01/10/2006 15:45

 
Hi,
I've been using httrack for a while and it's quite powerful, but it's not
fast enough for my needs.
It can take up to 100 ms (sometimes more) to parse a big HTML page or a
JavaScript file. I'm using callbacks to prevent httrack from downloading too
much data: I only need to retrieve the objects needed to display the page
(frames, iframes, pictures, CSS, scripts and Flash). So in most cases my
callback function tells httrack not to download the link. This may explain
why parsing takes so long, but I have some doubts anyway: I traced the time
spent in the XH_uninit function and it's huge compared to the real parsing
time.
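
To make the filtering idea concrete, here is a rough sketch of the kind of
test such a callback could apply. It is not my actual callback and it is not
part of the httrack API: is_page_requisite(), ext_equals() and the extension
list are made up for illustration. It only looks at the file extension, so
frames and iframes (which are plain HTML and have to be recognised from the
tag they appear in) are not covered, and a bare host name ending in ".js"
would be accepted by mistake.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not httrack API): case-insensitive comparison of
   the 'len' characters at 'ext' against a lowercase extension 'wanted'. */
static int ext_equals(const char *ext, size_t len, const char *wanted) {
  size_t i;
  if (strlen(wanted) != len)
    return 0;
  for (i = 0; i < len; i++) {
    if (tolower((unsigned char) ext[i]) != wanted[i])
      return 0;
  }
  return 1;
}

/* Hypothetical filter (not httrack API): returns 1 if the URL looks like
   an object needed to display a page (CSS, script, picture, Flash),
   0 otherwise. The extension list is an assumption. */
int is_page_requisite(const char *url) {
  static const char *const wanted[] = {
    "css", "js", "gif", "jpg", "jpeg", "png", "swf"
  };
  size_t end, dot, i;

  if (url == NULL)
    return 0;

  /* Ignore the query string and the fragment, if any. */
  end = strcspn(url, "?#");

  /* Walk back to the last '.' of the last path component. */
  dot = end;
  while (dot > 0 && url[dot - 1] != '.' && url[dot - 1] != '/')
    dot--;
  if (dot == 0 || url[dot - 1] != '.')
    return 0; /* no extension at all */

  for (i = 0; i < sizeof(wanted) / sizeof(wanted[0]); i++) {
    if (ext_equals(url + dot, end - dot, wanted[i]))
      return 1;
  }
  return 0;
}
```

A callback that accepts or rejects links could call a function like this on
each link and reject everything for which it returns 0.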

Is the new version of httrack faster at HTML and JavaScript parsing? What gain
can I expect?
If you think I'm not using httrack the right way (rejecting links with a
callback), and if parameters can do the same thing more easily, please tell me!


I have another question, but it's quite complex: is there any way I can use
httrack only as an HTML parser, through the httrack library?
What I do today is give hts_main a file to parse, with a URL of the form
<file:///path/myfile> (this file is retrieved via HTTP by another program).
I only use the high-level API (hts_init(), htswrap_add() and hts_main()).
Can I use a memory buffer instead, and so avoid the file creation, which takes
some time? Can I plug in deeper, closer to the parsing engine, since I don't
need the HTTP engine?
Sorry if this isn't clear. Don't hesitate to ask if you need more details.
