Hi,
I've been using httrack for a while and it's quite powerful, but it is not
fast enough for my needs.
Parsing a big HTML page or a JavaScript file can take up to 100 ms (sometimes
more). I'm using callbacks to prevent httrack from downloading too much data:
I only need to retrieve the objects required to display the page (frames,
iframes, pictures, CSS, scripts and Flash). So in most cases my callback
function tells httrack not to download the link. This may explain why parsing
takes so long, but I have some doubts: I traced the time spent in the
XH_uninit function and it is huge compared to the actual parsing time.
Is the new version of httrack faster at HTML and JavaScript parsing? What
gain can I expect?
If you think I'm not using httrack the right way (rejecting links with a
callback), and if options can do the same thing more easily, please tell me!
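
To make the filtering concrete, here is a minimal sketch of the kind of
decision such a callback makes, assuming links are classified by file
extension only. The names is_page_resource and ext_equals and the extension
list are hypothetical and are not part of the httrack API. Frames and iframes
cannot be recognized by extension, so they would need the tag context, which
this sketch does not cover.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not httrack API): case-insensitive comparison of the
   first len characters of a against the lowercase string b. */
static int ext_equals(const char *a, size_t len, const char *b) {
    size_t i;
    if (strlen(b) != len) return 0;
    for (i = 0; i < len; i++)
        if (tolower((unsigned char)a[i]) != b[i]) return 0;
    return 1;
}

/* Hypothetical predicate (not httrack API): returns 1 if the URL's extension
   is in the illustrative list of page resources, 0 otherwise. */
int is_page_resource(const char *url) {
    static const char *const wanted[] = {
        "css", "js", "gif", "jpg", "jpeg", "png", "swf"
    };
    size_t end = strcspn(url, "?#");  /* ignore query string and fragment */
    size_t dot = end;
    size_t i;

    /* Walk back to the last '.' in the final path segment. */
    while (dot > 0 && url[dot - 1] != '.' && url[dot - 1] != '/') dot--;
    if (dot == 0 || url[dot - 1] != '.') return 0;  /* no extension */

    for (i = 0; i < sizeof wanted / sizeof wanted[0]; i++)
        if (ext_equals(url + dot, end - dot, wanted[i])) return 1;
    return 0;
}
```

A callback would accept the link when this returns 1 and reject it otherwise.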
I have another question, but it's a bit more complex: is there any way to use
httrack purely as an HTML parser, through the httrack library?
What I do today is give hts_main a file to parse, with the URL
<file:///path/myfile> (the file is retrieved via HTTP by another program).
I only use the high-level API (hts_init(), htswrap_add() and hts_main()).
Can I use a memory buffer instead, and so avoid creating the file, which takes
some time? Can I hook in deeper, closer to the parsing engine, since I do not
need the HTTP engine?
Sorry if this is not clear. Do not hesitate to ask if you need more
details.