Re: About javascript? - HTTrack Website Copier Forum

Subject: Re: About javascript?
Author: Louis
Date: 02/25/2003 08:03
I know that interpreting the complete set of JavaScript is 
difficult, as it would effectively become a compiler 
problem. But how about this? (I am just making a suggestion, 
and I know it sounds a bit crazy.. =) )

Suppose I have an external helper program. While parsing, 
httrack would ask the helper about every file it parses. If 
the file matches a rule in the helper (a filename match, for 
example), the helper would regenerate the URLs itself, 
because it already knows how that page builds its links 
from a select list or a POST form.

It's something like this:
say I know that the page www.sport.com/soccer/index.php has 
a select box named "sport_art", and that the links to the 
articles are generated by combining a string like 
http://www.sport.com/soccer/art.php?art_id= with the value 
selected in the select box, like this:

<select name="sport_art">
<option value="match1.php">Match 1</option>
<option value="match2.php">Match 2</option>
</select>

Httrack would then pass every URL it parses to the external 
helper program first. The helper has a rule that says: if 
the filename matches www.sport.com/soccer/index.php, pick up 
the script written for that particular page. The helper 
then uses that script to generate the list of URLs that 
httrack cannot parse by itself, and the generated list is 
fed back to httrack for offline browsing. This method is 
flexible when things change: if you want to capture the 
links in a select list, you just need to add a rule to the 
external helper and a script that translates the JavaScript 
in that page. The translated links then appear as ordinary 
links in a temporary HTML file. After that, you just need 
to schedule httrack to scan the temporary page for the 
links that were missed in the first scan, and you get a 
more complete mirror of the site.
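
To make the rule idea concrete, here is a minimal sketch of the helper's rule table in Python. Everything in it is hypothetical: httrack is not assumed to offer any such hook, and the names RULES, soccer_index_script and generate_links are made up for illustration only.

```python
from fnmatch import fnmatchcase


def soccer_index_script(page_html):
    """Hypothetical stand-in for a page-specific script.

    A real script would parse page_html; this one only returns a fixed
    link when the page contains the sport_art select box.
    """
    if "sport_art" in page_html:
        return ["http://www.sport.com/soccer/art.php?art_id=match1.php"]
    return []


# Each rule pairs a filename pattern with the script written for that page.
RULES = [
    ("www.sport.com/soccer/index.php", soccer_index_script),
]


def strip_scheme(url):
    """Drop a leading 'http://' (or similar) so rules can match on host/path."""
    return url.split("://", 1)[-1]


def generate_links(url, page_html, rules=RULES):
    """Return the extra links produced by every rule whose pattern matches url."""
    target = strip_scheme(url)
    found = []
    for pattern, script in rules:
        if fnmatchcase(target, pattern):
            found.extend(script(page_html))
    return found
```

A pattern such as www.sport.com/soccer/* would also work here, since fnmatchcase treats * as a wildcard. Pages that match no rule simply yield an empty list, so the normal parse is unaffected.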

In the example above, the script for that page can be 
quite simple: parse the HTML, find the sport_art select 
box, grab the value of every option tag, combine each value 
with the string I already know, and write the result to a 
simple HTML file in the temporary folder. Many people could 
do this in any language they like..
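
The page script described above could look roughly like this in Python, using only the standard library. It is a sketch under assumptions, not a finished tool: the function names and the output layout are my own choices, and BASE_URL is simply the prefix from the example.

```python
from html import escape
from html.parser import HTMLParser
from pathlib import Path

# Known prefix from the example; the option value is appended to it.
BASE_URL = "http://www.sport.com/soccer/art.php?art_id="


class SelectValueParser(HTMLParser):
    """Collect the value attributes of the options inside one named select."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.in_target = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self.in_target = attrs.get("name") == self.select_name
        # Lenient: also accept a plural "options" spelling of the tag.
        elif self.in_target and tag in ("option", "options"):
            value = attrs.get("value")
            if value:
                self.values.append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self.in_target = False


def extract_urls(page_html, select_name="sport_art", base_url=BASE_URL):
    """Return base_url combined with every option value of the select box."""
    parser = SelectValueParser(select_name)
    parser.feed(page_html)
    parser.close()
    return [base_url + value for value in parser.values]


def write_link_page(urls, out_path):
    """Write the URLs as plain links into a small HTML file for a later scan."""
    links = "\n".join(
        f'<a href="{escape(u, quote=True)}">{escape(u)}</a><br>' for u in urls
    )
    Path(out_path).write_text(
        f"<html><body>\n{links}\n</body></html>\n", encoding="utf-8"
    )
```

Calling extract_urls on the downloaded index.php and passing the result to write_link_page would give the temporary page that a second httrack scan can follow.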

I know this is only a rare case, as many people will just 
add a few more projects for the links that are known to be 
missing. However, as the internet becomes more dynamic, 
more and more web sites will have changing file names. It 
is sometimes not feasible for me, for example, to add a few 
more links to the offline browser before I go to work in 
the morning (the article ID is not known beforehand).