| I know that fully interpreting JavaScript is difficult,
since it basically turns into a compiler problem. But how
about this? (I am just making a suggestion, and I know it
sounds a bit crazy.. =) )
Suppose I have an external helper program. The HTTrack
parser would ask the helper about every file it parses. If
the file matches a rule in the helper (a filename match,
for example), the helper would regenerate the URLs itself,
because it already knows how the URLs behind a select list
or a POST form on that page are built.
It's something like this:
Say I know that the page www.sport.com/soccer/index.php has
a select box named "sport_art", and that the link to each
article is generated by combining a string like
http://www.sport.com/soccer/art.php?art_id= with the value
selected in the select box, like this:
<select name="sport_art">
<option value="match1.php">Match 1</option>
<option value="match2.php">Match 2</option>
</select>
HTTrack would then pass every URL it parses to the external
helper program first. The helper has a rule that says: if
the filename matches www.sport.com/soccer/index.php, pick
up the script written for that particular page. The helper
then uses that script to generate the list of URLs that
HTTrack cannot parse by itself, and the generated list is
fed back to HTTrack for offline browsing. This method is
flexible: if you want to capture the links in a select
list, you just add a rule to the external helper plus a
script that translates the JavaScript on that page, and the
translated links will appear as ordinary links in a
temporary HTML file. After that, you just schedule HTTrack
to scan the temporary page for the links that were missed
in the first scan, and you get a more complete mirror of
the site.
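
To make the rule idea concrete, here is a rough sketch of how
the helper could map a page to its script. Everything in it is
hypothetical (the RULES table, find_script, extra_links); none
of it is an existing HTTrack interface, it only shows the
"filename match, then run that page's script" step.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, List, Optional

# A page script takes the page's HTML and returns the URLs
# that a plain link scan would miss.
PageScript = Callable[[str], List[str]]

# Hypothetical rule table: URL pattern -> script for that page.
RULES: Dict[str, PageScript] = {}


def find_script(url: str, rules: Dict[str, PageScript]) -> Optional[PageScript]:
    """Return the script of the first rule whose pattern matches url."""
    for pattern, script in rules.items():
        if fnmatchcase(url, pattern):
            return script
    return None


def extra_links(url: str, html: str, rules: Dict[str, PageScript]) -> List[str]:
    """Run the matching page script, or return no links if no rule matches."""
    script = find_script(url, rules)
    return script(html) if script is not None else []
```

Adding support for a new page then only means adding one
entry to the table, for example
RULES["www.sport.com/soccer/index.php"] = some_script.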
In the example above, the script for that page can be quite
simple: parse the HTML, find the select box sport_art, grab
the value of every option tag, combine each value with the
string I already know, and write the result to a simple
HTML file in the temporary folder. People can write these
scripts in any language they like..
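
As an illustration, the page script described above might look
roughly like this in Python. It is only a sketch: the names
SelectValues, build_link_page and write_link_page are made up,
the page is assumed to use standard option tags, and the base
string is the one from my example.

```python
from html import escape
from html.parser import HTMLParser
from pathlib import Path

# The string I already know (from the example above).
BASE = "http://www.sport.com/soccer/art.php?art_id="


class SelectValues(HTMLParser):
    """Collect the value of every <option> inside one named <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.inside = False  # True while inside the wanted <select>
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("name") == self.select_name:
            self.inside = True
        elif tag == "option" and self.inside and attrs.get("value"):
            self.values.append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self.inside = False


def build_link_page(html, select_name="sport_art", base=BASE):
    """Return (links, page): the rebuilt URLs and a plain HTML page of them."""
    parser = SelectValues(select_name)
    parser.feed(html)
    links = [base + value for value in parser.values]
    rows = "\n".join(
        '<a href="{0}">{0}</a><br>'.format(escape(link)) for link in links
    )
    return links, "<html><body>\n" + rows + "\n</body></html>\n"


def write_link_page(path, page):
    """Write the temporary page that the mirror tool would scan next."""
    Path(path).write_text(page, encoding="utf-8")
```

The temporary file contains nothing but ordinary <a href>
links, so a normal link scan can pick them up.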
I know this is a rare case, as most people will just add a
few more projects for the links that are known to be
missing. However, as the internet becomes more dynamic,
more and more web sites will have changing file names. It
is sometimes not feasible for me, for example, to add a few
more links to the offline browser before I go to work in
the morning (the article id is not known beforehand..). | |