Hi Xavier,
Thanks for this great program. I have been testing Teleport and HTTrack for a
while, and have also written my own downloaders; I found that HTTrack is among
the best. However, some issues are still not solved in HTTrack (or in any other
crawler I know of). I don't know how Google does this!
The problem is: how can the crawler follow <form method=post> submissions and
some JavaScript tricks? For example, a web site (such as a forum or BBS) lists
lots of items, with a link called "next page". This link does NOT use something
like "list.php?page=2"; instead, it calls a JavaScript function that sets a
hidden variable and then posts the form to go to the next page.
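To make the pattern concrete, here is a rough Python sketch of what a crawler
would have to replay by hand. It is only an illustration, not something
HTTrack does: the names FormParser and next_page_request, the sample HTML, the
gotoPage() call and the example.com URL are all made up. It also assumes the
crawler already knows which hidden field the script changes and to what value,
which is exactly the part that is hard to discover without running the
JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class FormParser(HTMLParser):
    """Collect the action URL and named <input> fields of the first POST form."""

    def __init__(self):
        super().__init__()
        self.action = None      # stays None if no POST form is found
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        method = (attrs.get("method") or "get").lower()
        if tag == "form" and not self._done and method == "post":
            self._in_form = True
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def next_page_request(page_url, html, overrides):
    """Build (but do not send) the POST that the page's script would trigger.

    `overrides` holds the hidden values the script would have set; the
    caller is assumed to know them in advance.
    """
    parser = FormParser()
    parser.feed(html)
    if parser.action is None:
        return None
    fields = {**parser.fields, **overrides}
    body = urlencode(fields).encode("ascii")
    return Request(urljoin(page_url, parser.action), data=body, method="POST")


# Hypothetical page: the "next page" link only runs a script.
sample = """
<form method="post" action="list.php" name="nav">
  <input type="hidden" name="page" value="1">
  <input type="hidden" name="forum" value="7">
</form>
<a href="javascript:gotoPage(2)">next page</a>
"""

req = next_page_request("http://example.com/bbs/index.php", sample, {"page": "2"})
print(req.get_method(), req.full_url, req.data.decode())
# prints: POST http://example.com/bbs/list.php page=2&forum=7
```

Passing the result to urllib.request.urlopen() would actually fetch the next
page. The sketch never looks at the script itself, so it only works when the
field name and value can be guessed.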
How can HTTrack solve this? As far as I know, it is very hard. I am thinking
of using IE's DOM capability to drive the browser and download such pages (see
<http://wtr.rubyforge.org/>). What do you think?
Thanks,
Shannon