> I want to spider a site built with IBM WebSphere, and so
> far I haven't succeeded.
> The problem with such a site is that there are no standard
> links between the pages: all the links are javascript
> links. Moreover, the homepage uses frames, and the URLs of
> the frames are complex and computed by a javascript script.
Yuk. I will never understand why people use products that
are so badly designed. Using javascript to produce links
inside a web site is a really stupid way of doing things
IMHO - especially when standard and simple technologies
such as plain HTML links can be used.
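To make the problem concrete (a made-up example), a crawler
can follow the first of these links but has no reliable way
to discover the second, because the target URL only exists
once the script actually runs in a browser:

  <a href="news.html">News</a>
  <a href="javascript:openPage('news' + lang)">News</a>

(openPage() and its argument are hypothetical here.)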
This is bad, because no crawler will ever succeed in
crawling this site:
- offline browsers will never be able to copy the site
- search engines will never index the site (a pretty uncool
feature, huh?)
- and of course disabled/blind people will never get the
chance to read it, because most braille systems just cannot
cope with javascript
> If HTTrack cannot spider such a site, do you know of any
> other application?
No. I don't think that this is possible, except for very
simple cases (httrack can already handle quite simple
cases, I mean _really_ simple ones). Analyzing javascript
sites to rebuild their structure is a REALLY hard thing to
do - I mean even harder than interpreting javascript.
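To give an idea of what "_really_ simple" means, here is a
minimal sketch in Python (not HTTrack's actual code) of the
kind of static scan that can work: it only picks up URLs
that sit as plain string literals inside the javascript.
The pattern and the file extensions are assumptions for
illustration.

import re
from urllib.parse import urljoin

# Quoted literals that look like page URLs, e.g. 'news.html'.
JS_URL_RE = re.compile(
    r"""['"]([^'"<>\s]+\.(?:html?|php|asp))['"]""",
    re.IGNORECASE)

def extract_js_links(page_html, base_url):
    # Resolve each literal against the page's own URL.
    return [urljoin(base_url, m)
            for m in JS_URL_RE.findall(page_html)]

print(extract_js_links(
    '<a href="javascript:openPage(\'news.html\')">News</a>',
    "http://www.example.com/"))
# -> ['http://www.example.com/news.html']

A scan like this is blind to anything computed at run time
(string concatenation, frame URLs built by a script, and so
on) - to get those right you would have to actually execute
the javascript, which is exactly the hard part.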