Re: Downloading .mov from site (ASP and JavaScript

Subject: Re: Downloading .mov from site (ASP and JavaScript

Author: Abel Deuring

Date: 09/17/2004 15:37

> Had a quick look.  I would not think the javascript is 
> preventing HTTrack from finding the URLs to the movies, but 
> I wonder if it's the HTML coding and a parsing problem...

Well, I wouldn't claim to be very familiar with the 
internals of httrack's HTML and Javascript parsers, but
I think this site is a good example of how easy it is to
make "interpreting" Javascript really complicated for a 
mirror program like httrack. The page defines a JS 
function launchit, which is mainly a wrapper for 
window.open(). Hence, httrack's Javascript parser will 
fail if it simply searches for the string 'window.open',
because the movie URLs appear only as arguments to launchit.
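
To illustrate the point, here is a minimal Python sketch. The page fragment, the regex and the function name naive_links are all made up for this example; this is not how httrack's parser is actually written, it only shows why a literal search for window.open would come up empty on a wrapper like launchit.

```python
import re

# Hypothetical page fragment; the real page's markup is not quoted here.
PAGE = """
<script>
function launchit(url) {
    window.open(url, 'movie', 'width=320,height=260');
}
</script>
<a href="javaScript:launchit('clip1.mov')">Clip 1</a>
"""

# Naive approach (an assumption, not httrack's code): look for
# window.open( followed directly by a quoted URL literal.
NAIVE = re.compile(r"window\.open\(\s*['\"]([^'\"]+)['\"]")


def naive_links(html):
    """Return URLs that appear as string literals right after window.open(."""
    return NAIVE.findall(html)


# The only window.open call receives the variable `url`, so the URL
# 'clip1.mov' is never seen by this kind of search.
print(naive_links(PAGE))
```

A direct call such as window.open('a.html') would be found by the same pattern; it is the indirection through launchit that hides the URL.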

Another nasty javascript trick I've seen somewhere is
to use something like 
document.write('<a hr' + 'ef="some_url.html">'). I don't
expect httrack to parse all sorts of such weird code
properly.
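
A small Python sketch of why this trick defeats a plain text search. The regex and the helper fold_string_concat are made up for illustration; the folding only handles this one trivial case of adjacent single-quoted literals and is not a general Javascript evaluator.

```python
import re

SNIPPET = """document.write('<a hr' + 'ef="some_url.html">')"""

# A plain search for href="..." (an assumed, simplistic link finder).
HREF = re.compile(r'href="([^"]*)"')


def find_hrefs(text):
    """Return the values of all literal href="..." attributes in text."""
    return HREF.findall(text)


def fold_string_concat(js):
    """Join adjacent single-quoted literals: 'a' + 'b' becomes 'ab'."""
    return re.sub(r"'\s*\+\s*'", "", js)


# The word "href" never appears contiguously in the source text ...
print(find_hrefs(SNIPPET))
# ... but it does once the two literals are joined.
print(find_hrefs(fold_string_concat(SNIPPET)))
```

Anything beyond literal concatenation (variables, function calls, escapes) would need a real Javascript interpreter, which is the point made above.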

> 
> I'll try to add it to my test page soon.  Not really much 
> you can do about it except maybe adding each movie page (the 
> ones that appear in the pop-up windows; not many of them) as 
> extra project start URLs.
> 
> Abel, what's this httrack-py? :)

It is a little plugin module for httrack which defines most 
callbacks as specified in <http://www.httrack.com/html/plug.html>.
The module doesn't do anything useful by itself, but it 
allows you to write "real" callbacks in Python. In this case,
one could implement the callbacks preprocess-html and 
postprocess-html (new in httrack version 3.33-beta3), 
where preprocess-html uses a regex like 
r"javaScript:launchit\('(.*?)'\)" to look for links. These
links can then be added to the page as regular
<a href="http://...."> links, and the modified HTML text 
is returned by the preprocess-html callback. When this 
modified text has been parsed by the httrack core, 
postprocess-html is called, where the inserted links can be
parsed in order to catch possible URL modifications by 
httrack, and the modified URLs can be inserted as arguments
into the launchit calls. Finally, the postprocess-html
callback can remove the inserted simple <a href> links
and return the HTML text.
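
A rough sketch of the two steps as plain Python string functions. The function names, their signatures and the MARK comment are assumptions made for this example; they are not the actual httrack-py callback interface, and the sketch also assumes that httrack leaves the marker comment untouched and keeps the inserted links in their original order.

```python
import re

# Regex from the description above.
LAUNCH = re.compile(r"javaScript:launchit\('(.*?)'\)")

# Hypothetical marker, so the temporary links can be found again later.
MARK = "<!--tmp-link-->"
INSERTED = re.compile(re.escape(MARK) + r'<a href="(.*?)"></a>')


def preprocess_html(html):
    """Append one plain <a href> link per launchit('...') call."""
    urls = LAUNCH.findall(html)
    extra = "".join('%s<a href="%s"></a>' % (MARK, u) for u in urls)
    return html + extra


def postprocess_html(html):
    """Copy the (possibly rewritten) URLs back into the launchit calls
    and remove the temporary links again."""
    rewritten = iter(INSERTED.findall(html))
    html = INSERTED.sub("", html)

    def swap(match):
        # Fall back to the original URL if no temporary link is left.
        return "javaScript:launchit('%s')" % next(rewritten, match.group(1))

    return LAUNCH.sub(swap, html)
```

For a quick check without httrack, one can run preprocess_html on a page, replace the URL in the temporary link by hand to imitate httrack's rewriting, and then run postprocess_html; the launchit call should then carry the replaced URL and the temporary link should be gone.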

Abel
