Unlike most of you, I am trying to delimit rather than limit my spidering as 
much as possible for an art project I'm working on.  I have been using wget, 
with reasonable results, but it has a tendency to die rather quickly.  I've
been experimenting with httrack for a few days, and it seems to have some 
advantages, but I am having trouble crossing from one domain to another: I'll
get the homepage, but no more.  I'm using the following options:
httrack http://www.somesite.org -O /Volumes/sounds/httrack_get -C0N1003s0K%e9999r9999zI0b1nBe
.httrackrc:
assume sp=text/html,php3=text/html,cgi=image/gif
ext-depth 512
user-agent "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
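In case the packed option string is hard to read, here it is split into one flag per line. The comments are only my reading of the httrack manual, so treat them as assumptions rather than fact. The script just prints the command instead of running it.

```shell
#!/bin/sh
# Dry run: builds the same httrack command with one flag per line and prints it.
# The comments are my understanding of each flag (assumptions, not verified).
opts=""
opts="$opts -C0"       # cache: off
opts="$opts -N1003"    # local file structure type 1003
opts="$opts -s0"       # robots.txt: never follow
opts="$opts -K"        # keep links absolute
opts="$opts -%e9999"   # external links depth
opts="$opts -r9999"    # mirror depth
opts="$opts -z"        # extra info in the log
opts="$opts -I0"       # do not build an index
opts="$opts -b1"       # accept cookies
opts="$opts -n"        # also get non-html files "near" a page
opts="$opts -B"        # may go both up and down the directory tree
opts="$opts -e"        # go everywhere on the web
echo "httrack http://www.somesite.org -O /Volumes/sounds/httrack_get$opts"
```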
What I get is about 27 files downloaded, and then things no longer download, 
though the program is still running, displaying messages like this:
channels.netscape.com/ns/search/hotsearch.jsp (168 bytes) - OK
The matching line in my log:
23:35:33	Info: 	engine: save-name: local name: channels.netscape.com/ns/search/hotsearch.html -> hotsearch.html
It has apparently not been downloaded, just checked.  (I realize that, as a
javascript, it may not download, but html files off the main domain don't
download either.)
Now, that is Netscape, and who knows what protection they have, but I also
get this link:
23:31:54	Info: 	engine: transfer-status: link recorded: www.throughthecracks.org/index.html -> /Volumes/sounds/httrack_get_pan2/index-9.html
I have that file -- it's another homepage, but I'm not getting anything from 
the throughthecracks.org site past that point.  If I try the site directly, it
downloads, no problem.
I thought the e flag, plus the %e depth, would cover this.  What am I doing 
wrong?  And have you any other tips for promiscuous downloading?
Thanks,
\M