HTTrack Website Copier
Free software offline browser - FORUM
Subject: I'm trying to download the internet
Author: Matthew Ostrowski
Date: 01/26/2005 00:54
Unlike most of you, I am trying to delimit rather than limit my spidering as 
much as possible for an art project I'm working on.  I have been using wget, 
with reasonable results, but it has a tendency to die rather quickly.  I've 
been experimenting with httrack for a few days, and it seems to have some 
advantages, but I am having trouble crossing from one domain to another: I'll 
get the homepage, but no more.  I'm using the following options:

httrack <> -O /Volumes/sounds/httrack_get \
  --assume sp=text/html,php3=text/html,cgi=image/gif \
  --ext-depth 512 \
  --user-agent "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"

What I get is about 27 files downloaded, and then nothing more downloads, 
though the program is still running, displaying messages like this: (168 bytes) - OK

The matching line in my log:
23:35:33	Info: 	engine: save-name: local name:
search/hotsearch.html -> hotsearch.html

It has apparently not been downloaded, just checked.  (I realize that as a 
search page it may not download, but HTML files off the main domain don't 
download either.)

Now, that is Netscape, who knows what protection they have, but I get this:
23:31:54	Info: 	engine: transfer-status: link recorded: -> /Volumes/sounds/

I have that file -- it's another homepage, but I'm not getting anything from 
the site past that point.  If I try the site directly, it downloads, no problem.

I thought the e flag, plus the %e depth, would cover this.  What am I doing 
wrong?  And have you any other tips for promiscuous downloading?
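[Editor's note: a hedged sketch of one way to get cross-domain crawling, for comparison with the command above. The URL here is a placeholder, and the exact flag spellings should be checked against your HTTrack version. The two usual knobs are a scan rule of '+*', which allows every URL so the spider is free to leave the starting domain, and -%e (long form --ext-depth), which sets how many links deep external sites are followed.]

```shell
# Sketch only, not verified against this HTTrack build:
httrack "http://example.com/" \
  -O /Volumes/sounds/httrack_get \
  --assume sp=text/html,php3=text/html,cgi=image/gif \
  --user-agent "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" \
  -%e512 \
  '+*'
```

Without a permissive scan rule like '+*', HTTrack's default filters keep the mirror on the starting domain, which would explain getting only the external homepages and nothing beyond them.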


All articles:

  I'm trying to download the internet (01/26/2005 00:54)
  Re: I'm trying to download the internet (01/29/2005 14:25)
  Re: I'm trying to download the internet (01/31/2005 12:55)
  Re: I'm trying to download the internet (02/01/2005 01:16)


Created with FORUM 2.0.11