> Just to double check... in a UNIX terminal
> httrack <http://www.my.edu/> -O
> "/archive/www.my.edu" -* +*.edu/* '%P0' -v
I don't use the command line on Unix, but <http://httrack.com/html/fcguide.html> shows you'll
have to quote the asterisks: "-*" "+*.edu/*"
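As a rough sketch of why the quotes matter: an unquoted pattern can be expanded by the shell against file names in the current directory before httrack ever sees it. The helper below (show_args is a made-up name, not part of httrack) prints the arguments one per line instead of running the mirror, so you can check what would be passed.

```shell
# Print each argument on its own line instead of running httrack,
# so you can see exactly what the shell would hand to the program.
# show_args is a hypothetical helper for illustration only.
show_args() { printf '%s\n' "$@"; }

# The quoted filters reach the program literally as -* and +*.edu/*
show_args httrack "http://www.my.edu/" -O "/archive/www.my.edu" \
    "-*" "+*.edu/*" "%P0" -v
```

Once the output looks right, replace show_args with the real command.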
> the time. But I can restart it like this:
> httrack <http://www.my.edu/> -O
> "/archive/www.my.edu" -* +*.edu/* '%P0' -vi
Yes.
> Is there a smooth way to shut it down each time so
> the collection doesn't get broken?
Send one interrupt; it should finish the current transfers and then stop.
Also, the limit options in the guide linked above allow:
MN maximum overall size that can be uploaded/scanned
EN maximum mirror time in seconds (60=1 minute, 3600=1 hour)
GN pause transfer if N bytes reached, and wait until lock file is deleted
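A minimal sketch of both approaches, assuming a POSIX shell: stop_mirror and resume_cmd are hypothetical helper names, the process ID has to be looked up yourself (for example with ps), and the one-hour limit is only an example value. The resume command is printed rather than run.

```shell
# Send exactly one SIGINT (the signal Ctrl-C sends) to a running process.
# Usage: stop_mirror PID   (stop_mirror is a hypothetical helper)
stop_mirror() {
    pid=$1
    # kill -0 sends nothing; it only checks the process can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process: $pid" >&2
        return 1
    fi
    kill -s INT "$pid"
}

# Print the resume command with a time limit, one argument per line.
# -E<seconds> is the "EN" limit listed above; $1 is the number of seconds.
resume_cmd() {
    printf '%s\n' httrack "http://www.my.edu/" -O "/archive/www.my.edu" \
        "-*" "+*.edu/*" "%P0" -vi "-E$1"
}

resume_cmd 3600
```

With a time limit in place the mirror should end on its own, so the manual interrupt is only needed when you want to stop earlier.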
> Also, what will come of the links to commercial
> sites? Can I give httrack instructions to deliver
> those links to a local php script for processing?
No, you can't get php files from servers. Servers execute php, cgi, asp, etc.
files and deliver HTML. A mirror is not a backup of a site; it is a static copy.
> Since the mirror is going to end up on an intranet,
> I'd like to have a script generate a message saying
> "You have selected a commercial link that requires a
> connection to the Internet. Click here to continue
> or click here to return to the previous page."
x replace external html links by error pages
> And finally, will the mirror gets my school first
> and then come back to get the others? Or would I
It will start with my.edu and spider down and away from there. Normally it
would stay on site, but the +*.edu/* filter overrides that.
You might want to place separate sites in separate subdirectories:
N104 Identical to N4 except that "web" is replaced by the site's name
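Putting the options mentioned in this reply together might look like the sketch below. It has not been tried against a real site; mirror_cmd is a made-up helper that only prints the command, and the URL and output path are the ones from the question.

```shell
# Print the full command, one argument per line, without running it.
# mirror_cmd is a hypothetical helper for illustration only.
mirror_cmd() {
    printf '%s\n' httrack "http://www.my.edu/" -O "/archive/www.my.edu" \
        "-*" "+*.edu/*" "%P0" -x -N104 -v
}

# -x    replace external html links by error pages
# -N104 like N4, but "web" is replaced by the site's name
mirror_cmd
```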