Hey, thanks a lot William!
Just to double check... in a UNIX terminal environment...
To make a complete (zips, pdfs, everything) browsable copy of my school
website and any other school websites that it links to - but not commercial
sites that it links to - then I could use this:
httrack http://www.my.edu/ -O "/archive/www.my.edu" -* +*.edu/* '%P0' -v
Correct?
Plus, I can only run it a few hours at night because I don't have a computer
that I can leave running all the time. But I can restart it like this:
httrack http://www.my.edu/ -O "/archive/www.my.edu" -* +*.edu/* '%P0' -vi
Correct?
If so...
Is there a smooth way to shut it down each time so the collection doesn't get
broken?
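To make the question concrete, here is a rough sketch of what I was considering. It assumes httrack's -E option (--max-time, in seconds) ends the mirror cleanly when the limit is reached, which I have not confirmed. The helper names hours_to_seconds and build_cmd are just my own placeholders, and the script only prints the command instead of running it:

```shell
#!/bin/sh
# Hypothetical helper: convert a number of hours into seconds,
# since httrack's -E (max mirror time) option takes seconds.
hours_to_seconds() {
    echo $(( $1 * 3600 ))
}

# Hypothetical helper: print the resume command with a time cap.
# The URL, output path, and filters are the ones from my example above.
# Assumption: -E stops the mirror without damaging the cache.
build_cmd() {
    limit=$(hours_to_seconds "$1")
    echo "httrack http://www.my.edu/ -O /archive/www.my.edu '-*' '+*.edu/*' '%P0' -v -i -E${limit}"
}

# Dry run: show the command for a 3-hour nightly session.
build_cmd 3
```

If that assumption holds, I could just run the printed command each night and let it stop on its own.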
Also, what will come of the links to commercial sites? Can I give httrack
instructions to deliver those links to a local php script for processing?
Since the mirror is going to end up on an intranet, I'd like to have a script
generate a message saying "You have selected a commercial link that requires a
connection to the Internet. Click here to continue or click here to return to
the previous page."
And finally, will the mirror get my school first and then come back to get
the others? Or would I need a different approach? I thought about doing one at a
time, but didn't see a way to merge the collections after the fact.
Thanks again William, and thanks in advance to anyone else who may share some
insight.