Just look in hts-log.txt and read the warning:
08:40:30 Info: Note: due to web.archive.org remote robots.txt rules, links begining with these path will be forbidden: /archive_notices/, /cgi-bin/, /collections/e2k/, /collections/government/, /collections/news/, /collections/now/, /collections/pioneers/, /collections/sep11/, /collections/web/, /db_dir/, /images/, /live_dir/, /privage_pages/, /spec/, /web/, /e2k/ (see in the options to disable this)
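If you want to extract the forbidden prefixes from hts-log.txt programmatically, here is a minimal Python sketch. It assumes the note uses the exact wording quoted above and fits on a single log line; other HTTrack versions may phrase it differently. The function name `forbidden_prefixes` is hypothetical.

```python
import re

# Pattern assumed from the warning quoted above (hypothetical; adjust for your HTTrack version).
ROBOTS_NOTE = re.compile(
    r"due to (?P<host>\S+) remote robots\.txt rules, .*?"
    r"forbidden: (?P<paths>.*?) \(see in the options"
)


def forbidden_prefixes(log_text):
    """Return a list of (host, [path prefixes]) for each robots.txt note in the log text."""
    results = []
    for match in ROBOTS_NOTE.finditer(log_text):
        # Prefixes are listed as a comma-separated run: "/a/, /b/, /c/"
        paths = [p.strip() for p in match.group("paths").split(",") if p.strip()]
        results.append((match.group("host"), paths))
    return results
```

Usage: `forbidden_prefixes(open("hts-log.txt", encoding="utf-8", errors="replace").read())`, run from the project folder that holds the log.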
Disable robots.txt handling (Set Options / Spider) to capture the site, **BUT** before doing so, make sure you are allowed to copy the site without restrictions, **AND** set a reasonable bandwidth limit (such as 3 KB/s), as web.archive.org is often overloaded. The limit is given in bytes per second:
Set Options / Limits / Max transfer rate / 3000
I repeat: USE a reasonable bandwidth limit!
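The same two settings can be applied when running the command-line `httrack`. The Python sketch below only builds the argument list. It assumes the `-s0` option (never follow robots.txt) and the `-A` option (maximum transfer rate in bytes per second), so check `httrack --help` for your version. The helper name `httrack_command` and the example URL are hypothetical.

```python
def httrack_command(url, output_dir, max_rate_bytes=3000, obey_robots=False):
    """Build an httrack argument list (hypothetical helper).

    -O <dir>  : output path for the mirror, logs and cache
    -A<bytes> : maximum transfer rate in bytes per second (Set Options / Limits)
    -s0       : never follow robots.txt rules (Set Options / Spider)
    """
    # Refuse to build a command without a real bandwidth limit.
    if max_rate_bytes <= 0:
        raise ValueError("always set a positive bandwidth limit")
    args = ["httrack", url, "-O", output_dir, f"-A{max_rate_bytes}"]
    if not obey_robots:
        # Only add this once you are sure you may copy the site.
        args.append("-s0")
    return args
```

Pass the result to `subprocess.run(cmd, check=True)` to start the mirror. Rejecting a zero or negative rate is a deliberate choice, so the bandwidth limit cannot be silently dropped.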