HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: copying sites from http://web.archive.org
Author: Xavier Roche
Date: 04/27/2003 08:44
 
Just look in hts-log.txt and read the warning:

08:40:30 Info:  Note: due to web.archive.org remote robots.txt rules, links begining with these path will be forbidden: /archive_notices/, /cgi-bin/, /collections/e2k/, /collections/government/, /collections/news/, /collections/now/, /collections/pioneers/, /collections/sep11/, /collections/web/, /db_dir/, /images/, /live_dir/, /privage_pages/, /spec/, /web/, /e2k/ (see in the options to disable this)
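The warning lists path prefixes, and any link whose path begins with one of them is skipped. Below is a minimal Python sketch of that prefix test. The prefix list is copied from the log above (the live robots.txt may well differ today), and `is_forbidden` is a hypothetical helper for illustration, not part of HTTrack.

```python
from urllib.parse import urlsplit

# Prefixes copied from the hts-log.txt warning above (2003). The current
# robots.txt on web.archive.org may differ; this list is only an example.
FORBIDDEN_PREFIXES = (
    "/archive_notices/", "/cgi-bin/", "/collections/e2k/",
    "/collections/government/", "/collections/news/", "/collections/now/",
    "/collections/pioneers/", "/collections/sep11/", "/collections/web/",
    "/db_dir/", "/images/", "/live_dir/", "/privage_pages/", "/spec/",
    "/web/", "/e2k/",
)


def is_forbidden(url, prefixes=FORBIDDEN_PREFIXES):
    """Hypothetical helper: True if the URL path starts with a forbidden prefix."""
    path = urlsplit(url).path or "/"  # an empty path means the site root
    return any(path.startswith(prefix) for prefix in prefixes)
```

Archived snapshots on web.archive.org are served under /web/<timestamp>/<original URL>, so the /web/ rule alone matches every snapshot. While robots.txt rules are honored, none of the archived pages can be fetched.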

Disable robots.txt (Set Options / Spider) to capture the
site, **BUT** before doing so make sure you are allowed to copy
the site without restrictions, **AND** use reasonable bandwidth
limits (such as 3KB/s), as web.archive.org is often overloaded:

Set Options / Limits / Max transfer rate / 3000
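To make the "Max transfer rate" setting concrete, here is a minimal Python sketch of a byte-rate limiter that keeps average throughput at or below a cap such as 3000 bytes/s. The class ByteRateLimiter and its consume method are hypothetical illustrations, not HTTrack's code or API. The clock and sleep functions are injectable so the logic can be checked without actually waiting.

```python
import time


class ByteRateLimiter:
    """Hypothetical limiter: caps average throughput at max_bytes_per_sec.

    Illustrates the idea behind a "max transfer rate" option; it is not
    HTTrack's implementation. Typical use:
        limiter = ByteRateLimiter(3000)
        for chunk in chunks:
            limiter.consume(len(chunk))
    """

    def __init__(self, max_bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        if max_bytes_per_sec <= 0:
            raise ValueError("max_bytes_per_sec must be positive")
        self.rate = max_bytes_per_sec
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.total = 0  # bytes received since start

    def consume(self, nbytes):
        """Record nbytes just received; sleep if ahead of the allowed rate.

        Returns the number of seconds slept (0.0 if no wait was needed).
        """
        self.total += nbytes
        # Earliest moment at which self.total bytes are allowed at this rate.
        allowed_at = self.start + self.total / self.rate
        wait = max(0.0, allowed_at - self.clock())
        if wait > 0:
            self.sleep(wait)
        return wait
```

Because the schedule is measured from the start, a long idle period lets the next reads through without waiting; a token bucket with a bounded burst size would be stricter. For scale, at 3000 bytes/s a 10,000,000-byte site takes about 10,000,000 / 3000 ≈ 3333 s, roughly 56 minutes.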

I repeat: USE reasonable bandwidth limits!
 