HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: How to completely ignore 'robots.txt'?
Author: Xavier Roche
Date: 11/30/2001 17:43
 
> Ok, I'm trying to mirror a site that tells engines 
> like httrack not to go down into certain directories. 
> Which version of httrack allows me to completely 
> ignore these files and go down into these directories?
See Options / Spider / Spider: set robots.txt to 'never'.

But also make sure you set a proper bandwidth limit 
if you are crawling big files or a large number of 
generated pages (robots.txt rules are often used to 
avoid server overload).
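
For anyone driving httrack from a script rather than the GUI, here is a rough sketch of the same two settings on the command line. It assumes the command-line switches -sN (robots.txt handling, 0 = never) and -AN (maximum transfer rate in bytes/second) as listed in httrack's built-in help; please check `httrack --help` on your version before relying on them. The function name, the example URL, the output directory and the 25000 bytes/second cap are only illustrative choices, not recommended values.

```python
import shlex


def build_httrack_command(url, output_dir, max_rate_bytes_per_sec=25000):
    """Build an httrack command line that ignores robots.txt but caps bandwidth.

    Hypothetical helper for illustration; the flags are assumed from
    httrack's command-line help (-sN for robots rules, -AN for max rate).
    """
    if max_rate_bytes_per_sec <= 0:
        raise ValueError("max_rate_bytes_per_sec must be positive")
    return [
        "httrack",
        url,
        "-O", output_dir,                # where the mirror is written
        "-s0",                           # robots.txt rules: 0 = never follow
        f"-A{max_rate_bytes_per_sec}",   # max transfer rate, bytes/second
    ]


if __name__ == "__main__":
    # Only prints the command; run it yourself once you have checked the flags.
    cmd = build_httrack_command("http://www.example.com/", "./mirror")
    print(shlex.join(cmd))
```

The script only prints the command instead of running it, so you can review the rate limit against the size of the site before starting the crawl.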
 





