HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: How to completely ignore 'robots.txt'?
Author: Vladimir
Date: 10/03/2010 02:01
 
> > Ok, I'm trying to mirror a site that tells engines
> > like httrack not to go down into certain directories.
> > Which version of httrack allows me to completely
> > ignore these files and go down into such a directory?
> See options/spider/spider: robots.txt -> 'never'
> 
> But also ensure that you set a proper bandwidth limit
> if you are crawling big files or a large number of
> generated pages (robots.txt is often used to avoid
> server overload)
> 

The option has changed somewhat over the years. :)
img843.imageshack.us/img843/8703/103201014518am.png
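
For anyone driving httrack from a script rather than the GUI, here is a minimal sketch of the same advice: ignore robots.txt but cap the bandwidth. It assumes the command-line flags as I recall them from the HTTrack documentation (-s0 for "never follow robots.txt", -AN for the maximum transfer rate in bytes/second, -O for the output path), so check them against `httrack --help` for your version. The function name, the example URL, the output directory and the 25000 bytes/second default are all made up for illustration.

```python
import shlex


def build_httrack_command(url, output_dir, max_rate_bytes_per_sec=25000):
    """Build an httrack argument list that ignores robots.txt but caps bandwidth.

    Hypothetical helper: the flags below are assumed from the HTTrack
    command-line documentation and should be verified locally.
    """
    if max_rate_bytes_per_sec <= 0:
        raise ValueError("max_rate_bytes_per_sec must be positive")
    return [
        "httrack",
        url,
        "-O", output_dir,                # where the mirror is written
        "-s0",                           # robots.txt / meta robots: never follow
        f"-A{max_rate_bytes_per_sec}",   # max transfer rate in bytes/second
    ]


if __name__ == "__main__":
    # Placeholder URL and directory; print the command instead of running it.
    cmd = build_httrack_command("http://www.example.com/", "./mirror")
    print(shlex.join(cmd))
```

The sketch only prints the command line so it can be inspected first; passing the list to `subprocess.run(cmd, check=True)` would be one way to actually start the mirror.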
 


