HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: How to completely ignore 'robots.txt'?
Author: brad
Date: 03/18/2015 15:03
 
> > Ok, I'm trying to mirror a site that tells engines
> > like httrack not to go down into certain directories.
> > Which version of httrack allows me to completely
> > ignore these files and go down into that directory?
> 
> See options/spider/spider: robots.txt -> 'never'
> 
> But also make sure that you set a proper bandwidth
> limiter if you are crawling big files or a large number
> of generated pages (robots.txt is often used to avoid
> server overload).
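
For reference, the same settings can also be applied from the command line. The sketch below is illustrative only: the URL, output path, and numeric limits are placeholders you should adjust, though the flags themselves are documented httrack options.

  # -s0       never follow robots.txt / meta robots rules
  # -A25000   cap the transfer rate at 25,000 bytes/second
  # -c4       use at most 4 simultaneous connections
  # -%c2      open at most 2 new connections per second
  httrack "https://www.example.com/" -O ./mirror -s0 -A25000 -c4 -%c2

Keeping the rate and connection limits low is the polite counterpart to ignoring robots.txt, since those rules often exist precisely to prevent server overload.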

 