HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Crawl-Delay and Honored robots.txt lines
Author: William Roeder
Date: 03/30/2010 19:47
 
> What robots.txt lines are recognized and honored by
> httrack, including extensions?  I once had to throw
> Crawl-Delay in to stave off Yahoo Slurp! on a
> previous server--does httrack support similar lines?
By default it honors the standard robots.txt exclusion rules, though that behavior can be overridden. I don't know about Crawl-Delay or other extensions; perhaps Xavier will reply.
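For what it's worth, here is a minimal sketch of how a Crawl-Delay line parses alongside the standard Disallow rules, using Python's standard urllib.robotparser (the robots.txt content below is a made-up example, not from any real server):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt similar to the one the poster describes
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Standard exclusion rules: /private/ is blocked, everything else allowed
print(rp.can_fetch("*", "https://example.com/public/page.html"))   # True
print(rp.can_fetch("*", "https://example.com/private/page.html"))  # False

# The Crawl-delay extension, exposed as seconds between requests
print(rp.crawl_delay("*"))  # 10
```

Whether HTTrack itself reads the Crawl-delay line, I can't say; the example only shows how the line is conventionally interpreted.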

Also, by default httrack runs at most 4 simultaneous connections, 10 connections/s, and 100 KB/s (the built-in security limits). If you are seeing contention, you may need to tune those settings. Some webmasters even use httrack to convert their 'real' dynamic pages into static public ones.
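If you do need to adjust those throttles, the relevant command-line options are -cN (simultaneous connections), -%cN (connections per second), and -AN (transfer rate in bytes/second). The command below just restates the default security limits explicitly; the URL and values are illustrative:

```shell
# Illustrative only: mirror with 4 connections, 10 conn/s, 100 KB/s,
# i.e. the documented default security limits stated explicitly
httrack "https://example.com/" -O ./mirror -c4 -%c10 -A100000
```

Lowering these is fine; raising them against a server you don't control is exactly the kind of abuse the page below warns about.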

<http://www.httrack.com/html/abuse.html>
 
Created with FORUM 2.0.11