HTTrack Website Copier
Free software offline browser - FORUM
Subject: Crawl-Delay and Honored robots.txt lines
Author: Michael Mol
Date: 03/30/2010 17:12
 
I run a website called Rosetta Code, and a user of httrack is currently
monopolizing the site's CPU resources (enough that, despite all my caching,
I'm seeing disk I/O contention for the first time on this server).

I don't mind if someone wants to copy the content of the site or use an
offline browser. That's cool with me. What I'm concerned about is guarding
against excessive traffic and overutilization of server resources.

What robots.txt lines are recognized and honored by httrack, including
extensions? I once had to throw a Crawl-Delay line in to stave off Yahoo!
Slurp on a previous server--does httrack support similar lines?
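
For reference, the kind of robots.txt extension I'm talking about looks
roughly like this (Slurp is Yahoo!'s crawler token, and the delay value is
in seconds; 10 is just an illustrative number):

    # Ask a specific crawler to wait between successive requests
    User-agent: Slurp
    Crawl-delay: 10

A crawler that honors the extension fetches at most one page per delay
interval, which is usually enough to keep a dynamic site responsive.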
 