I run a website called Rosetta Code, and a user of httrack is currently
monopolizing the site's CPU resources (enough that, despite all my caching,
I'm seeing disk I/O contention for the first time on this server).
I don't mind if someone wants to copy the content of the site or use an
offline browser. That's cool with me. What I'm trying to guard against is
excessive traffic and overutilization of server resources.
What robots.txt lines are recognized and honored by httrack, including
extensions? I once had to throw Crawl-delay in to stave off Yahoo! Slurp on a
previous server--does httrack support similar lines?
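
For concreteness, this is the sort of thing I'd want to add; the user-agent
token and delay value here are just guesses on my part, since I don't know
what string httrack matches on or whether it reads Crawl-delay at all:

```
# Hypothetical example -- I don't know whether httrack honors
# Crawl-delay or which User-agent token it matches against.
User-agent: HTTrack
Crawl-delay: 10

# Fallback for all other crawlers
User-agent: *
Crawl-delay: 5
```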