That is fine. This is the first friendly answer I have ever gotten here. ^_^ Glad
to hear that your tool will enforce a hard limit on the resources it uses. I hope
you keep your word.
I have downloaded HTTrack and tried it. I was frightened to see that it can
disable respecting robots.txt rules and set an arbitrary User-Agent string. It
does not handle "Disallow: /" (disallow everything) correctly, either. Those are
not polite behaviours for a spider. Please remove those options and fix the
"Disallow: /" problem. Not all content is under the GPL or GFDL. Please respect
content providers' intentions about how we provide and distribute our content,
be honest about your User-Agent identity, and allow content providers to give
your spider special treatment.
You may refer to the robots.txt standard at
<http://www.robotstxt.org/wc/norobots.html>. A "Disallow: " value is only a
prefix: anything matching that prefix should be excluded, not only
directories.
I have double-checked the wget manual. No, there is no option for
simultaneous connections, and the user cannot disable robots.txt rules either.
I am writing GNU GPL software, too. I believe GNU GPL software is made
to help people, not to hurt them.
A small suggestion: allow the abusive options only within the same IP network,
or on localhost. That is not hard.
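    A minimal sketch of the kind of check I mean, in Python. The function name
and the /24 network mask are my own hypothetical illustration, not an existing
HTTrack feature:

    import ipaddress
    import socket

    def target_allowed(target_host, local_ip, prefix_len=24):
        """Allow the aggressive options only for localhost or for hosts in
        the same IP network as the machine running the spider."""
        target_ip = ipaddress.ip_address(socket.gethostbyname(target_host))
        if target_ip.is_loopback:
            return True
        local_net = ipaddress.ip_network("%s/%d" % (local_ip, prefix_len),
                                         strict=False)
        return target_ip in local_net

    # Only 127.0.0.1 or hosts in 192.168.1.0/24 would be eligible for the
    # "ignore robots.txt" / many-connections options in this example.
    print(target_allowed("127.0.0.1", "192.168.1.10"))     # True (loopback)
    print(target_allowed("192.168.1.42", "192.168.1.10"))  # True (same /24)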