HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Why do you allow robots.txt to be overriden?
Author: Annajiat Alim Rasel
Date: 12/28/2005 00:47
> Xavier,
> Making a live copy is not the only way to infringe
> on copyrights.
> Creating copies (what your software does) IS IN FACT
> copyright infringement... even if these are not
> republished or distributed, if they only live in
> somebody's PC... they're still copies, which if done
> without the author's permission is illegal.

Copies of pages end up on computers for several reasons:
1. Browser cache: did you know pages are cached by default? And they can live there a long time! Furthermore, prefetching tools such as Google Web Accelerator work much like HTTrack.
2. ISP cache: in many places, ISPs cache your entire website for better performance.
3. Saved and printed pages: any browser allows a user to save or print a web page.
4. Others (there are at least two places where full copies of most websites exist).
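Point 1 is easy to see in the HTTP headers themselves: whether and for how long a page may live in a user's cache is declared by the server. As a rough sketch of that logic (simplified RFC 9111 semantics; the header values are made-up examples, not from any real site):

```python
# Minimal sketch of how an HTTP cache decides whether a stored copy
# of a page is still "fresh" (RFC 9111 semantics, greatly simplified).

def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control header into a {directive: value} dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.lower()] = value or None
    return directives

def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """A stored response stays fresh while its age is below max-age."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = int(directives.get("max-age") or 0)
    return age_seconds < max_age

# A page served with "Cache-Control: max-age=86400" may legitimately
# sit in the user's browser cache for a full day.
print(is_fresh("public, max-age=86400", age_seconds=3600))  # True
print(is_fresh("no-store", age_seconds=0))                  # False
```

In other words, unless a webmaster explicitly sends `no-store`, local copies are the normal, intended behavior of the web.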

I am not saying whether these are legal or illegal; they are simply very hard to stop. However, the abuse FAQ, which you have already read, is a source of critical information for stopping such activities, and that information is hardly found anywhere else on the net. In my belief, it is very useful to webmasters who want to protect their sites.

> You guys need to research a little on copyright law,
> just to understand your responsibilities having
> created this software.
> I believe you should have a prominent notice on
> copyrights available to your users. And I believe
> your software should not override robots.txt - no
> matter why.

Xavier, what if a link to <> were placed next to the robot-rules option?
> The more I read through your forum, the more I see
> from users questions that this tool is unfortunately
>  used in most cases to infringe on people's
> copyrights. I find users here asking why they can't
> download someone's PHP script, or $50 web templates,
> and I find it sad that you won't take responsibility
> for this and at least try to do all you can to make
> your users aware of what they are doing.. and at
> least try to err on the side of the law.

Hmm... Abusers exist all over the net; however, they are hardly assisted in their activity. An example of denying possible abuse can be seen if you read this post <>. (I am not saying the author of that post is an abuser, just giving an example of possible abuse, knowing or unknowing.)

I think HTTrack obeys robot rules by default precisely to prevent unknowing abuse.
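That default is the same check any polite crawler performs: read robots.txt and ask it before downloading a URL. A minimal sketch using Python's standard `urllib.robotparser` (the robots.txt content and user-agent name here are made-up examples):

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that disallows /private/ for every crawler.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A mirroring tool that obeys robot rules (HTTrack's default behavior)
# checks every URL before fetching it:
print(rp.can_fetch("MyMirror", "http://example.com/index.html"))     # True
print(rp.can_fetch("MyMirror", "http://example.com/private/a.zip"))  # False
```

Overriding the option simply skips this check; the check itself is advisory, which is why robots.txt has never been a technical barrier, only a convention.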

Quote from <>:
"Now, if you ask: isn't there a way to trick the server into handing the files over to you? The answer is: maybe, or maybe not. However, how that could be done is beyond the scope of this forum, and discussing it would be improper behavior."

> If one of those libraries or users want to copy my
> site for a legal use, I prefer them contacting me
> directly, explaining me what they're doing and
> asking. 

Not all law-enforcement officials will tell a webmaster, "I am coming to check your site for illegal material."
Your preference is certainly logical, and if I were a webmaster I would prefer the same. However, it is practically impossible to stop mirroring entirely. What we can do is reduce it, using the abuse FAQ or other techniques that have been in use for more than three years.

> I explicitly allow some web preservation
> sites to grab copies of my site. I just have a
> problem with any user, any time, making whatever
> copies he wants without asking - and you making it
> so easy for them.

Too many tools exist to mirror websites. Worse, not all of them can even be classified as mirroring software (and those certainly do not obey robot rules).

> If you really want to offer a good tool to those
> users that would use it legally, you would do all
> you can to prevent abuse. Users with good reasons
> can always contact authors and request permission.

Ignoring robot rules is not always abuse. Many webmasters include HTTrack in their list of maintenance tools, and many of them use HTTrack precisely because it can ignore robot rules, quite apart from its other benefits.

Let me quote some parts of <>:

"Copyright lawsuits should be matters between copyright owners and copyright infringers (and, where appropriate, those who profit from or contribute to the infringing activity); infrastructure players on the sidelines should not be [...]"

HTTrack does not encourage abuse; that is why its help file and website contain an abuse FAQ, and why robot rules are obeyed by default.

What would be the effect of removing the robot-rules handling from HTTrack? The same effect as taking a bucket of water from the sea:
1. Too many other mirroring tools exist.
2. The source code is out there (not only HTTrack's, but others' too).
3. Any browser allows printing or saving web pages (which also allows making illegal copies).

Even if no. 1 were taken care of by sending notices to all of them, that would affect only new users; nobody can do anything about existing users. Furthermore, nos. 2 and 3 are impossible to take care of.

The best solution is to raise awareness. The second best is to follow the abuse FAQ. I believe other solutions would not be sufficient.

You might be able to decrease the number of abusers by showing a small footer saying: "You are from IP _, of ISP _, located in _. This information is logged. Strong legal action will be taken against abusive activities, such as downloading this website, if the necessity is felt."
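Such a deterrent footer is trivial to generate server-side from data the web server already has. A hypothetical WSGI-style sketch (the wording and the ISP/area lookups are assumptions, not a feature of any existing forum software; a real site would back the stubs with a GeoIP database):

```python
# Hypothetical sketch: building the deterrent footer described above
# from data the web server already logs. REMOTE_ADDR is standard in
# CGI/WSGI environments; the ISP and area lookups are stubbed out.

def lookup_isp(ip: str) -> str:
    return "ExampleNet"    # placeholder for a real GeoIP/whois lookup

def lookup_area(ip: str) -> str:
    return "Example City"  # placeholder

def deterrent_footer(environ: dict) -> str:
    ip = environ.get("REMOTE_ADDR", "unknown")
    isp = lookup_isp(ip)
    area = lookup_area(ip)
    return (f"You are visiting from IP {ip} ({isp}, {area}). "
            "This information is logged. Legal action may be taken "
            "against abuse such as bulk-downloading this website.")

print(deterrent_footer({"REMOTE_ADDR": "203.0.113.7"}))
```

The footer costs nothing to serve, and merely making the logging visible may discourage casual abusers in a way robots.txt never could.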

Annajiat Alim Rasel