This flame is going nowhere. Whenever you have a tool,
there's always the chance of misuse or abuse. The key is
to know what is acceptable and good vs what is damaging.
Turning off the HTTrack robots.txt check is meant to be
done by someone who has read and understood the
implications.
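For what it's worth, that check only goes away when the
user explicitly asks for it. If I remember the command-line
options right (the exact switch may vary between versions),
disabling it looks something like this:

    httrack http://www.example.com/ -s0

where s0 means never follow robots.txt and the default, s2,
always obeys it. Nobody trips over that by accident.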
> on their own sites. Some of these are huge archivers,
Archiving is necessary BECAUSE the web changes so often.
Information is added and removed all the time, and without
local copies that information dies a digital death. If you
think removing a page from a website deletes it from all
existence, you're wrong...you can't delete my print-on-
paper copy, nor the photo I took with my camera pointed at
the monitor displaying the website. If you make the
information available, you should understand it may be next
to impossible to make it unavailable later. I hate it when
a good page I've read disappears from the 'net because
someone thought it wasn't needed anymore. A local copy
fixes that problem.
> HTTrack is a powerful tool and it is all the more powerful
> because it comes with source code which can be modified. If
> it did not have the 'override robots.txt' capability,
> someone could easily add it anyway by modifying the source
> and putting the changed version up for public download.
There is another web copier I've used that enforces the use
of robots.txt, and it has done a better job for me than
HTTrack at a few sites in the past. That robots.txt
limitation is easily removed through the application of a
little programming and debugging knowledge in conjunction
with a hex editor. In fact, it wouldn't even be necessary
to use a hex editor...with the Apache web server set up as
a proxy for another site, you could tell Apache to redirect
ANY url that tried to get robots.txt, effectively disabling
the robots.txt function built into the web copier. If the
web copier didn't have proxy support, well, then you could
figure out how to run a transparent web proxy that'd do the
same thing automatically.
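Just to make that concrete, here's a rough sketch of such a
proxy in Python. It's purely illustrative and not part of
HTTrack or any other copier; the host, port, and class name
are made up, and it only handles plain GET requests. Any
robots.txt request is answered locally with an
allow-everything file, and every other request is fetched
and relayed as-is:

    # Illustrative only: a tiny HTTP proxy that hides robots.txt
    # from whatever client is pointed at it.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlsplit
    from urllib.request import urlopen

    class RobotsHidingProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            # Proxy clients put the full URL in the request line.
            if urlsplit(self.path).path == "/robots.txt":
                # Intercept: serve an allow-everything robots.txt.
                body = b"User-agent: *\nDisallow:\n"
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
                return
            # Anything else: fetch the real URL and relay it verbatim.
            try:
                with urlopen(self.path) as upstream:
                    body = upstream.read()
                    self.send_response(upstream.status)
                    self.send_header(
                        "Content-Type",
                        upstream.headers.get("Content-Type", "text/html"))
                    self.send_header("Content-Length", str(len(body)))
                    self.end_headers()
                    self.wfile.write(body)
            except Exception:
                self.send_error(502)

    if __name__ == "__main__":
        # Listen on localhost:8080; set the copier's HTTP proxy to it.
        HTTPServer(("127.0.0.1", 8080), RobotsHidingProxy).serve_forever()

Point the web copier at 127.0.0.1:8080 as its HTTP proxy
and its robots.txt check never sees a real Disallow line.
Which is exactly my point: the 'protection' is a courtesy,
not a lock.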
> Then there would be two versions going around. So, should
> the source code not be distributed? That is ultimately
> where your argument must lead. All because a relatively
> very small number of users choose to use the baseball
> bat 'improperly'.
True. Drunk drivers are bad...but we don't ban either
alcohol or cars. And baseball bats can be sold to ANYONE.
> Hypocrisy comes in many forms, we must all think of the
> consequences of what we do and what we say, what we stand
> for and what we are against. Perhaps it is a thin line, but
> in this case it seems to me Xavier Roche has made a very
> balanced choice, which is all we can hope to do in this
> life.
>
> john
I agree, and I believe Xavier's program is a good thing.
Let me add that if any web servers are crashing when a
properly used web copier crawls through them, those servers
are very poorly maintained indeed. Either they need to
change server software to something stable (e.g. Apache) or
their hardware is flaky, in which case ANY web client may
cause it to fail. Buggy web server software and/or flaky
hardware is going to crash whether or not a web copier ever
visits it.