HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: HTTrack vs webmasters
Author: Haudy Kazemi
Date: 01/27/2003 23:31
 
This flame is going nowhere.  Whenever you have a tool, 
there's always the chance of misuse or abuse.  The key is 
to know what is acceptable and good versus what is damaging.  
Turning off the HTTrack robots.txt check is meant to be 
done by someone who has read and understood the 
implications.

> on their own sites. Some of these are huge archivers, 
Archiving is necessary BECAUSE the web changes so often.  
Information is added and removed all the time, and without 
local copies that information dies a digital death.  If you 
think removing a page from a website deletes it from all 
existence, you're wrong...you can't delete my print-on-
paper copy, nor the photo I took with my camera pointed at 
the monitor displaying the website.  If you make the 
information available you should understand it may be next 
to impossible to make it unavailable later.  I hate it when 
a good page I've read disappears from the 'net because 
someone thought it wasn't needed anymore.  A local copy 
fixes that problem.

> HTTrack is a powerful tool and it is all the more powerful 
> because it comes with source code which can be modified. If 
> it did not have the 'override robots.txt' capability, 
> someone could easily add it anyway by modifying the source 
> and putting the changed version up for public download. 

There is another web copier I've used that enforces the use 
of robots.txt, and it has done a better job for me than 
HTTrack at a few sites in the past.  That robots.txt 
limitation is easily removed through a little programming 
and debugging knowledge in conjunction with a hex editor.  
In fact, it wouldn't even be necessary to use a hex 
editor: with the Apache web server acting as a proxy for 
another site, you could have Apache redirect ANY URL that 
tried to fetch robots.txt, effectively disabling the 
robots.txt function built into the web copier.  If the web 
copier didn't have proxy support, well, then you could 
figure out how to run a transparent web proxy that'd do 
this automatically.
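
To make that point concrete, here is a rough sketch of the 
proxy idea, my own illustration and nothing shipped with 
HTTrack or Apache: a tiny Python forward proxy that answers 
every robots.txt request with a 404 and relays everything 
else to the origin server, so a copier pointed at it never 
sees a real robots.txt.  The port, class name and behaviour 
are arbitrary choices for the example.

    # Minimal sketch of a robots.txt-hiding forward proxy (illustrative only).
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import urlopen, Request
    from urllib.error import HTTPError

    class RobotsHidingProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            # When used as a forward proxy, self.path holds the absolute URL requested.
            if self.path.split("?")[0].endswith("/robots.txt"):
                # Pretend the file does not exist, so the copier sees no restrictions.
                self.send_error(404, "Not Found")
                return
            try:
                upstream = urlopen(Request(self.path, headers={
                    "User-Agent": self.headers.get("User-Agent", "proxy-sketch")}))
                body = upstream.read()
                # Relay the upstream response back to the web copier unchanged.
                self.send_response(upstream.status)
                self.send_header("Content-Type",
                                 upstream.headers.get("Content-Type", "application/octet-stream"))
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            except HTTPError as err:
                # Pass upstream HTTP errors through unchanged.
                self.send_error(err.code)

    if __name__ == "__main__":
        # Point the web copier's HTTP proxy setting at 127.0.0.1:8080.
        HTTPServer(("127.0.0.1", 8080), RobotsHidingProxy).serve_forever()

Which is exactly why a hard-wired robots.txt check in the 
client buys a webmaster very little.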

> Then there would be two versions going around. So, should 
> the source code not be distributed? That is ultimately 
> where your argument must lead. All because a relatively 
> very small number of users choose to use the baseball 
> bat 'improperly'.

True.  Drunk drivers are bad...but we don't ban either 
alcohol or cars.  And baseball bats can be sold to ANYONE.

> Hypocrisy comes in many forms, we must all think of the 
> consequences of what we do and what we say, what we stand 
> for and what we are against. Perhaps it is a thin line, but 
> in this case it seems to me Xavier Roche has made a very 
> balanced choice, which is all we can hope to do in this 
> life.
> 
> john

I agree, and I believe Xavier's program is a good thing.  

Let me add that if any web servers are crashing when a 
properly used web copier crawls through them, those servers 
are very poorly maintained indeed.  Either they need to 
change server software to something stable (e.g. Apache) or 
their hardware is flaky, in which case ANY web client may 
cause them to fail.  Buggy web server software and/or flaky 
hardware is going to crash whether or not a web copier 
visits it.
 