HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Any reliable way to block HTTrack
Author: Xavier Roche
Date: 03/21/2001 13:43
 
> The way HTTrack was used on our site was much like a 
> denial of service attack.

Some users do abuse tools like HTTrack, that's a 
fact. It was the same with tools like lftp, which 
are really great, but which can be misused 
(through multiple FTP connections). I have set up some 
default behaviour to avoid problems (such as a default 
of 8 connections and a limit on the number of 
connections per second), but these limits can be 
overridden, and can therefore lead to network abuse. 

I cannot set hard limits. Setting up 32 simultaneous 
connections, for example, can be useful to test an 
intranet's response and performance (I often use 
HTTrack to do load tests on the intranet), or to 
crawl websites with the limited-bandwidth option (useful 
for sites with numerous links to test: many 
simultaneous parsings, but with limited bandwidth to 
avoid overload). The same goes for the option that 
disables robots.txt: useless for standard spiders, but 
useful on forums for a user who wants to keep an 
archive of some messages. And the same goes for the 
user-agent: some websites will only give the correct 
content if the user-agent looks like an MSIE 5.5 browser.

The problem comes from combining ALL these settings in 
a bad way: many simultaneous connections, bandwidth 
limit disabled, robots.txt disabled, connection limit 
disabled and so on. It's really impossible to detect a 
potentially abusive configuration, which also 
depends on the internet connection speed (you won't 
overload a website with a 56K modem, even with 32 
connections; you will with a T3 line).
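
To put rough numbers on the modem vs. T3 remark, here is a small sketch. The link rates are nominal figures I am assuming (56 kbit/s for the modem, 44.736 Mbit/s for a T3), and the helper names are made up for illustration. The point is only that the total load is capped by the client's link, however many connections are open.

```python
# Nominal link rates in bits per second (assumed figures, not measured).
MODEM_56K_BPS = 56_000
T3_BPS = 44_736_000


def per_connection_bps(link_bps, connections):
    """Average share of the link each connection gets if it is split evenly."""
    return link_bps / connections


def seconds_to_fetch(total_bytes, link_bps):
    """Best-case time to pull total_bytes through the link, ignoring overhead."""
    return total_bytes * 8 / link_bps


if __name__ == "__main__":
    for name, bps in (("56K modem", MODEM_56K_BPS), ("T3", T3_BPS)):
        share = per_connection_bps(bps, 32)
        print(f"{name}: {share:.0f} bit/s per connection with 32 connections")
```

With 32 connections the modem user still cannot pull more than 56 kbit/s in total, while the T3 user can ask the server for several hundred times that.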

The best way is to warn users not to abuse the 
bandwidth or, more effectively, to set limits on the 
number of requests or on the bandwidth allowed per user 
(see mod_bandwidth and mod_throttle for Apache), as is 
done on many web servers and FTP servers. 
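
As an illustration of a per-user bandwidth limit, here is a minimal token-bucket sketch. This is not how mod_bandwidth or mod_throttle work internally; the class names, the rate and the burst size are all assumptions chosen for the example.

```python
import time


class TokenBucket:
    """Byte budget that refills at `rate` bytes/second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def allow(self, nbytes):
        """Return True and spend the budget if nbytes can be sent now."""
        now = self.clock()
        # Refill according to the time elapsed since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per client key (for example an IP address)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client, nbytes):
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client] = bucket
        return bucket.allow(nbytes)
```

A server would call something like `limiter.allow(ip, len(chunk))` before sending each chunk and delay or refuse the response when it returns False. Each client has its own bucket, so one greedy user does not slow down the others.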

Filtering the user-agent or setting robots.txt rules 
will work with all "normal" users, but it won't for 
people who REALLY want to abuse. In case of repeated 
abuse, which might be an attack too, the best way is to 
warn the abuser's admin, or to temporarily filter the 
IP (a good solution is to count the hits per 30 seconds 
and, if a limit is reached, temporarily ban the IP 
address for 3 minutes).
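
The hit counting with a temporary ban could look roughly like this. It is only a sketch: the 30 second window and the 3 minute ban come from the paragraph above, but the limit of 100 hits and the class name are assumptions, and a real server would do this in the web server or firewall rather than in a script.

```python
import time
from collections import defaultdict, deque


class TemporaryBan:
    """Count hits per IP over a sliding window and ban offenders for a while."""

    def __init__(self, max_hits=100, window=30.0, ban_time=180.0,
                 clock=time.monotonic):
        self.max_hits = max_hits    # assumed limit, tune for your site
        self.window = window        # seconds over which hits are counted
        self.ban_time = ban_time    # seconds an IP stays banned
        self.clock = clock
        self.hits = defaultdict(deque)  # ip -> timestamps of recent hits
        self.banned_until = {}          # ip -> time when the ban ends

    def allow(self, ip):
        """Record one hit from ip; return False if it is (or becomes) banned."""
        now = self.clock()
        until = self.banned_until.get(ip)
        if until is not None:
            if now < until:
                return False
            del self.banned_until[ip]  # ban expired

        hits = self.hits[ip]
        hits.append(now)
        # Forget hits that fell out of the counting window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()

        if len(hits) > self.max_hits:
            self.banned_until[ip] = now + self.ban_time
            hits.clear()
            return False
        return True
```

A normal visitor never reaches the limit, while a client hammering the site gets refused for 3 minutes and is then allowed to try again.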

Again, as said in the FAQ, abuse is really a problem 
for tools like offline browsers. We didn't develop 
this tool to cause bandwidth abuse or attacks, or 
any other abuse, like the famous email grabbers that 
some nasty companies sell. We are GPL, free, with 
source code provided, so we really don't have any 
interest in giving people potentially dangerous 
tools. 

We tried to disable obviously dangerous options 
(setting up multiple proxies for load balancing, email 
catcher..). But some people will always misuse 
any tool they get.

 