HTTrack Website Copier
Free software offline browser - FORUM
Subject: access blocked
Author: account37
Date: 05/27/2002 17:10
 
Using HTTrack I was blocked by a web-site:

<http://senseis.xmp.net/>


On the web-site they explained the reasons:

They check the referrer, they use trap links, and they want
a wait period between requests.

Does HTTrack use the referrer? Is it possible to configure an automatic
wait period between requests?
How can I enable HTTrack to mirror this web-site?
I attached the content of the access blocked message.

account37

----------------------------------------------------------


Access Blocked 
--------------------------------------------------------------------------------

    Keywords: SL description
Sensei's Library tries to protect itself from dumb 
mirroring scripts that issue some thousand requests within 
minutes bringing our server to its knees. 

A first measure is to block access to any function other 
than viewing a page if no referrer information is 
present[1]. What does this mean? 

Every time you click on a link your browser sends a request 
to our server to get the desired page and pictures. This 
request not only contains the pagename which you would like 
to see, but also which page you are coming from (referrer 
information). 

However, mirroring scripts don't send this information. So 
checking for the referrer information is an easy way to 
distinguish between scripts and regular browsers. 
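
To make the referrer idea concrete, here is a minimal sketch of a request that carries this information. Note that the HTTP header is actually spelled "Referer". The helper name build_request and the page name in the example URL are made up for illustration; only the host comes from the post above.

```python
import urllib.request


def build_request(url, referer=None):
    """Build a GET request, optionally carrying a Referer header.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    headers = {}
    if referer is not None:
        # The header name is spelled "Referer" in HTTP.
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


# A browser-like request: it says which page the link was followed from.
# "?SomePage" is an invented page name.
req = build_request(
    "http://senseis.xmp.net/?SomePage",
    referer="http://senseis.xmp.net/",
)
print(req.get_header("Referer"))
```

A request built without the referer argument has no such header, which is how the site describes the mirroring scripts it saw.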

If there is no referrer information, then everything but 
viewing a page is blocked (e.g. diff, edit, save, search, 
pageinfo, ...). 

If you get the "AccessBlocked" message as a regular user, 
then either you are not using a standard browser, or you 
have configured your browser not to send the referrer 
information, or a proxy you are using is removing this 
information. Solution: change your settings or set a 
cookie[1]. 

As the referrer information has to originate from within SL 
it is no longer possible to link to diff, pageinfo, etc. 
from other websites. Note: you can still link to pages 
themselves. 
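
The rules described so far can be sketched roughly as follows. This is only a guess at the logic, not the site's actual code: the function name is_allowed and the action name "view" are assumptions, while the SLPrefs cookie and the host are taken from the text.

```python
from urllib.parse import urlsplit

SITE_HOST = "senseis.xmp.net"  # host taken from the URL in the post


def is_allowed(action, referer=None, cookies=None):
    """Return True if the request would pass the referrer check.

    Hypothetical sketch of the rules quoted above.
    """
    cookies = cookies or {}
    if action == "view":
        # Plain page views are always allowed.
        return True
    if "SLPrefs" in cookies:
        # Footnote [1]: the cookie bypasses the referrer check.
        return True
    if not referer:
        # No referrer information: diff, edit, save, etc. are blocked.
        return False
    # The referrer has to originate from within the site itself.
    return urlsplit(referer).hostname == SITE_HOST
```

For example, is_allowed("diff") is False, while is_allowed("diff", referer="http://senseis.xmp.net/") is True.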

As a second measure, if the misbehaving script insists on 
requesting such pages over and over, it will be dynamically 
added to a block list for 48 hours. There's also a trap 
link on the pages for scripts to follow. Users should not 
normally be able to see this link. (You can see it in the 
source, but don't try it out or your address will be 
blocked for 48 hours. Really. We mean it.) 
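
A timed block list with a trap link might look something like the sketch below. The class name, the trap path "/trap", and the handle_request function are all invented for illustration; only the 48-hour duration comes from the text.

```python
BLOCK_SECONDS = 48 * 60 * 60  # 48 hours, as stated above
TRAP_PATH = "/trap"  # hypothetical; the real trap link is not given


class BlockList:
    """Remembers blocked addresses together with their expiry time."""

    def __init__(self):
        self._until = {}

    def block(self, address, now):
        # Block the address for 48 hours starting at time `now`.
        self._until[address] = now + BLOCK_SECONDS

    def is_blocked(self, address, now):
        expiry = self._until.get(address)
        if expiry is None:
            return False
        if now >= expiry:
            # The block has expired, so forget the address.
            del self._until[address]
            return False
        return True


def handle_request(block_list, address, path, now):
    """Return "blocked" or "ok"; following the trap link blocks the caller."""
    if block_list.is_blocked(address, now):
        return "blocked"
    if path == TRAP_PATH:
        block_list.block(address, now)
        return "blocked"
    return "ok"
```

Times are plain seconds passed in by the caller, so the sketch can be tried without waiting two days.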

The above measures should shield SL from the most 
offensive scripts. What if you would still like to 
mirror/download SL? Use a friendly script such as wget 
which obeys robots.txt. Or download a ready-packed snapshot 
at SLSnapshot. If you use wget don't forget to specify a 
wait period between the requests (at least "-w 3"). Yes, it 
will take some hours, but that way our server will still be 
accessible to others as well. If this advice isn't followed 
we may think of even more restrictive measures. You have 
been warned. 
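
The wait period between requests is the key point here. As a rough sketch of the same idea outside wget, the loop below pauses between fetches. The function name fetch_politely and its parameters are made up for this example; the 3-second default mirrors the "-w 3" suggestion above.

```python
import time


def fetch_politely(urls, fetch, wait_seconds=3.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests.

    `fetch` is any callable taking a URL; `sleep` can be replaced
    for testing. Both are assumptions of this sketch.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(wait_seconds)
        results.append(fetch(url))
    return results
```

With 1000 pages and a 3-second wait, the pauses alone add up to roughly 50 minutes, which matches the "it will take some hours" warning for a larger site.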

Contact ArnoHollosi or MortenPahle if you have further 
questions. 

[1] The referrer check is bypassed if you have the 
SLPrefs cookie set (e.g. Mozilla currently doesn't send 
referrer information if you open a page in a new window). 


--------------------------------------------------------------------------------

Gorobei: Dumb question: you do have a robots.txt file to 
keep well-behaved spiders from hammering the site? 

Arno: Yes, we do: <http://senseis.xmp.net/robots.txt>. But 
the dumb scripts used recently don't obey robots.txt. 
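
For reference, a script that does want to obey robots.txt can check each URL before fetching it. The sketch below uses Python's standard urllib.robotparser. The rules in SAMPLE_ROBOTS_TXT and the agent name are invented for illustration and are not the contents of the real file.

```python
from urllib.robotparser import RobotFileParser

# Invented example rules, NOT the real robots.txt of the site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_checker(robots_txt, user_agent):
    """Return a function that says whether a URL may be fetched."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network is needed.
    parser.parse(robots_txt.splitlines())

    def allowed(url):
        return parser.can_fetch(user_agent, url)

    return allowed


allowed = make_checker(SAMPLE_ROBOTS_TXT, "example-mirror-bot")
```

A well-behaved mirroring script would call allowed(url) and skip every URL for which it returns False.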


--------------------------------------------------------------------------------

I'm getting 'Access blocked because of missing referrer 
information' if I try to use 'open in new window' to get at 
a diff page. Would it perhaps be possible to allow access 
to agents which send an SLPrefs cookie, even in the absence 
of referrer information? (using oldish Mozilla (0.9.1) on 
Gnu/Linux.) 

--Matthew Woodcraft 

Added a bypass of the check for users who have the cookie 
set. I verified this with Mozilla 0.9.4; the bug is still 
there. Actually, it has been known to the Mozilla team as 
bug #48902 for over a year now. It seems that the next 
build will contain a fix. We will see.... --Arno 

Bill Spight: I am writing this using Internet Explorer. For 
some reason this morning my Netscape does not allow me to 
edit pages on SL. I do not know of any changes that might 
have caused that. Using Netscape, my user name does not 
show up, either, just a '-'. I tried resetting my User 
Preferences, but just got the Access Blocked message when I 
submitted them. I usually use Netscape, so I would 
appreciate any help in getting it to work on SL again. 
Thanks. :-) 

Arno: I could not verify this behaviour with Mozilla 0.9.9 
(Windows) nor with Netscape 4.7 (Linux). Do you still have 
this problem? The server logs don't show anything 
suspicious.... 



--------------------------------------------------------------------------------
Access Blocked last edited by ArnoHollosi (80.109.254.29) 
on April 9, 2002 - 21:10 
 