HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Cookie based copy protect measures
Author: Filer
Date: 07/12/2002 23:44
How about a quick & dirty version that simply changes the scan order to
"depth" first, as opposed to the current order, which could be termed "parallel"?
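
To make the difference concrete, here is a rough sketch of the two scan orders on a toy link graph. This is not HTTrack's code; the LINKS table and the scan_order function are names made up for illustration only.

```python
from collections import deque

# Hypothetical link graph: page -> links found on that page.
LINKS = {
    "index": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [], "a2": [], "b1": [],
}

def scan_order(start, depth_first):
    """Return the order in which pages would be fetched."""
    pending = deque([start])
    seen = set()
    order = []
    while pending:
        # "depth": take the newest link (stack); "parallel": take the oldest (queue).
        page = pending.pop() if depth_first else pending.popleft()
        if page in seen:
            continue
        seen.add(page)
        order.append(page)
        links = LINKS.get(page, [])
        # On the stack, push in reverse so links are followed in page order.
        for link in (reversed(links) if depth_first else links):
            if link not in seen:
                pending.append(link)
    return order

print(scan_order("index", depth_first=True))
print(scan_order("index", depth_first=False))
```

In "depth" order the scan finishes everything under "a" before touching "b"; in "parallel" order it fetches every page of one level before going a level deeper.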

This kind of behavior could also produce quick results when scanning web sites
that go deep and have a lot of parallel links. It is totally frustrating to
run into a site like this and only later find out that there's something
wrong with your scan rules and the end result was not downloaded right. With
a depth-first algorithm, cases like this could be caught and solved more
easily. Not to forget the rather important "human look" of the surfing
pattern. Combine this with a random delay between requests (a user
configurable range with min & max wait periods, like "wait random time
between x and y ms") and we are approaching something very hard to detect.
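
The random delay could look something like the sketch below. It is only an illustration of the "wait random time between x and y ms" idea, not an existing HTTrack option; random_wait_ms, polite_fetch_all and the fetch callback are hypothetical names.

```python
import random
import time

def random_wait_ms(min_ms, max_ms):
    """Pick a wait time in milliseconds, uniformly between min_ms and max_ms."""
    if min_ms < 0 or max_ms < min_ms:
        raise ValueError("need 0 <= min_ms <= max_ms")
    return random.uniform(min_ms, max_ms)

def polite_fetch_all(urls, fetch, min_ms, max_ms, sleep=time.sleep):
    """Fetch urls one at a time, sleeping a random time between requests.

    fetch is whatever function actually downloads a URL (an assumption here);
    sleep can be swapped out so the loop can be tried without really waiting.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # time.sleep takes seconds, so convert from milliseconds.
            sleep(random_wait_ms(min_ms, max_ms) / 1000.0)
        results.append(fetch(url))
    return results
```

For example, polite_fetch_all(urls, fetch, 800, 3000) would wait somewhere between 0.8 and 3 seconds before each request after the first.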

This would at least solve the cases where, when an old cookie is detected, the
user is sent to a "click here to continue" page, which sets a new cookie
after the click and redirects the user to continue?
This is an actual case I have encountered, and it defeated HTTrack not
totally, but reasonably well: because of the "click here to continue" page,
all the remaining parallel levels produced an error, and scanning only
continued after this "click here" page was reached. There were 100+ parallel
links, each of them deep enough to trigger a "click here" page with nice scan
settings (1 req/s, 1 process). This in turn multiplied the number of requests
100+ fold, and caused the scan to take forever, should one ride it out...

The only possible solution for this was to be a total asshole, hogging the
line completely, and scanning the 100+ parallel levels as separate scans. That
is a nasty thing to do to a web site, and if it could be avoided I'd rather do so.

It sometimes amazes me that web designers do not realize that if they prevent
people from scanning the site with "nice" settings, the only thing they will
actually get back is people scanning them in not-so-nice ways! I believe that
educated, smart use of scanners actually lessens the load on web sites:
because a scanner operates in a steady way, network loads stay steady and
well managed. Of course it would be nice if the "push" technology had actually
materialized. Maybe it still will some day, when p2p networks get more advanced.
