Some websites have noticed that website copiers tend to crawl layer by layer
instead of sequentially, the way a human would when trying to reach a file.
For example:
human:
link->setcookie->deeperlink->setcookie->stilldeeperlink->setcookie->file.zip
robot: link, link2, link3...link\deeperlink, link2\deeperlink
The cookies will not match for the robot, because its cookie is still set for
linkX when it tries to access the first "deeperlink". The whole scan is
ruined.
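To make the failure concrete, here is a minimal simulation of the scheme described above. It is only a sketch: the page names, the single overwritten cookie slot, and the `Server` class are my assumptions for illustration, not how any particular site or HTTrack itself works.

```python
from collections import deque

# Hypothetical site layout: each page maps to the pages it links to.
SITE = {
    "link": ["link/deeperlink"],
    "link2": ["link2/deeperlink"],
    "link3": [],
    "link/deeperlink": ["link/deeperlink/file.zip"],
    "link2/deeperlink": [],
    "link/deeperlink/file.zip": [],
}
TOP_LEVEL = ["link", "link2", "link3"]
PARENT = {child: parent for parent, kids in SITE.items() for child in kids}


class Server:
    """Serves a deeper page only if the client's cookie names its parent page."""

    def __init__(self):
        self.cookie = None  # one cookie slot, overwritten on every successful fetch

    def fetch(self, page):
        parent = PARENT.get(page)
        if parent is not None and self.cookie != parent:
            return None  # cookie mismatch: request rejected
        self.cookie = page
        return SITE[page]


def crawl_layered(top_level):
    """Layer-by-layer order, as described for the robot above."""
    server, queue, fetched = Server(), deque(top_level), []
    while queue:
        page = queue.popleft()
        children = server.fetch(page)
        if children is not None:
            fetched.append(page)
            queue.extend(children)
    return fetched


def walk_like_human(path):
    """Follow one chain of links in order, as a human would."""
    server, fetched = Server(), []
    for page in path:
        if server.fetch(page) is not None:
            fetched.append(page)
    return fetched


print(crawl_layered(TOP_LEVEL))
print(walk_like_human(["link", "link/deeperlink", "link/deeperlink/file.zip"]))
```

In this model the layered crawl only gets the three top-level pages, because the cookie left by link3 is rejected at every deeper page, while the human-style walk reaches file.zip.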
What I suggest is that an alternative "human lookalike" mode be programmed
into HTTrack, which would mimic a human's surfing behavior, because sites can
verify the "human" path with cookies at each deeper level. A cookie from one
branch will not work for the deeper levels of another branch, each of which
needs its own cookie from the previous level.
Because of this, it is very easy to protect a site from HTTrack simply by
setting a cookie on each layer.
If HTTrack mimicked human surfing behavior and went deeper and deeper until
the link tree was exhausted, this protection could no longer stop it. The mode
would probably have to start every new link tree from the top branch, adding
traffic, but with some sites that is the only way to go.
How serious is this? Very. It blocks HTTrack completely, because the cookies
do not match for the files/links HTTrack tries to fetch.
The same also happens when a site uses short-lived, time-based cookies. When
the site notices an out-of-date cookie, it serves a "press button to enter"
page, but HTTrack puts this page at the bottom of its search stack, so the
cookies don't get set in time and the rest of the scan is ruined.
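One possible fix for the time-based case is to move such an entry page to the front of the queue instead of the back. A rough sketch follows, where `enqueue` and the `refreshes_cookie` flag are hypothetical; how the crawler would actually recognize an entry page is left open.

```python
from collections import deque


def enqueue(queue, url, refreshes_cookie):
    """Queue a URL; pages that renew the cookie jump to the front."""
    if refreshes_cookie:
        queue.appendleft(url)  # fetch next, before the cookie problem spreads
    else:
        queue.append(url)      # normal pages wait their turn


pending = deque(["page1", "page2"])
enqueue(pending, "enter.html", refreshes_cookie=True)
enqueue(pending, "page3", refreshes_cookie=False)
print(list(pending))
```

This way the cookie is renewed before the remaining pages are requested, rather than after all of them have already failed.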