HTTrack Website Copier
Free software offline browser - FORUM
Subject: How to keep the spider from wandering off site?
Author: foo
Date: 03/02/2018 23:40
 
I want to spider this site:

Using this starting URL:

<https://foo:bar@www.instructables.com/recent>

I added a scan rule to ignore zip files:

-*.zip

But the spider keeps wandering off to other hosts all over the world. It also
tries to spider Tumblr, Twitter, Google, and everything else under the sun.

How do I keep it focused only on the host I asked for?
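
To make the behavior I'm after concrete, here is a rough Python sketch. It is
not HTTrack code, just an illustration. It assumes +/- wildcard rules are
applied in order and that the last matching rule wins, which is my
understanding of how scan rules are evaluated. The rule list and the names
normalize() and allowed() are made up for this example.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical rules in the same +/- wildcard style as scan rules:
# exclude everything, re-allow the one host, then drop zip files.
RULES = ["-*", "+www.instructables.com/*", "-*.zip"]


def normalize(url):
    """Reduce a URL to host + path, dropping scheme, credentials and query."""
    parts = urlsplit(url)
    return (parts.hostname or "") + (parts.path or "/")


def allowed(url, rules=RULES):
    """Apply the rules in order; the last rule that matches decides."""
    target = normalize(url)
    verdict = False  # nothing is followed unless some rule allows it
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(target, pattern):
            verdict = (sign == "+")
    return verdict


if __name__ == "__main__":
    for url in (
        "https://foo:bar@www.instructables.com/recent",
        "https://twitter.com/instructables",
        "https://www.instructables.com/files/archive.zip",
    ):
        print(url, "->", "follow" if allowed(url) else "skip")
```

In other words, only links on www.instructables.com should be followed, and
zip files should still be skipped even on that host.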
 