I started a new project today to copy a single website/domain, and launched the
mirror. After half an hour I looked and noticed that it was dutifully trying
to mirror/scan pages from several external sites (most notably Wikipedia).
I stopped the scan, went into OPTIONS > LIMITS, and set MAX EXTERNAL SITE DEPTH
= 0, which seems to have solved the issue. The docs clearly state that HTTrack
will "avoid crawling external sites". This is what I expected, but clearly it
was not the case.
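For reference, here is roughly what I believe the equivalent command-line invocation looks like (the URL and output directory are placeholders; `-%e0` / `--ext-depth=0` is the CLI counterpart of the MAX EXTERNAL SITE DEPTH option, as listed in the HTTrack manual):

```shell
# Mirror a single site, limiting external-site depth to 0
# (example.com and /path/to/mirror are placeholders)
httrack "https://example.com/" -O /path/to/mirror --ext-depth=0

# A stricter alternative is a scan rule that only allows the target domain:
httrack "https://example.com/" -O /path/to/mirror "+*.example.com/*" "-*"
```

Even with this set, is there anything else I should check?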
My installation is routine, and I don't believe I have changed any settings
that might cause HTTrack to overrun its normal scope and begin crawling
external sites.
Can someone point me to settings that I should look at to prevent this in the
future?