I started a new project today to copy a single website/domain, and launched the
mirror. After half an hour I looked and noticed that it was dutifully trying
to mirror/scan pages from several external sites (most notably Wikipedia).
I stopped the scan, went into OPTIONS > LIMITS, and set MAX EXTERNAL SITE DEPTH
= 0, which seems to have solved the issue. The docs clearly state that HTTrack
will "avoid crawling external sites". This is what I expected, but clearly it
was not the case.
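For reference, here is roughly what I believe the equivalent command-line invocation looks like (the URL and output directory are placeholders; `-%e0` / `--ext-depth=0` is the CLI counterpart of the MAX EXTERNAL SITE DEPTH option, as listed in the HTTrack manual):

```shell
# Mirror a single site, limiting external-site depth to 0
# (example.com and /path/to/mirror are placeholders)
httrack "https://example.com/" -O /path/to/mirror --ext-depth=0

# A stricter alternative is a scan rule that only allows the target domain:
httrack "https://example.com/" -O /path/to/mirror "+*.example.com/*" "-*"
```

Even with this set, is there anything else I should check?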
My installation is routine, and I don't believe I have changed any settings
that might cause HTTrack to overrun its normal scope and begin crawling
external sites.
Can someone point me to settings that I should look at to prevent this in the
future?