> > I'm having some difficulty with a site capture I'm running. Although
> > I've set the external mirroring depth to 0 (and left it unset, and set
> > it to 1 in some of my debugging), crawls of this specific site end up
> > trying to capture Wikipedia in addition to the target site. I believe
> > I've isolated the link that causes it, and I've posted a demo site to
> > show this problem at <http://stempac.net>. The link on "Page Two" to
> > wikicommons under the picture of the courthouse is the offending link.
>
>
> Did you leave the default Scan Rules (Options / Scan Rules), which are
> by default "+*.png +*.gif +*.jpg +*.css +*.js"?
Xavier:
Usually I check off all three of the pre-defined groups in the UI and add +*.pdf.
I've also run with the defaults and had the same result both ways.
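
For reference, a rough command-line equivalent of the settings under
discussion (a sketch only; it assumes the standard httrack CLI, where
-%e0 sets the external links depth to 0, -O sets the output directory,
and the "+pattern" arguments are scan rules; the demo URL is the one
from the original post):

    # Mirror the demo site with external depth 0 and the scan rules above
    httrack "http://stempac.net" -O ./mirror -%e0 \
        "+*.png" "+*.gif" "+*.jpg" "+*.css" "+*.js" "+*.pdf"

If the crawl still follows the wikicommons link with -%e0 in effect,
that would reproduce the behavior described above from the command line.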