> > I'm having some difficulty with a site capture I'm running. Although
> > I've set the external mirroring depth to 0 (and left it unset, and set
> > it to 1 in some of my debugging), crawls of this specific site end up
> > trying to capture Wikipedia in addition to the target site. I believe
> > I've isolated the link that causes it, and I've posted a demo site to
> > show this problem at <http://stempac.net>. The link on "Page Two" to
> > wikicommons under the picture of the courthouse is the offending link.
>
>
> Did you leave the default Scan Rules (Options / Scan Rules), which are
> by default "+*.png +*.gif +*.jpg +*.css +*.js"?
Xavier:
Usually I check off all three of the pre-defined groups in the UI and add +*.pdf.
I've also run with the defaults and had the same result both ways.
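
For reference, a rough command-line equivalent of the settings under
discussion (a sketch only; it assumes the standard httrack CLI, where
-%e0 sets the external links depth to 0, -O sets the output directory,
and the "+pattern" arguments are scan rules; the demo URL is the one
from the original post):

    # Mirror the demo site with external depth 0 and the scan rules above
    httrack "http://stempac.net" -O ./mirror -%e0 \
        "+*.png" "+*.gif" "+*.jpg" "+*.css" "+*.js" "+*.pdf"

If the crawl still follows the wikicommons link with -%e0 in effect,
that would reproduce the behavior described above from the command line.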