> Problem 1 - the program seems to be grabbing multiple
> instances of certain web pages from the above website,
> namely privacy.shtml, tou.shtml, legalfaq.shtml and some
> other web pages as well.
Different URLs? Like ..privacy.shtml?id=12
and ..privacy.shtml?id=34 ?
Look in hts-cache/new.txt for references to privacy.shtml -
the URLs should be different?
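For example, from the project folder (assuming the default
layout, with hts-cache/ next to the mirrored pages), a
quick check should show whether the entries differ only by
query string:

  grep "privacy.shtml" hts-cache/new.txt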
> I have noticed that these web pages
> are pretty much referenced from the bottom of almost every
> page in the website. This is causing extreme delays in the
> full download of the website
Darn.. try avoiding these files (Options / Scan Rules:
-*privacy.shtml* -*tou.shtml* -*legalfaq.shtml*), but I'm
still surprised that this problem can occur.
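If you use the command line rather than the GUI, the same
exclusions can be passed as extra arguments; the starting
URL and output folder below are only placeholders:

  httrack "http://www.battle.net/" -O "/tmp/battle" \
    "-*privacy.shtml*" "-*tou.shtml*" "-*legalfaq.shtml*"

(In WinHTTrack the same filters go in the Scan Rules box,
one per line.)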
> Problem 2 - Under the 'Scan Rules' of the program, I have
> specified for the program to avoid downloading anything in
> the 'ladder' folder by using the following scan rule:
> -www.battle.net/war3/ladder/*
> However, the program is still downloading the contents of
> the website.
Are you still getting links from
www.battle.net/war3/ladder/ ??
- Ensure that -www.battle.net/war3/ladder/* is the last
filter, and that no other filters (like +*.gif) are placed
after it (see the example after this list)
- Ensure that another domain is not used, like
ww2.battle.net/war3/ladder/..
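For instance, a Scan Rules list in this order should behave
as expected (the +*.gif / +*.jpg lines are only examples of
inclusion filters you might already have):

  +*.gif
  +*.jpg
  -www.battle.net/war3/ladder/*

If +*.gif were placed after the exclusion, gif files from
the ladder folder could be re-included, since later rules
take precedence over earlier ones.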
> Under the 'Links' tab of the options, I have
> made sure the 'Get non-HTML files related to a link... but
> that doesn't seem to help either. Am I specifying the
> filter rule incorrectly or something?
Doesn't seem so. Note that filters always take priority,
except for the "external depth" option (which should never
be used, anyway..)