> Problem 1 - the program seems to be grabbing multiple
> instances of certain web pages from the above website,
> namely privacy.shtml, tou.shtml, legalfaq.shtml and some
> other web pages as well.
Different URLs? Like ..privacy.shtml?id=12
and ..privacy.shtml?id=34 ?
Look in hts-cache/new.txt for references to privacy.shtml -
the URLs should be different?
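For example, from the project folder (assuming the default
layout, with hts-cache/ next to the mirrored pages), a
quick check should show whether the entries differ only by
query string:

  grep "privacy.shtml" hts-cache/new.txt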
> I have noticed that these web pages
> are pretty much referenced from the bottom of almost every
> page in the website. This is causing extreme delays in the
> full download of the website
Darn.. try avoiding these files (Options / Scan Rules:
-*privacy.shtml* -*tou.shtml* -*legalfaq.shtml*), but I'm
still surprised that this problem can occur.
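If you use the command line rather than the GUI, the same
exclusions can be passed as extra arguments; the starting
URL and output folder below are only placeholders:

  httrack "http://www.battle.net/" -O "/tmp/battle" \
    "-*privacy.shtml*" "-*tou.shtml*" "-*legalfaq.shtml*"

(In WinHTTrack the same filters go in the Scan Rules box,
one per line.)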
> Problem 2 - Under the 'Scan Rules' of the program, I have
> specified for the program to avoid downloading anything in
> the 'ladder' folder by using the following scan rule:
> -www.battle.net/war3/ladder/*
> However, the program is still downloading the contents of
> the website.
Are you still getting links from
www.battle.net/war3/ladder/ ??
- Ensure that -www.battle.net/war3/ladder/* is the last
filter, and that no other filters (like +*.gif) are placed
after it (see the example after this list)
- Ensure that another domain is not used, like
ww2.battle.net/war3/ladder/..
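For instance, a Scan Rules list in this order should behave
as expected (the +*.gif / +*.jpg lines are only examples of
inclusion filters you might already have):

  +*.gif
  +*.jpg
  -www.battle.net/war3/ladder/*

If +*.gif were placed after the exclusion, gif files from
the ladder folder could be re-included, since later rules
take precedence over earlier ones.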
> Under the 'Links' tab of the options, I have
> made sure the 'Get non-HTML files related to a link... but
> that doesn't seem to help either. Am I specifying the
> filter rule incorrectly or something?
Doesn't seem so. Note that filters always take priority,
except for the "external depth" option (which should never
be used, anyway..)