HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Repeated scanning same page.
Author: Leto
Date: 04/12/2002 01:29
 
> How about the trick filter 
> -www.foo.com/bar/homepage.htm*
> I found that / and \ make a big difference. On a PC
> they are the same, but HTTrack only accepts /.
> If I include '*' at the end, only one page is downloaded.
> If I don't include '*', it still goes in loops.
> 
> Is this expected behavior?
"/" is the correct separator for URLs.  Windows/IE tries to be smart: when
you type "\", it automatically converts it to "/".

The "*" is vital to making the filter work.  The above filter says, exclude
the page 'homepage.htm' and ANYTHING which comes after it in the URL -- the
querystring.

If the filter were "-www.server.com/homepage.htm" without the "*", it would
match only that exact URL, so any link to the page with a querystring would
still be downloaded.
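
To make the difference concrete, here is a rough sketch in Python. It uses
fnmatch-style matching as a stand-in for HTTrack's filter wildcard, which is
an assumption: HTTrack has its own matcher, and this only imitates how a
trailing "*" behaves. The leading "-" (the exclude marker) is left out, and
the URL with "?view=2" is a made-up example.

```python
from fnmatch import fnmatchcase


def is_excluded(url: str, pattern: str) -> bool:
    # Approximation only: fnmatch's "*" stands in for HTTrack's wildcard.
    return fnmatchcase(url, pattern)


with_star = "www.foo.com/bar/homepage.htm*"
without_star = "www.foo.com/bar/homepage.htm"
looping_url = "www.foo.com/bar/homepage.htm?view=2"  # hypothetical URL

print(is_excluded(looping_url, with_star))     # True: querystring is covered by "*"
print(is_excluded(looping_url, without_star))  # False: only the exact URL matches
```

Without the "*", only the bare page matches, so every querystring variant
slips past the filter and the loop continues.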

Of course, this may be a problem if every page on the site is served through
the same page and differs only in the querystring, e.g.
www.server.com/page.nsf?page=home
www.server.com/page.nsf?page=page&id=12345

In this case the filter excludes all linked pages :(
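
A quick way to see the problem, again using Python's fnmatch as a rough
stand-in for HTTrack's matcher (an assumption, not HTTrack's actual code).
The pattern "www.server.com/page.nsf*" is a hypothetical filter built by
analogy with the one above, with the leading "-" left out.

```python
from fnmatch import fnmatchcase

# Hypothetical filter pattern: the shared page plus a trailing wildcard.
pattern = "www.server.com/page.nsf*"

urls = [
    "www.server.com/page.nsf?page=home",
    "www.server.com/page.nsf?page=page&id=12345",
]

# Collect every URL the wildcard pattern would exclude.
excluded = [u for u in urls if fnmatchcase(u, pattern)]
print(len(excluded), "of", len(urls), "excluded")  # 2 of 2 excluded
```

Both URLs match, so a filter like this would stop the loop but also drop
the rest of the site.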
 