HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: "Travel Mode" options not behaving as expected
Author: omega
Date: 07/27/2005 15:29
 
> Platform: version 3.33 on W2000SP4.

I'm using 3.33 as well. (For the scenario I'm about to describe, I also tested
it with 3.32 and 3.34alpha.)

> Problem: 
> Let's say I have a directory in the format
> <http://www.site.com/somestuff/coolstuff/>. I give this address to HTTrack in
> this exact format with the default "Can go down" option (& stay on the same
> site) selected, expecting to find <http://www.site.com/somestuff/coolstuff/> and
> its sub-directories (such as <http://www.site.com/somestuff/coolstuff/a/>,
> <http://www.site.com/somestuff/coolstuff/b/>, etc.) mirrored. They are.
> 
> However... I'll also find that
> <http://www.site.com/somestuff/boringstuff/>
> <http://www.site.com/somestuff/uttercrap/>
> etc. mirrored to some extent.

Could it be that you have some + filters set?

Here's what I observed:

This will retrieve only the given subdir and below:

   <http://www.wdvl.com/Authoring/Graphics/Colour/>

This will retrieve only the given subdir and the particular (linked) file
specified in the filter settings:

   <http://www.wdvl.com/Authoring/Graphics/Colour/>
   filter:     +www.wdvl.com/Authoring/Graphics/Techniques/Backgrounds.html

This next combination is the one that produces the problem:

   <http://www.wdvl.com/Authoring/Graphics/Colour/>
   filter:     +www.wdvl.com/Authoring/Graphics/Backgrounds.html

What I get from that is HTTrack retrieving all the links it can find in the
parent Graphics/ directory.

Apparently this is by design? But it's not what I'd have expected, and it's not
easy for me to understand.

For this situation, I -can- get the behavior I want by declaring the filters
more explicitly:

     <http://www.wdvl.com/Authoring/Graphics/Colour/>
     filter:
     -www.wdvl.com/Authoring/Graphics/*
     +www.wdvl.com/Authoring/Graphics/Colour/*
     +www.wdvl.com/Authoring/Graphics/Backgrounds.html
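To reason about why the explicit set works, here is a simplified Python model of how such rules might accept or reject a URL. It assumes that later rules take priority over earlier ones and that, when no rule matches at all, the "Can go down" setting decides. This is not HTTrack's actual code: the `allowed` function and the Graphics/Shapes/ URL are hypothetical, and `fnmatch` only approximates HTTrack's wildcard syntax.

```python
from fnmatch import fnmatchcase


def strip_scheme(url: str) -> str:
    # The filters in this thread are written without "http://",
    # so compare URLs in that same form.
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            return url[len(prefix):]
    return url


def allowed(url: str, rules: list[str], start: str) -> bool:
    """Simplified model (assumption): the last matching +/- rule wins;
    if nothing matches, fall back to a 'can go down' check."""
    target = strip_scheme(url)
    verdict = None
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(target, pattern):
            verdict = sign == "+"  # a later match overwrites an earlier one
    if verdict is None:
        # No rule matched: only URLs at or below the start directory pass.
        verdict = target.startswith(strip_scheme(start))
    return verdict


START = "http://www.wdvl.com/Authoring/Graphics/Colour/"
RULES = [
    "-www.wdvl.com/Authoring/Graphics/*",
    "+www.wdvl.com/Authoring/Graphics/Colour/*",
    "+www.wdvl.com/Authoring/Graphics/Backgrounds.html",
]

for url in [
    "http://www.wdvl.com/Authoring/Graphics/Colour/index.html",
    "http://www.wdvl.com/Authoring/Graphics/Backgrounds.html",
    "http://www.wdvl.com/Authoring/Graphics/Shapes/",  # hypothetical sibling dir
]:
    print(url, allowed(url, RULES, START))
```

Under this model the order of the lines matters: moving the -www.wdvl.com/Authoring/Graphics/* line to the end would reject everything under Graphics/, Colour/ included.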

What I'm having trouble with, though, is grasping why this sequence behaves
differently from the shorter one: why adding a + filter for an HTML file one
level higher would change how HTTrack delineates what counts as foreign or
higher-level.

> Any way I can prevent that? Or is that a feature?

Well, I'm unclear myself on the particular situation I've brought up...

> This is really getting on my nerves when "coolstuff" & its subdirs contain
> some 10MB of data and "boringstuff" contains hundreds of megs... :(
> 

What I can offer is this: to me, HTTrack's kill filters (the - rules) are its
greatest strength. Most often the best approach is to start with -* as your
first line and then selectively add what you want.

> FYI, I have already used search and there was a previous thread where they
> suggested disabling the default filters. I had already done that.
> 
If you don't have any + filters in your projects, then sorry for bringing in
an unrelated tangent.

As to the default filters HTTrack has in there, I agree; in particular, +*.gif
is a bit of a disaster, as it effectively tells HTTrack to fish up
advertisements from everywhere they can be found. :)
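Both points (starting from -* and the trouble with +*.gif) can be illustrated with a small Python model. It assumes the last matching rule decides and uses `fnmatch` as a stand-in for HTTrack's wildcards; `last_match` and the ads.example.com URL are hypothetical, not part of HTTrack.

```python
from fnmatch import fnmatchcase


def last_match(url: str, rules: list[str]) -> bool:
    """Simplified model (assumption): return True if the last rule that
    matches is a '+' rule. With '-*' first, anything not added is rejected."""
    target = url.split("://", 1)[-1]  # drop the scheme, as the filters do
    verdict = False
    for rule in rules:
        if fnmatchcase(target, rule[1:]):
            verdict = rule[0] == "+"  # later matches override earlier ones
    return verdict


BASE = ["-*", "+www.wdvl.com/Authoring/Graphics/Colour/*"]
WITH_GIF = BASE + ["+*.gif"]  # a catch-all like the default +*.gif

page = "http://www.wdvl.com/Authoring/Graphics/Colour/index.html"
ad = "http://ads.example.com/banner.gif"  # hypothetical ad server

print(last_match(page, BASE), last_match(ad, BASE))          # True False
print(last_match(page, WITH_GIF), last_match(ad, WITH_GIF))  # True True
```

Because *.gif carries no host, it matches images on any server, which is why a rule like that pulls in banners from everywhere.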



 