HTTrack Website Copier
Free software offline browser - FORUM
Subject: Only grab files from the servers given in URL-List
Author: qwertz
Date: 12/12/2004 18:41
 
Hi!

Sorry, I asked this some weeks ago here but did not get a
single answer... maybe the posting was overlooked :-)

And I think the suggestion given at the end of this post is
not such a bad idea, as it would solve all the cases where
people currently have to exclude other servers with
filters... :-)


On a regular basis I have to download several thousand files
of a few specific filetypes (five different extensions),
linked from unknown HTML files on about 100 homepages (a
list that changes often), each with many URLs in
subdirectories.

These URLs are in a file url_list.txt, which is generated
automatically and which I give to WinHTTrack under "URL
list (.txt):".

Now suppose one of these 100 URLs is, for example,
www.geocities.com/blabla/index.html.

I want to download ONLY www.geocities.com/blabla/* and all
files with the given extensions in its subdirectories, but
nothing from any other site or directory. So, for example,
nothing from www.geocities.com/dontdownload/*, even if a
link in www.geocities.com/blabla/index.html points to it.

In the FAQ and in the forum you tell people to use filters
for such cases. With only a few URLs that would be no
problem, but with this many URLs it is almost impossible.

At the very least I would have to put all 100 URLs (which
change often) into the filters as well. And what about the
different extensions?
So I come to the conclusion that I would have to add the
following filter lines for each of the 100 URLs:

-*
...
+*www.geocities.com/blabla/*.extension1
+*www.geocities.com/blabla/*.extension2
+*www.geocities.com/blabla/*.extension3
+*www.geocities.com/blabla/*.extension4
+*www.geocities.com/blabla/*.extension5
+*www.geocities.com/blabla/*.htm
+*www.geocities.com/blabla/*.html
+*www.geocities.com/blabla/*.shtml
etc....

Is there any way to do this without building a monster
filter with 800+ entries (one per URL, multiplied by the
number of different filetypes)? And I have found no good way
to generate such filters automatically...

(Or maybe only 100 entries, if there were a way to "or"
several extensions, for example:
+*www.geocities.com/blabla/*.[extension1,extension2,
extension3,...] , but that did not work for me because I
also have to specify file-size limits per extension, e.g.
extension2[>10].)

Do you know Offline Explorer from <http://www.metaproducts.com/>?
That program has very simple options for this ("URL
filters"):
<http://www.metaproducts.com/mp/images/screen/mpOEV-13.gif> ,
<http://www.metaproducts.com/mp/images/screen/mpOEV-15.gif> ,
and the same for directories.

What about allowing per-line options on the URLs in
url_list.txt (or in the "Web Addresses: (URL)" field in the
GUI) to keep the crawler from leaving a directory?
For example, a URL list like:

<http://www.geocities.com/blablubb/>
<http://www.geocities.com/blabla/> [onlysubdirs]
<http://www.geocities.com/blablemm/> [onlysameserver]
etc...

In the filters I could then set simple static rules that
cover all cases:

-*
+*/*.extension1
+*/*.extension2[>10]
+*/*.extension3[<20]
+*/*.extension4
+*/*.extension5
+*/*.html
+*/*.shtml
+*/*.htm


(Or am I missing something, and this is already possible in
an easy way? :-) )


Thanks in advance, bye :-)
 