Hi!
On a regular basis I have to download several thousand files of
specific filetypes (five different extensions) that are linked
from unknown HTML files on about 100 homepages (a list which I
have to change often), each with lots of URLs in subdirectories.
These URLs I keep in a file url_list.txt, which is generated
automatically and which I give to WinHTTrack at "URL list (.txt):".
Now say, for example, that www.geocities.com/blabla/index.html is
one of these 100 URLs.
I want to download ONLY www.geocities.com/blabla/* and all files
with the given extensions in its subdirectories, but nothing from
any other location. So, for example, nothing from
www.geocities.com/dontdownload/*, even if a link in
www.geocities.com/blabla/index.html points to it.
In the FAQ and in the forum you tell people to use filters for
cases like this. Well, with only a few URLs that would be no
problem, but with this many it is almost impossible: you would
have to put all 100 URLs (which change often) into the filters as
well. And what about the different extensions?
So I come to the realization that I would have to put the
following into the filter for each of the 100 URLs:
-*
...
+*www.geocities.com/blabla/*.extension1
+*www.geocities.com/blabla/*.extension2
+*www.geocities.com/blabla/*.extension3
+*www.geocities.com/blabla/*.extension4
+*www.geocities.com/blabla/*.extension5
+*www.geocities.com/blabla/*.htm
+*www.geocities.com/blabla/*.html
+*www.geocities.com/blabla/*.shtml
etc....
Is there any way to do this without building a monster filter
with 800+ entries (one for each URL, multiplied by the number of
different filetypes)? And I can't generate such filters
automatically.
(Or maybe only 100 entries, if there is a way to "or" several
extensions, for example
+*www.geocities.com/blabla/*.[extension1,extension2,extension3,...]
but that didn't work for me, because I also have to specify
filesize limits per extension, e.g. extension2[>10].)
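(Just to illustrate what "generating such filters" would amount
to: a small helper script along the lines of the sketch below,
purely hypothetical, assuming url_list.txt has one URL per line
and with the extension and size-limit list as hard-coded
placeholders, could in principle spit out such a monster filter
from url_list.txt. But since url_list.txt is regenerated
automatically and changes all the time, maintaining an extra step
like that is exactly what I'd like to avoid.)

# Hypothetical sketch only: build an HTTrack-style filter list
# from url_list.txt. Assumes one URL per line; the extensions and
# size limits below are made-up placeholders, not my real ones.
from urllib.parse import urlparse

PATTERNS = ["extension1", "extension2[>10]", "extension3[<20]",
            "extension4", "extension5", "htm", "html", "shtml"]

def build_filters(url_file="url_list.txt"):
    lines = ["-*"]  # exclude everything by default
    for raw in open(url_file):
        url = raw.strip()
        if not url:
            continue
        parsed = urlparse(url if "://" in url else "http://" + url)
        # directory of the start page, e.g. www.geocities.com/blabla/
        base = parsed.netloc + parsed.path.rsplit("/", 1)[0] + "/"
        for pattern in PATTERNS:
            lines.append("+*%s*.%s" % (base, pattern))
    return "\n".join(lines)

if __name__ == "__main__":
    print(build_filters())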
Do you know Offline Explorer from <http://www.metaproducts.com/>?
That program has very simple options for this ("URL filters"):
<http://www.metaproducts.com/mp/images/screen/mpOEV-13.gif>,
<http://www.metaproducts.com/mp/images/screen/mpOEV-15.gif>,
and the same for directories.
How about some options that could be attached to each URL in
url_list.txt (or in the "Web Addresses: (URL)" field of the GUI)
to keep HTTrack from leaving that URL's directory?
For example, a URL list like:
<http://www.geocities.com/blablubb/>
<http://www.geocities.com/blabla/> [onlysubdirs]
<http://www.geocities.com/blablemm/> [onlysameserver]
etc...
In the filters, I could then set simple static values that cover
all cases:
-*
+*/*.extension1
+*/*.extension2[>10]
+*/*.extension3[<20]
+*/*.extension4
+*/*.extension5
+*/*.html
+*/*.shtml
+*/*.htm
(Or am I missing something, and this is already possible in an
easy way? :-) )
Thanks in advance, bye :-)