Thank you Xavier,
this is a great program and it's only getting better as I'm
learning to use it. However, when I try to download the page
(www.someweb.com/folder/file.html) I still can't get it to
work with the scan rules filter
+geocities.com/*
and it doesn't work with
+*geocities.com*
either. However, it works if I filter with the option
'Include link(s) -> ALL LINKS', or in other words with
+*
but then I obviously get too many files and really have no
filter at all, so I can't do that. Thus I figure it's not a
robots.txt problem either, because it picks up the
geocities.com links fine with +*, which shouldn't happen if
it were a robots.txt issue (and switching the robots.txt
rules off doesn't help either).
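Just to be concrete, the command-line equivalent of what I'm
doing (if I understand the options right; the output folder
here is just a placeholder) would be something like
httrack "http://www.someweb.com/folder/file.html" -O ./mirror -s0 "+geocities.com/*"
where -O is the output folder, -s0 is supposed to switch the
robots.txt rules off, and the last quoted argument is the
scan rule in question.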
I have no filters other than the default filters that come
with HTTrack 3.30, so I don't think that could be it...?? It
just seems that the filtering doesn't understand a link that
doesn't have a leading 'www' part (or similar), but could
that really be the case?
Also, to my understanding I don't need a stricter filter
than +geocities.com/*, because HTTrack will download only
those links targeting pages at geocities.com/. So if there
are a certain number of geocities.com links on
www.someweb.com/folder/file.html, it will download only those
links (plus, of course, their sublinks, since I allow it to
go down). The sites that I want to download have a lot of
links to geocities.com, and I can't, or don't want to, type
out the subfolders separately.
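In other words, what I'd like to avoid is having to list
every subfolder by hand, like
+geocities.com/somefolder/*
+geocities.com/anotherfolder/*
(those folder names are just made up for the example), when
as far as I understand the wildcard a single
+geocities.com/*
should already cover all of them.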
Finally, what do you mean by 'yourhomestead' in
+geocities.com/yourhomestead/* ?
I'll be forever grateful if you can solve this one for me...
Best regards,
Tapio