| Very often I'm unable to download pages pointed to by CGI
links because they are simply "ignored" by the HTTrack
parser. For example:
<http://www.webshots.com/photos/thegreatoutdoors.html>
On this page there is a CGI link with the text "Download
entire photo collection" which is completely ignored by
HTTrack, even with a +* rule. Why?
(Note that you have to be registered to actually
access the CGI link. This doesn't seem to be the problem: if I
specify the CGI link directly as the starting point of
the project, HTTrack downloads it without problems by
passing the appropriate cookie.)
P.S. My goal is to download every .exe file
associated with a Webshots gallery. So I set the starting
point to:
<http://www.webshots.com/explore/gallery.html>
And the filtering rules to:
-* +*.html +*.exe* +*/scripts/*
If those CGI links were parsed correctly, I think these
rules would work. Unfortunately, they weren't. | |
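To sanity-check the filtering rules on their own, here is a rough Python sketch of how they might be evaluated. It assumes that rules are applied in order with the last matching rule winning, and that `*` matches any run of characters. It is only an approximation, not HTTrack's actual matcher, and every example URL except the `thegreatoutdoors.html` page is hypothetical.

```python
from fnmatch import fnmatchcase

# The scan rules from the post, in the order they were given.
RULES = ["-*", "+*.html", "+*.exe*", "+*/scripts/*"]


def is_allowed(url, rules=RULES):
    """Return True if the last rule matching `url` is an include (+) rule.

    Assumptions for this sketch: later rules override earlier ones, and
    '*' matches any run of characters (including '/').
    """
    allowed = False  # sketch default when no rule matches at all
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            allowed = (sign == "+")
    return allowed


# Real page from the post, plus hypothetical URLs for illustration only.
for url in [
    "http://www.webshots.com/photos/thegreatoutdoors.html",
    "http://www.webshots.com/scripts/download.fcgi?id=1",  # hypothetical CGI link
    "http://www.webshots.com/files/gallery.exe",           # hypothetical
    "http://www.webshots.com/images/logo.gif",             # hypothetical
]:
    print(url, "->", "accepted" if is_allowed(url) else "rejected")
```

Under these assumptions a link under `/scripts/` would be accepted by the rules, which is consistent with the suspicion that the link is never extracted from the page in the first place rather than being filtered out.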