Xavier - Thanks for your reply.
You said: "you should also allow html pages where the files are found"
How do I do this? I tried the following but got the same results as in my
first post below:
-* +www.example.net/*.html +*.html +*.pdf
I'm sure HTTrack will do exactly what I want (and save me a great deal of
time) if I can get this right.
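In case it helps, here is roughly the full command line I am using (the output folder name is just a placeholder, and -s0 is my understanding of the "do not follow robots.txt rules" option):

httrack "http://www.example.net/" -O ./example-pdfs -s0 "-*" "+www.example.net/*.html" "+*.pdf"

My understanding is that the filters are applied in order: -* excludes everything first, then the + rules add back the site's html pages (so their links get scanned) and the .pdf files themselves. Please correct me if I have that wrong.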
Thanks,
Papoulka
============================
If you do not download html files where links are referenced, you won't catch
anything; i.e. you should also allow html pages where the files are found.
============================
I want to download all the .PDF files offered (for free) by a certain website.
I don't need anything else, so I have used the following scan rule:
-* +www.example.net/*.html +*.pdf
I have also set HTTrack not to follow robots.txt rules, because I think that
was part of the initial problem.
However, I still get only the site's home page and the single PDF offered on
it, with no other content.