I have been trying for hours to download a certain egroup; let us call it mygroup (that's not its name).
My problem is that if I simply tell it to download
<http://groups.yahoo.com/group/mygroup>
it runs fine and dandy, but it spends all of its time downloading everything in
<http://groups.yahoo.com/group/mygroup/messages>
which is simply the index pages for everything in
<http://groups.yahoo.com/group/mygroup/message>
What I REALLY want is to ignore everything in
<http://groups.yahoo.com/group/mygroup/messages>
and simply download all 1100 posts in
<http://groups.yahoo.com/group/mygroup/message>
I realize that all the HTML pages I want in /message are named
<http://groups.yahoo.com/group/mygroup/message/1>
<http://groups.yahoo.com/group/mygroup/message/2>
etc., so one of my experiments was to create a text file containing 100 URLs:
<http://groups.yahoo.com/group/mygroup/message/101>
<http://groups.yahoo.com/group/mygroup/message/102>
<http://groups.yahoo.com/group/mygroup/message/103>
<http://groups.yahoo.com/group/mygroup/message/104>
etc.
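If the posts really are numbered 1 through 1100 with no gaps (an assumption; deleted posts may leave holes), a URL list like the one above does not have to be typed by hand. A minimal Python sketch that builds it; the function names and the output file name are hypothetical, and "mygroup" is the placeholder group name from this post:

```python
# Hypothetical helper: builds one message URL per line, the same shape as the
# hand-made list above. "mygroup" is the placeholder group name.
BASE = "http://groups.yahoo.com/group/mygroup/message/"


def message_urls(first: int, last: int) -> list[str]:
    """Return the URLs for messages first..last, inclusive."""
    return [f"{BASE}{n}" for n in range(first, last + 1)]


def write_url_list(path: str, first: int, last: int) -> None:
    """Write the URLs to a plain text file, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(message_urls(first, last)) + "\n")
```

Calling `write_url_list("mygroup_urls.txt", 1, 1100)` would produce a 1100-line file that can be handed to the downloader in place of the hand-written one.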
That experiment seemed to be the most successful, but it still grabbed all the damned index pages in /messages.
I am currently experimenting with a download that does not use a text file, but instead uses INCLUDE everything in folder
groups.yahoo.com/group/mygroup/message
and EXCLUDE everything in folder
groups.yahoo.com/group/mygroup/messages
I am also specifying GET HTML FIRST, and I have disabled the option to explore all links, even unknown ones.
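One detail worth checking: if "everything in folder X" is matched as a plain text prefix (an assumption; the real tool's matching rules may differ), then .../message is also a prefix of .../messages, so the INCLUDE rule by itself lets the index pages through and only the EXCLUDE rule keeps them out. A hypothetical Python model of that prefix logic, showing that a trailing slash (.../message/) is what separates the two folders:

```python
# Hypothetical model of folder rules: "everything in folder X" is treated as a
# plain string-prefix test. This is an assumption, not the tool's real engine.
INCLUDE = "groups.yahoo.com/group/mygroup/message/"  # note the trailing slash
EXCLUDE = "groups.yahoo.com/group/mygroup/messages"


def strip_scheme(url: str) -> str:
    """Drop a leading http:// or https:// so rules can use bare host/path."""
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            return url[len(scheme):]
    return url


def in_folder(url: str, folder: str) -> bool:
    """True if the URL's host/path begins with the folder prefix."""
    return strip_scheme(url).startswith(folder)


def wanted(url: str) -> bool:
    """Keep individual posts, drop the /messages index pages."""
    return in_folder(url, INCLUDE) and not in_folder(url, EXCLUDE)


if __name__ == "__main__":
    index = "http://groups.yahoo.com/group/mygroup/messages"
    post = "http://groups.yahoo.com/group/mygroup/message/101"
    # Without the trailing slash, the /message rule also matches /messages.
    print(in_folder(index, "groups.yahoo.com/group/mygroup/message"))  # True
    print(in_folder(index, INCLUDE))  # False
    print(wanted(post), wanted(index))  # True False
```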
THIS experiment has been running for 30 minutes without writing a single file, which does not look good, even though the log claims that it is writing files.
Bottom line: I would love to grab all 1100 HTML pages in /message in under 6 hours, skip the useless index pages, and, if possible, also skip photos, JPGs, and ads, since all I care about is having readable text from each HTML post.
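Leaving out photos and JPGs usually comes down to filtering by file extension. A minimal Python sketch of that idea, assuming images can be recognized by extension alone (ads served from ordinary-looking URLs would slip through); the extension list and function name are illustrative, not settings of the download tool:

```python
# Hypothetical helper, not an option of the download tool: flags URLs whose
# file extension marks them as images, so they could be left out.
import posixpath
from urllib.parse import urlparse

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png", ".bmp"}


def looks_like_image(url: str) -> bool:
    """True if the URL path ends in a common image extension (case-insensitive)."""
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in IMAGE_EXTENSIONS


if __name__ == "__main__":
    for url in (
        "http://groups.yahoo.com/group/mygroup/message/101",
        "http://example.com/photos/picture.JPG",
    ):
        print(url, "skip" if looks_like_image(url) else "keep")
```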
If you CAN help me, please be explicit, step by step, about exactly which parameters I need and their exact spelling, since I am not a programmer.
Thanks a million if you can help!
Also, I don't understand the options about external levels and internal levels, but I suspect that