Hi, I've spent some hours now trying to copy the following thread:
<http://siliconinvestor.advfn.com/readmsg.aspx?msgid=2862899>
but it doesn't work out.
I want to copy the message above together with all the following messages in
that thread that can be reached by pressing "next" on the page.
Each "next" link in the thread has an address in the format:
<http://siliconinvestor.advfn.com/go.aspx?subjectid=18363&msgnum=1>
The next message in the thread has an identical "next" link except that
msgnum becomes 2, and so forth up to 4096, which is the last message.
Pressing the "next" link returns a new HTML page in the first format
above: ...aspx?msgid=xxxxx
So, basically, the "next" link provides a mapping from some
...?subjectid=18363&msgnum=yyyy to some msgid=xxxx.
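
To make the pattern concrete, here is a minimal Python sketch that just writes out every "next" address for the thread, one per line. It assumes msgnum really runs from 1 to 4096 without gaps, as described above; the function name and the idea of feeding such a list to a copier as explicit start URLs are only my illustration, not something I know to work on this site.

```python
from urllib.parse import urlencode

# Values taken from the links described above.
BASE = "http://siliconinvestor.advfn.com/go.aspx"
SUBJECT_ID = 18363
LAST_MSGNUM = 4096  # assumed to be the last message in the thread


def next_link_urls(subject_id=SUBJECT_ID, first=1, last=LAST_MSGNUM):
    """Return the "next" link address for every msgnum in [first, last]."""
    return [
        f"{BASE}?{urlencode({'subjectid': subject_id, 'msgnum': n})}"
        for n in range(first, last + 1)
    ]


if __name__ == "__main__":
    # Print one URL per line so the output can be redirected to a text file.
    for url in next_link_urls():
        print(url)
```

Each of these addresses should then resolve to one ...aspx?msgid=xxxxx page, if the mapping works the way I think it does.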
Now, I've tried setting the start URL to the first message
(http://siliconinvestor.advfn.com/readmsg.aspx?msgid=2862899) with the filters
-* +*subjectid=18363&msgnum=*, but HTTrack doesn't follow the links down the
message hierarchy.
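
One guess at what goes wrong, sketched in Python. This is only a toy model: it assumes the filters behave like plain shell-style wildcards and that the last matching rule wins, which may not be how HTTrack actually evaluates them. The `allowed` helper is hypothetical and is not part of HTTrack.

```python
from fnmatch import fnmatchcase

# The two filters I used, as (sign, pattern) pairs in the order given.
RULES = [("-", "*"), ("+", "*subjectid=18363&msgnum=*")]


def allowed(url, rules=RULES):
    """Toy model: the last rule whose pattern matches decides (assumption)."""
    verdict = True  # assumed default when no rule matches
    for sign, pattern in rules:
        if fnmatchcase(url, pattern):
            verdict = sign == "+"
    return verdict


next_link = "http://siliconinvestor.advfn.com/go.aspx?subjectid=18363&msgnum=1"
message_page = "http://siliconinvestor.advfn.com/readmsg.aspx?msgid=2862899"

print(allowed(next_link))     # the "next" address matches the + rule
print(allowed(message_page))  # the page it leads to only matches -*
```

Under this model the "next" address is accepted, but the msgid page it leads to is rejected by -*, so the crawl would stop after one hop. If that is what happens, an extra filter such as +*msgid=* might be needed, but I have not confirmed this.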
I also played around with other settings, such as ignoring robots.txt and
changing the browser identity to remove the default "httrack"
identification, plus some other filters meant to make HTTrack follow only
the "next" link and ignore all other links (I just want the text of the
messages).
Can anybody help out? How should it be done?
Grateful for any help!
Thanks/
Sven