HTTrack Website Copier
Free software offline browser - FORUM
Subject: Is it possible to spider the article content only?
Author: Andrew Seabrook
Date: 07/24/2013 20:08
 
I ask because I really don't want to download thousands and thousands of
files from Microsoft just to get some library pages.
This is what I mean:
<http://msdn.microsoft.com/en-us/library/aa752038(v=vs.85).aspx>
The download has been running for about two hours so far. I am up to 7K-plus
files and it doesn't even appear to have started on the content I want.
I want to retrieve the Topics pages that are linked in the body of the
article, then at the next level the pages linked in the body of each of those
articles, and so forth. In other words, I want to retrieve only the pages that
are a logical subset of the "Hosting and Reuse" subject matter.

I don't want to retrieve pages from every link on the whole webpage. For
instance, I don't want to retrieve all the pages in the MSDN Library menu at
the top left!
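
To make the request concrete, here is a rough Python sketch of the behaviour
described above. It is not an HTTrack feature, only an illustration: it
collects links found inside one container element and ignores everything else
on the page. The container id "mainBody" and the example.com URLs are
assumptions made for the example; the real MSDN markup may use something
different.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BodyLinkParser(HTMLParser):
    """Collect href targets that appear inside one container element."""

    def __init__(self, base_url, container_id):
        super().__init__()
        self.base_url = base_url
        self.container_id = container_id  # hypothetical id of the article body
        self.depth = 0                    # 0 means "outside the container"
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            # Start tracking only when the container element opens.
            if attrs.get("id") == self.container_id and tag not in VOID_TAGS:
                self.depth = 1
            return
        if tag not in VOID_TAGS:
            self.depth += 1
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


def body_links(html, base_url, container_id="mainBody"):
    """Return absolute URLs of links inside the container, in page order."""
    parser = BodyLinkParser(base_url, container_id)
    parser.feed(html)
    return parser.links
```

A crawler built on this would call body_links() on each fetched page and queue
only the returned URLs, so menu, header and footer links are never followed.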

If the solution is that I have to specify each link to follow or each link to
block, it is no solution at all: it would be quicker for me to follow every
link by hand and copy and paste the content. I realise that would not localise
the links, but otherwise I fail to see how this product could cut it in
real-world use.

Any help on this or recommendations if you know of a tool more suited to my
purpose would be greatly appreciated.

Thanks
 