how to capture only html hrefs - HTTrack Website Copier Forum

Subject: how to capture only html hrefs

Author: Pollard

Date: 02/04/2017 20:29

The site I'm working with has thousands of links, and they're categorized, with
most links fitting into more than one category. Because of the categories, the
links sit at several different directory levels. It would take a week to
go through them all by clicking around the site. Half of the links are
dead or broken, hence my wanting to use HTTrack.
 
I'm not sure it's possible, but what I want to do is download only the HTML
links (the hrefs) from a site's HTML pages. Is that possible? I'm on Ubuntu, so
using the terminal is an option, I guess.

If not, I've got it set to retrieve only HTML, and I can sift through it
later.
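
For the "sift through it later" route, here is a minimal sketch in Python, assuming the pages have already been saved as HTML text. It uses only the standard library; the names HrefCollector and extract_hrefs are made up for illustration and are not part of HTTrack.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url  # assumed URL of the page being parsed
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_hrefs(html_text, base_url=""):
    """Return all <a href> targets found in html_text, in document order."""
    parser = HrefCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Calling extract_hrefs on each saved file and writing the results to a text file would give a plain list of links to check, without keeping anything else from the pages.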

Next, I have it set to stay on the root domain, yet it's making lots of folders
for other domains. Does "stay on root domain" not work the way I'm expecting
it to?
Since I'm only capturing HTML, it's not a big deal as far as size goes, and it
seems to be putting only one HTML file in each folder, mostly index.html. I'm
just curious as to the cause.
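
To see which captured URLs actually fall outside the root domain, a rough sketch like the following could help. It assumes the URLs are absolute, and it treats subdomains of the root as internal, which may or may not match what HTTrack's own setting means. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def is_same_host(url, root_host):
    """True if url's host is root_host or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    root = root_host.lower()
    return host == root or host.endswith("." + root)


def split_by_host(urls, root_host):
    """Split absolute URLs into (internal, external) lists, keeping order."""
    internal, external = [], []
    for url in urls:
        if is_same_host(url, root_host):
            internal.append(url)
        else:
            external.append(url)
    return internal, external
```

This does not explain why the extra folders appear; it only makes it easier to list which hosts they belong to.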

Thanks
