HTTrack Website Copier
Free software offline browser - FORUM
Subject: Trying to capture/archive subreddit only
Author: sebastian soleil
Date: 11/03/2017 01:43
 
OK, I am trying to capture a particular subreddit and archive its growth every
month.

For example, I want to capture www.reddit.com/r/anything, and I also want
httrack to follow external links 1 level deep so as to capture a snapshot of
whatever is referenced.

Since each subreddit's front page contains links to popular subreddits such as
r/funny, httrack ends up trying to capture all of reddit. I want to limit those
other reddit paths to a depth of 1, and pages external to reddit also to a
depth of 1.
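
To make the rule I am after concrete, here is a rough Python sketch of the
scoping decision. It is not HTTrack configuration and does not use any HTTrack
API. The names should_fetch and hops_outside are made up for illustration,
and /r/anything is just the placeholder subreddit from above.

```python
from urllib.parse import urlparse

# Placeholder subreddit path from the example above (hypothetical).
SUBREDDIT_PREFIX = "/r/anything"


def should_fetch(url: str, hops_outside: int) -> bool:
    """Return True if an absolute URL is within the scope I want.

    hops_outside is the number of links followed since leaving the target
    subreddit, counting the link to this URL (0 while still inside it).
    """
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    on_reddit = host == "reddit.com" or host.endswith(".reddit.com")

    path = parts.path.rstrip("/").lower()
    in_subreddit = on_reddit and (
        path == SUBREDDIT_PREFIX or path.startswith(SUBREDDIT_PREFIX + "/")
    )

    # Pages of the target subreddit are always wanted.
    if in_subreddit:
        return True

    # Everything else (other subreddits and external sites alike) is only
    # wanted when it is exactly one link away from the subreddit.
    return hops_outside <= 1
```

With this rule, a link from the subreddit to r/funny would be fetched once
(hops_outside is 1), but nothing linked from r/funny would be followed any
further.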

I tried all the obvious settings that make sense, but haven't solved it yet.

Also the subreddit is private. I have to temporarily change it to public to
archive and I don't like that. The login feature in httrack doesn't work.

I've been messing with the settings A LOT and have had only limited success. I
still can't get a download that is even remotely useful.

I even changed the internal depth to 2 and external depth to 0. What I get is
an index page that doesn't work: when I follow its links to what they are
supposed to point to, they don't work. When I actually drill down into the
folders for content, I get a forum front page that is a wall of text with no
CSS at all. Just words, completely stripped of HTML and CSS.
 