OK, I am trying to capture a particular subreddit and archive its growth every
month.
For example, I want to capture www.reddit.com/r/anything, and I also want the
crawl to follow external links 1 level deep so as to capture a snapshot of
whatever is referenced.
Since each subreddit's front page contains links to popular subreddits such as
r/funny, httrack ends up trying to capture all of reddit. I want to limit
those paths to a depth of 1, and pages external to reddit also to a depth of 1.
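
To make the rule I'm after concrete, here is a rough Python sketch of the scoping logic. This is not httrack's API or configuration; the function names `in_target` and `should_download` are made up for illustration, and it assumes "1 in depth" means "save the linked page itself, but follow nothing from it".

```python
from urllib.parse import urlparse


def in_target(url, subreddit="anything"):
    """True if `url` lives inside the subreddit being archived."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    on_reddit = host == "reddit.com" or host.endswith(".reddit.com")
    prefix = f"/r/{subreddit}".lower()
    path = parts.path.lower().rstrip("/")
    # Match /r/anything and /r/anything/..., but not /r/anythingelse
    return on_reddit and (path == prefix or path.startswith(prefix + "/"))


def should_download(url, found_on, subreddit="anything"):
    """Decide whether a link discovered on page `found_on` gets saved.

    Pages inside the target subreddit are always saved. Anything else
    (other subreddits, external sites) is saved only when it is linked
    directly from a page inside the target subreddit, i.e. depth 1.
    """
    return in_target(url, subreddit) or in_target(found_on, subreddit)
```

So a link to r/funny on the subreddit's front page would be saved once, but nothing linked from r/funny would be followed, and the same goes for external sites.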
I tried all the obvious settings, but I haven't solved it yet.
Also the subreddit is private. I have to temporarily change it to public to
archive and I don't like that. The login feature in httrack doesn't work.
I've been messing with the settings A LOT and have had only limited success. I
still can't get a download that is even remotely useful.
I even changed the internal depth to 2 and the external depth to 0. What I get
is an index page with links that don't work; when I go to what they are
supposed to point to, those pages don't work either. When I actually drill down
into the folders for content, I get a forum front page that is a wall of text
with no CSS at all, just words completely stripped of HTML and CSS.