Hi Nijaz, thanks for your hints! Always good to learn something new :)
I'm testing it (via the command line) with a very small Twitter account (only
a few hundred tweets). However, the process seems to take far too long and has
not finished yet, even though the account really is small.
Have you successfully used WinHTTrack on the Mozilla account you mentioned?
How long did it take to finish?
I can see artifacts in the working directory, which seem reasonable. However,
there is no .html file yet apart from a single index.html, which only captured
the generic Twitter login page, so not even the very first page of the
timeline.
How did you find out about this:
"with those ending max_id in url being each seperate page"? I can't yet see
anything like this (naming-wise) in my working directory.
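A check along these lines should show whether anything like that exists. This
is only a sketch: the helper names and the directory name "twitter-mirror" are
placeholders of mine, not anything HTTrack prescribes. Since the saved file
names need not contain the query string, it also searches file contents:

```shell
#!/bin/sh
# Hypothetical helpers; pass the HTTrack working directory as the first
# argument (defaults to the current directory).

# List files whose NAME contains "max_id".
find_max_id_names() {
    find "${1:-.}" -type f -name '*max_id*'
}

# List files whose CONTENT mentions "max_id" (logs, cached pages, ...).
# "|| true" keeps the exit status at 0 when nothing matches.
find_max_id_mentions() {
    grep -rl 'max_id' "${1:-.}" || true
}

# Count the .html files captured so far.
count_html() {
    find "${1:-.}" -type f -name '*.html' | wc -l | tr -d ' '
}
```

For example: find_max_id_names twitter-mirror; find_max_id_mentions
twitter-mirror; count_html twitter-mirror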
I also realized that when I open the mobile Twitter version in a desktop
browser, it still issues dynamic requests to Twitter on scrolling down in
order to load the next tweets. But that might have to do with the user agent.
So for the httrack configuration, I looked up a mobile browser user agent. I
have not yet tested simulating a mobile user agent in a desktop browser to
check whether dynamic requests are still generated (though I would somehow
expect them).
However, the process feels like it is taking far too long.
The command is this:
httrack "PUT_HERE_TWITTER_URL" -v -s0 -F "Mozilla/5.0 (Linux; U; Android 2.2) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1" -%S "twitter-scanrules.txt" -%c16 -%B -u0 -%s -%v -A0 --disable-security-limits -i
The referenced twitter-scanrules.txt file contains:
-*.js
-*.js*
-*PUT_HERE_TWITTER_URL/*
+*.css
+*.png
+*.jpeg
+*.webp
It would be great to hear whether you have been successful with your settings.
Maybe you can pick a very small Twitter account to test and see whether the
process finishes in a timely fashion and whether the result is OK.
Thanks