HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: How to mirror a twitter account's timeline?
Author: Nijaz
Date: 08/25/2020 20:59
 
> Hi Nijaz, thanks for your hints! Always good to
> learn something new :)
> 
> I'm testing it (via command line) with a very small
> Twitter account (only a few hundred tweets),
> however, the process seems to take way too long and
> it has not finished yet (and it is really a small
> Twitter account). 
> 
> Have you successfully used WinHTTrack on the
> Mozilla account you mentioned? How long did it take
> to finish?
> 
> I can see artifacts in the working directory, which
> seem reasonable, however, there is no .html file yet
> (apart from a single index.html, which just captured
> the generic Twitter login page, so not even the
> same, very first page of the timeline).
> 
> How did you find out about this: 
> "with those ending max_id in url being each seperate
> page"? I can't see yet something like this
> (naming-wise) in my working directory.
> 
> I also realized, when I open the mobile Twitter
> version on a desktop PC browser, it will still issue
> dynamic requests to Twitter upon scroll downs in
> order to load the next Tweets. But that might have to
> do with the user agent. So for the httrack
> configuration, I looked up a mobile browser user
> agent. I have not yet tested simulating a mobile
> user agent in a desktop browser to check whether
> dynamic requests are still generated (though I
> would expect them to be).
> 
> However, the process feels like it takes way too
> long.
> 
> The command is this: 
> 
> httrack "PUT_HERE_TWITTER_URL" -v -s0 -F
> "Mozilla/5.0 (Linux; U; Android 2.2)
> AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0
> Mobile Safari/533.1" -%S "twitter-scanrules.txt" 
> -%c16 -%B -u0 -%s -%v -A0 --disable-security-limits
> -i
> 
> The referenced twitter-scanrules.txt file contains:
> 
> -*.js
> -*.js*
> -*PUT_HERE_TWITTER_URL/*
> +*.css
> +*.png
> +*.jpeg
> +*.webp
> 
> Would be great to hear from you if you have been
> successful with your settings, maybe you can look
> for a very small Twitter account to test and see if
> the process finishes in a timely fashion plus if the
> result is ok.
> 
> Thanks

Sorry, I had not tested it until now; I was only giving my best guess at the
answer.
I have now tested it with the Twitter account guardianproject.
These scan rules work if you don't need images:
-*
+mobile.twitter.com/guardianproject
+mobile.twitter.com/guardianproject?max_id=*
+*.css

Of course, replace guardianproject with whatever username you need. If you
want images too, add these two lines to the scan rules (after the css
rule):
+*[name].twimg.com/*
-*.js
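
To reuse the rules above for another account, here is a minimal Python sketch
that builds the same list for a given username and puts it on an httrack
command line the way the doit.log further down shows it (URL, -O output
folder, then the rules). The function names scan_rules and httrack_args are
made up for illustration, and the sketch only builds the argument list; it
does not run httrack.

```python
from typing import List


def scan_rules(username: str, with_images: bool = False) -> List[str]:
    """Return the scan rules from this post for one account, in order."""
    base = f"mobile.twitter.com/{username}"
    rules = [
        "-*",                 # start by excluding every link
        f"+{base}",           # the first timeline page
        f"+{base}?max_id=*",  # every further 'load more' page
        "+*.css",             # stylesheets
    ]
    if with_images:
        # The two extra lines suggested for images, placed after the css rule.
        rules += ["+*[name].twimg.com/*", "-*.js"]
    return rules


def httrack_args(username: str, out_dir: str, with_images: bool = False) -> List[str]:
    """Build an argument list: program, start URL, output folder, then rules."""
    url = f"https://mobile.twitter.com/{username}"
    return ["httrack", url, "-O", out_dir, *scan_rules(username, with_images)]


if __name__ == "__main__":
    # Print one argument per line so the rules are easy to check by eye.
    for arg in httrack_args("guardianproject", "twitter", with_images=True):
        print(arg)
```

Passing the rules as separate list items avoids shell quoting problems with
the * and ? characters.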

I found the max_id URL by opening a Twitter account's mobile page in a browser
and hovering the mouse cursor over the 'load more' button: the status bar
shows that each of those buttons links to the same page with a max_id
parameter.
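
To make the pagination idea concrete, here is a small Python sketch that pulls
max_id out of a 'load more' link and checks whether a link looks like one of
the timeline pages the scan rule is meant to keep. The function names and the
max_id value in the example are made up, and matches_page_rule is only a rough
approximation of the rule, not HTTrack's own filter matching.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def max_id_of(url: str) -> Optional[str]:
    """Return the max_id query parameter of a URL, or None if it has none."""
    values = parse_qs(urlsplit(url).query).get("max_id")
    return values[0] if values else None


def matches_page_rule(url: str, username: str) -> bool:
    """Rough check for links of the form mobile.twitter.com/<username>?max_id=..."""
    parts = urlsplit(url)
    return (
        parts.netloc == "mobile.twitter.com"
        and parts.path == f"/{username}"
        and max_id_of(url) is not None
    )


if __name__ == "__main__":
    # Hypothetical link; the real max_id values come from the page itself.
    link = "https://mobile.twitter.com/guardianproject?max_id=1234567890"
    print(max_id_of(link), matches_page_rule(link, "guardianproject"))
```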

I am on a 2G-speed mobile internet connection. Here are the rules I used and
how long it took to download guardianproject, taken from the following
files:

doit.log:
-qiC1%Ps2u1%s%uN0%I0p3DaK0c8H0%kf2A250000%f#f -F "Mozilla/4.5 (compatible;
HTTrack 3.0x; Windows 98)" -%F "" -%l "en, *"
<https://mobile.twitter.com/guardianproject> -O1 "D:\MyWebSites\twitter" -*
+mobile.twitter.com/guardianproject
+mobile.twitter.com/guardianproject?max_id=* +*.css
File generated automatically on Tue, 25 Aug 2020 20:34:39, do NOT edit

hts-log.txt:
HTTrack3.49-2+htsswf+htsjava launched on Tue, 25 Aug 2020 20:34:39 at
<https://mobile.twitter.com/guardianproject> -*
+mobile.twitter.com/guardianproject
+mobile.twitter.com/guardianproject?max_id=* +*.css

(winhttrack -qiC1%Ps2u1%s%uN0%I0p3DaK0c8H0%kf2A250000%f#f -F "Mozilla/4.5
(compatible; HTTrack 3.0x; Windows 98)" -%F  -%l "en, *"
<https://mobile.twitter.com/guardianproject> -O1 "D:\MyWebSites\twitter" -*
+mobile.twitter.com/guardianproject
+mobile.twitter.com/guardianproject?max_id=* +*.css )



Information, Warnings and Errors reported for this mirror:
note:	the hts-log.txt file, and hts-cache folder, may contain sensitive information,
	such as username/password authentication for websites mirrored in this project
	do not share these files/folders if you want these information to remain private

HTTrack Website Copier/3.49-2 mirror complete in 11 minutes 39 seconds : 166
links scanned, 165 files written (10479775 bytes overall), 164 files updated
[1670369 bytes received at 2389 bytes/sec], 10431952 bytes transferred using
HTTP compression in 164 files, ratio 12%

(No errors, 0 warnings, 0 messages)
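
As a quick sanity check on the summary line above, this Python sketch divides
the bytes received by the elapsed time. The function name average_rate is made
up for illustration, and rounding the result down is an assumption about how
the log arrives at its figure.

```python
def average_rate(bytes_received: int, minutes: int, seconds: int) -> int:
    """Average download rate in whole bytes per second, rounded down."""
    elapsed = minutes * 60 + seconds
    return bytes_received // elapsed


if __name__ == "__main__":
    # Figures from the log: 1670369 bytes received in 11 minutes 39 seconds.
    print(average_rate(1670369, 11, 39))
```

With those figures the result agrees with the 2389 bytes/sec reported in the
log.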

 