Thanks, Nijaz, for the updated filter rules!
With these I was now also able to "HTTrack down" the guardianproject Twitter
account in a short amount of time.
FYI, here is the command line I used:
httrack "https://mobile.twitter.com/guardianproject" -v -s0 \
  -F "Mozilla/5.0 (Linux; U; Android 2.2) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1" \
  -%S "twitter2-scanrules.txt" -%c16 -%B -u0 -%s -%v -A0 \
  --disable-security-limits
(twitter2-scanrules.txt contains the scan rules as you defined them.)
I also tried it on the other Twitter account I used in my first test, and it
finished quickly as well.
Some embedded images may need further scan rule tweaking, but overall it's
great to see this basically working!
I noticed, at least in these two tests, that the created HTML files are stored
without a file suffix. I have to add .html before they are viewable (otherwise
the browser shows the plain HTML source text). Because of the missing suffix,
the reference used by the "Load Older Tweets" button would also need the .html
change in each file; right now it points to the file without a suffix.
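For reference, the manual workaround described above (appending .html and fixing the references) could be scripted roughly as follows. This is only a sketch and not part of HTTrack: the function name add_html_suffix is made up for illustration, it guesses that a file is HTML by looking for an <html> or doctype marker in its first 1024 bytes, and it assumes the links are plain relative href="..." attributes.

```python
import re
from pathlib import Path

# Matches href="..." and splits the value into the path part and any
# trailing "?query" or "#fragment" part.
HREF = re.compile(rb'(href=")([^"#?]+)([^"]*")')


def add_html_suffix(mirror_dir):
    """Append .html to suffix-less HTML files and update links to them.

    Hypothetical helper; returns the number of files renamed.
    """
    root = Path(mirror_dir).resolve()

    # Step 1: rename files that have no suffix but look like HTML.
    renamed = set()
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix:
            continue
        head = path.read_bytes()[:1024].lower()
        if b"<html" in head or b"<!doctype html" in head:
            path.rename(path.with_name(path.name + ".html"))
            renamed.add(path)

    # Step 2: point href attributes at the renamed files.
    def fix(match, base):
        target = (base / match.group(2).decode("utf-8", "replace")).resolve()
        if target in renamed:
            return match.group(1) + match.group(2) + b".html" + match.group(3)
        return match.group(0)

    for page in root.rglob("*.html"):
        data = page.read_bytes()
        fixed = HREF.sub(lambda m: fix(m, page.parent), data)
        if fixed != data:
            page.write_bytes(fixed)

    return len(renamed)
```

Calling add_html_suffix("path/to/mirror") on a copy of the mirror directory would rename the pages and patch the links in one pass. It works on raw bytes so the page encoding is left untouched; links written in other forms (for example unquoted or single-quoted attributes) are not handled.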
Any idea why the files (and hence the references in them) have no file suffix,
where I would expect .html, for example?