HTTrack Website Copier
Free software offline browser - FORUM
Subject: How to include embedded Disqus content?
Author: cherokee
Date: 12/22/2014 16:17
 
Hello!

I'm wondering which options I need to set in order to download a website
including the comments that come from the "disqus" service.
Currently I get the main content of the website, but the "disqus" content
is missing.

The commands I used are (the second differs only in the added -%P option):
httrack "http://mac.appstorm.net/quick-look/life-a-new-option-for-journal-keepers/" \
  -n -c8 -R3 -v -O ./mac/ \
  "+disqus.com/*" "+*.disqus.com/*" \
  "+macappstorm.disqus.com/*" "+tempest.services.disqus.com/*" \
  "+disquscdn.com/*" "+referrer.disqus.com/*" "+*.disquscdn.com/*" \
  "+a.disquscdn.com/*" "+envato.s3.amazonaws.com/*"
httrack "http://mac.appstorm.net/quick-look/life-a-new-option-for-journal-keepers/" \
  -%P -n -c8 -R3 -v -O ./mac/ \
  "+disqus.com/*" "+*.disqus.com/*" \
  "+macappstorm.disqus.com/*" "+tempest.services.disqus.com/*" \
  "+disquscdn.com/*" "+referrer.disqus.com/*" "+*.disquscdn.com/*" \
  "+a.disquscdn.com/*" "+envato.s3.amazonaws.com/*"

After inspecting the requests with Firebug, I added + filters for the
domains I observed, but that has not helped so far.

The page I am testing with is:
<http://mac.appstorm.net/quick-look/life-a-new-option-for-journal-keepers/>

The Disqus content seems to be loaded asynchronously (lazy-loaded) in its
own frame.
The frame request I observed is:
disqus.com/embed/comments/?base=default&disqus_version=a4d38e7c&f=macappstorm&t_i=64848%20http%3A%2F%2Fmac.appstorm.net%2F%3Fp%3D64848&t_u=http%3A%2F%2Fmac.appstorm.net%2Fquick-look%2Flife-a-new-option-for-journal-keepers%2F&t_e=Life%3A%20A%20New%20Option%20for%20Journal-Keepers&t_d=Life%3A%20A%20New%20Option%20for%20Journal-Keepers&t_t=Life%3A%20A%20New%20Option%20for%20Journal-Keepers&s_o=default&l=#2
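
To make that request easier to read, here is a minimal sketch (plain JavaScript, runnable in Node or a browser console) that splits the query string into its decoded parameters. The "https://" scheme is an assumption, since the call was noted without one; the variable names are only illustrative.

```javascript
// Percent-encoded page title, as it appears three times in the observed request.
const title = 'Life%3A%20A%20New%20Option%20for%20Journal-Keepers';

// Frame request as observed in Firebug; the "https://" scheme is assumed.
const frameUrl = new URL(
  'https://disqus.com/embed/comments/?base=default&disqus_version=a4d38e7c' +
  '&f=macappstorm' +
  '&t_i=64848%20http%3A%2F%2Fmac.appstorm.net%2F%3Fp%3D64848' +
  '&t_u=http%3A%2F%2Fmac.appstorm.net%2Fquick-look%2Flife-a-new-option-for-journal-keepers%2F' +
  '&t_e=' + title + '&t_d=' + title + '&t_t=' + title +
  '&s_o=default&l=#2'
);

// Collect the decoded query parameters into a plain object.
const params = Object.fromEntries(frameUrl.searchParams);

// Print one "name = value" line per parameter.
for (const [name, value] of Object.entries(params)) {
  console.log(name + ' = ' + value);
}
```

Decoded this way, the request carries the forum shortname (f) and the page URL (t_u), both of which appear to be filled in by script at load time rather than written into the HTML.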

I'm not familiar with how Disqus works here, but looking at the page source,
it seems the actual URL gets assembled on the fly:

(function() {
    var dsq = document.createElement('script');
    dsq.type = 'text/javascript';
    dsq.async = true;
    dsq.src = '//' + disqus_shortname + '.' + 'disqus.com' +
        '/embed.js?pname=wordpress&pver=2.74';
    (document.getElementsByTagName('head')[0] ||
        document.getElementsByTagName('body')[0]).appendChild(dsq);
})();
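
As a rough illustration of what that snippet computes at run time, the sketch below rebuilds the script URL outside the browser. The function name buildEmbedSrc is hypothetical, and the shortname 'macappstorm' is an assumption taken from the f= parameter of the frame request, not something the page source states directly.

```javascript
// Hypothetical helper that mirrors the string concatenation in the inline snippet.
function buildEmbedSrc(shortname) {
  return '//' + shortname + '.' + 'disqus.com' +
    '/embed.js?pname=wordpress&pver=2.74';
}

// Assumed value of disqus_shortname (taken from the f= parameter).
const src = buildEmbedSrc('macappstorm');
console.log(src);

// A protocol-relative URL inherits the scheme of the page that embeds it.
const pageUrl = 'http://mac.appstorm.net/quick-look/life-a-new-option-for-journal-keepers/';
const resolved = new URL(src, pageUrl);
console.log(resolved.href);
```

The point of the sketch is that the full address only exists after the concatenation has run, so it never appears as a literal link in the downloaded HTML.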

I have read the following thread in this forum:
<http://forum.httrack.com/readmsg/26627/26620/index.html?q=javascript>

Is this the same situation here? Is a request whose URL is assembled on the
fly by JavaScript simply not feasible for HTTrack?

Thanks for either confirming that it is not possible to fetch this sort of
embedded Disqus comments,
and even more thanks if there is a way to get them downloaded with the
"correct" options!

Merry XMAS
cherokee
 