Grabbing External Files Only - HTTrack Website Copier Forum

Subject: Grabbing External Files Only

Author: Lewis De Payne

Date: 04/18/2016 02:54

I have a unique situation, and I'm trying to figure out if httrack options
exist for doing the following...

I want to have httrack crawl a domain (site), but not save any of the
domain-local files.  Instead, what I want it to do is to save all the external
assets it finds.

Use case: I have successfully mirrored a few thousand sites (they are all
customer sites).  Now that I have those mirrors, I also want local copies of
all the assets the sites reference, which are hosted outside the sites
themselves.  The assets are all on amazonaws.com.

If I wanted to mirror everything (site and assets), I would do the following:

httrack [domain.name] -%q0%c1c4C0I0R5H0K0s2z "+*.amazonaws.com/*"

However, at this stage, I don't want to download anything from the main site
(domain.name) again, only the assets sitting on amazonaws.com.  How do I keep
httrack from saving the main site's own files while still giving it the main
site as the place to start crawling?

Thank you in advance.
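
For what it's worth, one fallback I can picture if no such option exists:
since the pages are already mirrored locally, the saved files could be scanned
for amazonaws.com URLs and the resulting list downloaded separately.  Below is
a minimal Python sketch of that idea, not a tested solution.  It assumes the
mirrored pages still contain the absolute asset URLs, that the assets are only
referenced from .html, .htm, .css and .js files, and that a simple regular
expression is good enough to pick the URLs out.  The function name
find_asset_urls and the mirror directory argument are made up for illustration.

```python
import re
import sys
from pathlib import Path

# Assumption: assets are referenced by absolute http(s) URLs whose host is
# amazonaws.com or any subdomain of it (e.g. bucket.s3.amazonaws.com).
ASSET_URL = re.compile(
    r"""https?://(?:[A-Za-z0-9-]+\.)*amazonaws\.com/[^\s"'<>)]*"""
)

# Assumption: only these file types in the mirror reference the assets.
PAGE_SUFFIXES = {".html", ".htm", ".css", ".js"}


def find_asset_urls(mirror_root):
    """Return the sorted, de-duplicated amazonaws.com URLs found in a mirror."""
    urls = set()
    for path in Path(mirror_root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in PAGE_SUFFIXES:
            continue
        # Ignore undecodable bytes; we only care about ASCII URLs here.
        text = path.read_text(encoding="utf-8", errors="ignore")
        urls.update(ASSET_URL.findall(text))
    return sorted(urls)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one URL per line so the output can be saved as a URL list.
    for url in find_asset_urls(sys.argv[1]):
        print(url)
```

The output is one URL per line, so it could be saved to a text file and handed
to any downloader that accepts a URL list.  If I read the httrack
documentation correctly, its -%L option takes such a file, but I have not
verified that this avoids re-crawling the main site.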