I have a unique situation, and I'm trying to figure out if httrack options
exist for doing the following...
I want to have httrack crawl a domain (site), but not save any of the
domain-local files. Instead, I want it to save all the external assets it
finds.
Use case: I have successfully mirrored a few thousand sites (they are all
customer sites). Now that I have a mirror of them, I want to locally mirror
all of their assets, which are external to the sites. The assets are all
hosted on amazonaws.
If I wanted to mirror everything (site and assets), I would do the following:
httrack [domain.name] -%q0%c1c4C0I0R5H0K0s2z "+*.amazonaws.com/*"
However, at this stage, I don't want to download anything from the main site
(domain.name) again, only the assets sitting on amazonaws. How do I exclude
the main site from being downloaded while still specifying it as the site to
crawl?
Thank you in advance.