HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: restrict harvest to certain file types
Author: Xavier Roche
Date: 01/14/2003 07:28
 
> I'm having trouble limiting the file types
> that are harvested.  I'm interested in
> downloading only .jpeg and .gif files.
> I typically will enter the URLs of,
> for conversation's sake, 10 sites dedicated to
> animal photography. I'm looking to mass
> download as many pictures as possible. I 
> have no need for ANY file types other
> than JPEG AND GIF! What would the command line look
> like for this? Also, I have a very fast machine and
> a high-bandwidth connection.

First, even with a fast connection, take care NOT to 
overload the remote server and NOT to hog bandwidth that 
other users need; I suggest you set Options/Limits/Maximum 
transfer rate and the maximum number of connections.
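
On the command line, the same limits can be set with flags. 
The sketch below assumes the documented -cN (number of 
simultaneous connections) and -AN (maximum transfer rate in 
bytes per second) options; the values 4 and 25000, the URL 
and the ./mirror output directory are only placeholders. It 
prints the command instead of running it:

```shell
#!/bin/sh
# Sketch: throttle HTTrack from the command line.
# Assumed values; tune them to what the remote server can bear.
MAX_CONNECTIONS=4     # -cN: number of simultaneous connections
MAX_RATE=25000        # -AN: maximum transfer rate, in bytes per second

LIMIT_FLAGS="-c${MAX_CONNECTIONS} -A${MAX_RATE}"

# Print the command rather than running it, so it can be checked first.
echo "httrack http://www.yoursite.com/ -O ./mirror ${LIMIT_FLAGS}"
```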

Then, HTTrack has to fetch the HTML pages in order to detect 
links; if the pages end in .htm or .html, use:

-* +*.gif +*.jpg +*.jpeg +www.yoursite.com/*.htm +www.yoursite.com/*.html

Of course, if you have more than one site, you'll have to 
add the proper filters for each site, such as:

-* +*.gif +*.jpg +*.jpeg
+www.yoursite.com/*.htm +www.yoursite.com/*.html 
+www.yoursite2.com/*.htm +www.yoursite2.com/*.html ...
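
For a longer list of sites, the filters can be generated 
rather than typed by hand. This is only a sketch: the site 
names are the placeholders used above, ./mirror is an assumed 
output directory, and it assumes HTTrack takes the filters as 
trailing arguments after the URLs. It includes +*.jpg and 
+*.jpeg alongside +*.gif, since the question asks for both 
image types. It prints the command instead of running it:

```shell
#!/bin/sh
# Sketch: build per-site filters and print the resulting command line.
set -f    # disable globbing so the * in the filters is never expanded

SITES="www.yoursite.com www.yoursite2.com"    # placeholder site names

# Exclude everything first, then re-allow the image types.
FILTERS="-* +*.gif +*.jpg +*.jpeg"

# Re-allow each site's HTML pages so links can still be detected.
for site in $SITES; do
    FILTERS="$FILTERS +$site/*.htm +$site/*.html"
done

# Assemble: start URLs, output directory, then each filter in quotes.
CMD="httrack"
for site in $SITES; do
    CMD="$CMD http://$site/"
done
CMD="$CMD -O ./mirror"
for f in $FILTERS; do
    CMD="$CMD \"$f\""
done

echo "$CMD"
```

The quotes around each filter keep the shell from expanding 
the * before HTTrack sees it.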

But, again: limit your bandwidth and, if possible, run the 
download outside working hours.
 