HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Restricting harvests to certain file types
Author: Leto
Date: 01/06/2003 22:20
 
> Hello, I want to restrict Win HTTrack to only download PDF, 
> DOC and XLS files from a number of websites. I've been 
> through past forum discussions and have been experimenting 
> with scan rules such as -* +*.htm +*.html +*.asp +*.php 
> +*.pdf +*.doc +.xls However, I can't seem to exclude HTML 
> or HTM files. I wish to restrict my impact on host servers 
> (I have limits set) and only download PDF, DOC and XLS 
> files. When I try to exclude HTML and HTM files I get 
> nothing. Can I only download these files and exclude all 
> HTML? Does anybody have any scan rule suggestions? Many 
> thanks, dnt

G'day dnt -- it's been a while ;)

If all the files you want to download (PDF, DOC, etc) are on a single page,
then filters like

-* +*.pdf +*.doc +*.xls

would work: they tell HTTrack to follow nothing beyond your starting URL
except links to those file types.
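To see how a rule list like that is read, here is a rough Python sketch. It is a simplified model and not HTTrack's own matcher: it assumes rules are checked left to right with the last matching rule winning, that matching ignores case, and that * behaves like a shell wildcard. The allowed() helper and the example.com URLs are made up for illustration.

```python
from fnmatch import fnmatchcase

def allowed(url, rules, default=True):
    """Return True if `url` passes the scan rules (simplified model)."""
    verdict = default
    for rule in rules.split():
        # Each rule is a sign (+ or -) followed by a wildcard pattern.
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url.lower(), pattern.lower()):
            # Assumption: a later matching rule overrides earlier ones.
            verdict = (sign == "+")
    return verdict

RULES = "-* +*.pdf +*.doc +*.xls"
print(allowed("http://example.com/files/report.pdf", RULES))  # True
print(allowed("http://example.com/about.html", RULES))        # False
```

The -* at the front rejects everything, and each +*.ext after it re-admits one file type, so the order of the rules matters.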

But when the website spans multiple pages, you DO need to let HTTrack follow
links to those other pages.  So if the pages are HTML, you add +*.htm +*.html
(and likewise +*.asp or +*.php if the site uses them).

There is no way around this -- HTTrack has to fetch the pages to discover the
links to the files you want, which is why excluding HTML left you with nothing.
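A toy crawl shows why. This is only a sketch under assumptions: the SITE dictionary is a made-up two-page website, the start URL is always fetched, and allowed() is the same simplified last-match-wins model of scan rules, not HTTrack's real engine.

```python
from collections import deque
from fnmatch import fnmatchcase

# Hypothetical site: each page maps to the links found on it.
SITE = {
    "http://example.com/index.html": [
        "http://example.com/a.pdf",
        "http://example.com/reports.html",
    ],
    "http://example.com/reports.html": [
        "http://example.com/b.doc",
        "http://example.com/c.xls",
    ],
}

def allowed(url, rules):
    """Simplified scan-rule check: the last matching rule wins."""
    verdict = True
    for rule in rules.split():
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url.lower(), pattern.lower()):
            verdict = (sign == "+")
    return verdict

def crawl(start, rules):
    """Breadth-first crawl; links are only found on pages that were fetched."""
    fetched = [start]  # assumption: the start URL is always fetched
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in SITE.get(page, []):
            if link not in fetched and allowed(link, rules):
                fetched.append(link)
                queue.append(link)
    return fetched

START = "http://example.com/index.html"
print(crawl(START, "-* +*.pdf +*.doc +*.xls"))
print(crawl(START, "-* +*.htm +*.html +*.pdf +*.doc +*.xls"))
```

With the first rule set only a.pdf is reached, because reports.html is never fetched and so its links are never seen. With the HTML rules added, b.doc and c.xls turn up as well.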

One feature you could use, though, is the build structure option.  You can tell
the program to put HTML files in one folder and everything else in another.
When the capture is complete, the files will be nicely separated.
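The idea behind that kind of structure, sketched in Python. This is illustrative only: the folder names html/ and files/ and the local_path() helper are made up here and are not HTTrack's actual layout or code.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

HTML_EXTENSIONS = {".htm", ".html"}

def local_path(url):
    """Map a URL to a folder by type: HTML pages apart from everything else."""
    name = PurePosixPath(urlparse(url).path).name
    suffix = PurePosixPath(name).suffix.lower()
    # Hypothetical folder names, chosen just for this example.
    folder = "html" if suffix in HTML_EXTENSIONS else "files"
    return f"{folder}/{name}"

print(local_path("http://example.com/docs/index.html"))  # html/index.html
print(local_path("http://example.com/docs/report.pdf"))  # files/report.pdf
```

Once the capture is laid out like this, the HTML folder can simply be ignored or deleted, leaving the PDF, DOC and XLS files.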
 