> > I would like to just crawl the website, not download
> > everything, but I don't see how to do this. Can
> > anyone advise, please? TIA.
>
> What do you mean by crawling? Only HTML (/PHP, etc.)
> pages?
> Use something like
> Scan rules =>
> -* +www.yoursite.com/*.html +www.yoursite.com/*.php
> +www.yoursite.com/*.asp +www.yoursite.com/*/
>
Like that, but not exactly, for our website. A quick look shows me a number of
links that go to pages that don't have any extension, like
www.oursite.com/news or www.oursite.com/resources. If I want to use HTTrack,
I may have to develop a long list of exclusions instead: pdf, zip, MS Office
extensions, images (jpg/tiff/map/geotiff), ESRI data files, etc. I also
noticed while watching the crawl session (--spider option) that a good number
of files were taking a while to download.
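
If exclusions do turn out to be the way to go, something along these lines
might be a starting point. This is only a sketch: the URL is our placeholder
site, the ./mirror output path is arbitrary, and the shapefile extensions
(.shp/.shx/.dbf) are just my guess at what the ESRI data files would be:

httrack "http://www.oursite.com/" -O ./mirror \
    "+www.oursite.com/*" \
    "-*.pdf" "-*.zip" \
    "-*.doc" "-*.docx" "-*.xls" "-*.xlsx" "-*.ppt" "-*.pptx" \
    "-*.jpg" "-*.jpeg" "-*.tif" "-*.tiff" \
    "-*.shp" "-*.shx" "-*.dbf"

The "+www.oursite.com/*" keeps the crawl on the site (including the
extensionless /news and /resources pages), while the "-" rules skip the
heavyweight file types rather than trying to enumerate every page type
to include.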