HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Command Line Spidering
Author: rex
Date: 03/26/2008 13:18
 
> > httrack http://www.websitetospider.com/page.cfm
> > -O "./spidereddata" "-*" "+.cfm" "+.htm" "+-html"
> > "+*websitetospider.com/listings.cfm/listing/*"
> > -r6
> 
> missing asterisks on the cfm/htm..
True. Strange... they're in the copy I had saved here :)
> 
> > httrack will follow the links that don't match my
> > patterns to find the other ones. I really want it
> to
> 
> It won't.  The best you can do is spider the html
> and only download images etc on the pages you want:
> -* +*/listing/* +*.cfm +*.htm*
Ok. But I still only need to store the HTML files for the pages I used to
get there, so I suppose that's not too bad.
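For reference, the suggested filter set dropped into a full command line might look like this (a sketch only, not tested; the URL, output path, and depth are the ones from earlier in this thread):

```shell
# Spider .cfm/.htm pages so the listing pages can be reached at all,
# and match the listing pages themselves. Note: images etc. are
# excluded by the leading "-*", but the intermediate HTML does get saved.
httrack "http://www.websitetospider.com/page.cfm" \
    -O "./spidereddata" \
    "-*" "+*/listing/*" "+*.cfm" "+*.htm*" \
    -r6
```

The trade-off is as described above: HTTrack only follows links on pages it is allowed to fetch, so the intermediate HTML has to be included in the filters even if you only care about the listing pages.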
> 
> Alternative is to start httrack on a */listing/*
> page, then
> -* +*/listing/* will work.
That assumes a listing page links to the other listing pages... which is
exactly my problem. I need it to step BACK up the tree and then down again
in order to FIND all the listing pages.

Anyway, thanks for the reply. I'll post back and let you know what I end up
with.

