HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Command Line Spidering
Author: rex
Date: 03/26/2008 13:18
 
> > httrack http://www.websitetospider.com/page.cfm
> > -O "./spidereddata" "-*" "+.cfm" "+.htm" "+-html"
> > "+*websitetospider.com/listings.cfm/listing/*"
> > -r6
> 
> missing asterisks on the cfm/htm..
True. Strange... they're in the copy I had saved here :)
> 
> > httrack will follow the links that don't match my
> > patterns to find the other ones. I really want it
> to
> 
> It won't.  The best you can do is spider the html
> and only download images etc on the pages you want:
> -* +*/listing/* +*.cfm +*.htm*
Ok. But I still only need to store the HTML files for the pages I used to
get there, so I suppose that's not too bad.
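For reference, the suggested filter set dropped into a full command line might look like this (a sketch only, not tested; the URL, output path, and depth are the ones from earlier in this thread):

```shell
# Spider .cfm/.htm pages so the listing pages can be reached at all,
# and match the listing pages themselves. Note: images etc. are
# excluded by the leading "-*", but the intermediate HTML does get saved.
httrack "http://www.websitetospider.com/page.cfm" \
    -O "./spidereddata" \
    "-*" "+*/listing/*" "+*.cfm" "+*.htm*" \
    -r6
```

The trade-off is as described above: HTTrack only follows links on pages it is allowed to fetch, so the intermediate HTML has to be included in the filters even if you only care about the listing pages.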
> 
> Alternative is to start httrack on a */listing/*
> page, then
> -* +*/listing/* will work.
That assumes a listing page links to the other listing pages... which is
exactly my problem. I need it to step BACK up the tree and then down again
in order to FIND all the listing pages.

Anyway, thanks for the reply. I'll post back and let you know what I end up
with.

