HTTrack Website Copier
Free software offline browser - FORUM
Subject: Regex-like ?
Author: Larry
Date: 06/27/2013 10:22
 
Hello,

I would like to know how to restrict a mirror of my own website, which has pages such as:

www.foo.com/bar/1/content.html
www.foo.com/bar/2/content.html

I only want to retrieve www.foo.com/bar/1/* and www.foo.com/bar/2/*

That is easily done by running "httrack www.foo.com/bar/1/"

The problem is that some files inside point to other parts of the website,
and httrack treats those links as legitimate.
E.g.:

www.foo.com/bar/2/content3.html has a URL that redirects to
www.foo.com/bar/web/content.html, and httrack treats it as legitimate
because it was found under the /2/ directory.

Even "-r2" didn't help (a lower depth stops the whole process far too early).

So I would like a way to tell httrack:

"Please, HTTrack, only download pages under www.foo.com/bar/2/* and never
follow redirections anywhere else. Keep going until there are no more files
in www.foo.com/bar/2/* and, again, do not follow links to other parts of the
website, if you would be so kind."

I know I am personifying it a bit here, but that is the idea :)
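
To make the rule concrete, here is a small Python sketch of the check I have in mind. It is not HTTrack code and says nothing about how HTTrack works internally; `ALLOWED_PREFIXES` and `is_wanted` are names made up purely for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: the only parts of the site the crawl may touch.
ALLOWED_PREFIXES = ("www.foo.com/bar/1/", "www.foo.com/bar/2/")


def is_wanted(url: str) -> bool:
    """Return True only if url lies under one of the allowed prefixes."""
    # Add a scheme when it is missing so urlsplit can find the host.
    parts = urlsplit(url if "://" in url else "http://" + url)
    # Compare host + path, ignoring scheme, query string and fragment.
    target = parts.netloc.lower() + parts.path
    return target.startswith(ALLOWED_PREFIXES)


# A page inside /bar/2/ is kept, the redirect target in /bar/web/ is not.
print(is_wanted("www.foo.com/bar/2/content3.html"))   # True
print(is_wanted("www.foo.com/bar/web/content.html"))  # False
```

In other words, where a link was found should not matter; only the address it points to (or redirects to) should decide whether it gets downloaded.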

Any ideas?
Thanks,
 