Hello,
I would like to know how I could filter a website (my own):
www.foo.com/bar/1/content.html
www.foo.com/bar/2/content.html
I would just like to retrieve www.foo.com/bar/1/* and www.foo.com/bar/2/*
That is easily done with "httrack www.foo.com/bar/1/".
The problem is that some files inside point to other parts of the website, and
HTTrack treats those links as legitimate.
E.g:
www.foo.com/bar/2/content3.html has a URL that redirects to
www.foo.com/bar/web/content.html, and HTTrack treats it as legitimate
since it was found under the /2/ directory.
Even "-r2" didn't help (a lower depth cuts the whole mirror off far too early).
So I would like a way to tell HTTrack:
"Please HTTrack, only download pages matching www.foo.com/bar/2/*, and
never follow any other redirections; keep going until there are no more files
in www.foo.com/bar/2/* and, again, do not follow links to other parts of the
website, if you would be so kind."
I know, it's a bit personified, but that is the idea :)
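For reference, here is roughly what I imagine the command would look like, based on HTTrack's scan-rule (filter) syntax as I understand it: exclude everything with "-*", then re-allow only the /bar/2/ subtree. The URLs are just the examples from above, and the output directory name is made up; I'm not sure this actually stops the redirect-following behavior I described.

```shell
# Sketch (not verified): restrict the mirror to /bar/2/ using scan rules.
# "-*" first excludes everything, then "+www.foo.com/bar/2/*" re-allows
# only URLs under /bar/2/. "./mirror" is a hypothetical output directory.
httrack "http://www.foo.com/bar/2/" \
    -O ./mirror \
    "-*" \
    "+www.foo.com/bar/2/*"
```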
Any ideas?
Thanks,