I am pretty new to HTTrack but love it so far. Let me explain a scenario
and what I want to happen. The starting URL is:
site.com/a/index.html
/a/index.html links to HTML files in the /b directory, such as:
site.com/b/1.html
site.com/b/2.html
site.com/b/3.html
To get this to work, I added these scan rules:
+site.com/a/* +site.com/b/*
The only problem I have seen is that it also crawls the HTML files in the
"b" directory. I want to crawl everything in the "a" directory, and only
download (not crawl) anything it links to outside of that. Currently it first
saves all files in the "b" directory as html.tmp, then goes back later,
revisits them, and tries to crawl them. This happens even when I set my depth
to the right level.
In summary, I simply want to:
* crawl: site.com/a/*
* download (and not parse): site.com/b/*
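To make the distinction concrete, here is a rough Python sketch of the behavior I am after. It is only an illustration of the rule, not how HTTrack works internally, and the function name classify is made up for this example.

```python
from urllib.parse import urlsplit


def classify(url: str) -> str:
    """Return 'crawl', 'download', or 'skip' for a URL (hypothetical helper)."""
    # Accept URLs written without a scheme, like the ones in this post.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)

    # Anything on another host is out of scope.
    if parts.netloc != "site.com":
        return "skip"
    # Pages under /a/ are fetched and parsed for further links.
    if parts.path.startswith("/a/"):
        return "crawl"
    # Pages under /b/ are fetched and saved, but never parsed.
    if parts.path.startswith("/b/"):
        return "download"
    return "skip"
```

So site.com/a/index.html would be parsed for links, while site.com/b/1.html would just be saved as-is.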
How can I do this? Thanks in advance.