| I am mirroring a subset of pages of a website. The pages
have links that are of the form:
<http://www.website.com/path/12345/page.html>
<http://www.website.com/path/34567/page.html>
<http://www.website.com/path/76543/page.html>
and so on. Part of the path changes, but not the name of the
main page in that directory. I don't have any problem
grabbing the contents. However, when I grab the 40th page,
it replaces the last one on the index and purges it. For
example, if the 39th link on the index page points to
http://www.website.com/path/99999/page.html and I grab
http://www.website.com/path/77777/page.html, then 77777
replaces 99999 as the last item on the index page and 99999
is purged! Here is my command line:
httrack <http://www.website.com/$number/page.html> --near
--continue --build-top-index --cookies=1 --robots=0
--max-rate=16000 --sockets=2 --cache=1
Have I misunderstood the usage of --build-top-index? | |