Hi,
I'm currently using HTTrack to create an offline version of a large dynamic
web site I've built. The catch is that virtually everything happens on a
single page, frontend.aspx, which takes a query parameter path=... that makes
the frontend descend a directory-like structure.
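
To make that concrete, the URL space looks roughly like this (host and path
values made up for illustration):

from urllib.parse import urlencode

# Hypothetical examples of how the one-page frontend is addressed; every
# "directory" is really just another value of the path parameter.
base = "http://example.com/frontend.aspx"  # placeholder host, not the real one
for path in ("/", "/docs", "/docs/manuals", "/docs/manuals/v2"):
    print(f"{base}?{urlencode({'path': path})}")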
I now have two questions:
First, some way down the path hierarchy, the mirrored structure becomes
inconsistent: I click on one link and end up somewhere else entirely. How can
this happen? Could HTTrack be getting confused because so many URLs point at
the same page?
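
For what it's worth, here is the kind of spot check I've been doing (just a
sketch; the mirrored file name below is only an example from my copy): dump
every link target a page in the mirror actually contains, then compare that
with where clicking takes me.

from html.parser import HTMLParser

class LinkLister(HTMLParser):
    # Print the href of every <a> tag so the actual link targets are visible.
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    print(value)

with open("mirror/frontend1a2b.html", encoding="utf-8") as f:  # example file
    LinkLister().feed(f.read())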
Second, I have the impression that far more pages are being retrieved than
there should be, even allowing for some permutations in the path. Is there a
way to get a list of all crawled URLs, so I can check that each parameter
setting is crawled only once?
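
If it helps, this is roughly how I would check it myself, assuming
hts-cache/new.txt really logs one crawled file per line. I'm not sure of its
exact column layout, so I just pattern-match the URLs out; "my-mirror" stands
for my output directory.

import re
from collections import Counter

urls = Counter()
with open("my-mirror/hts-cache/new.txt", encoding="utf-8", errors="replace") as f:
    for line in f:
        # Grab anything URL-shaped; I only care about the frontend page.
        for url in re.findall(r"https?://\S+", line):
            if "frontend.aspx" in url:
                urls[url] += 1

for url, count in urls.most_common():
    if count > 1:
        print(count, url)  # any line printed here was fetched more than once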
I'm asking the two questions together because I suspect they may be related.
For instance, how is the 4-hex-digit ID that gets attached to each crawled
file generated? What happens once I exceed 65536 (= 16^4) versions of the
same page? And could two different URLs end up with the same ID even before
that limit is reached?
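
My back-of-envelope worry, assuming the suffix is a hash of the query string
rather than a running counter (I don't know HTTrack's internals): with only
16^4 = 65536 possible suffixes, the birthday bound makes collisions likely
long before 65536 distinct URLs.

import math

buckets = 16 ** 4  # 65536 possible 4-hex-digit suffixes
for n in (100, 300, 1000, 5000):
    # Standard birthday approximation: P(collision) ~ 1 - exp(-n(n-1)/2m)
    p = 1 - math.exp(-n * (n - 1) / (2 * buckets))
    print(f"{n:>5} URLs -> collision probability ~ {p:.1%}")

If that assumption is right, a few hundred variants of frontend.aspx would
already make a collision more likely than not, which would neatly explain
clicks landing on the wrong page.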
I'd appreciate any tips. Thanks in advance for your support, and for your
great product in general!
best wishes,
Nic