Hi,
I'm currently using HTTrack to create an offline version of a large dynamic
web site I've built. The catch is that virtually everything happens on a
single page, frontend.aspx, which takes a query parameter path=... that makes
the frontend descend a directory-like structure.
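
To make that concrete, the URL space looks roughly like this (host and path
values made up for illustration):

from urllib.parse import urlencode

# Hypothetical examples of how the one-page frontend is addressed; every
# "directory" is really just another value of the path parameter.
base = "http://example.com/frontend.aspx"  # placeholder host, not the real one
for path in ("/", "/docs", "/docs/manuals", "/docs/manuals/v2"):
    print(f"{base}?{urlencode({'path': path})}")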
I now have two questions:
First, some way down the path hierarchy, the mirrored structure becomes
inconsistent: I click on one link and end up somewhere else entirely. How can
this happen? Could HTTrack be getting confused because so many URLs point at
the same page?
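
For what it's worth, here is the kind of spot check I've been doing (just a
sketch; the mirrored file name below is only an example from my copy): dump
every link target a page in the mirror actually contains, then compare that
with where clicking takes me.

from html.parser import HTMLParser

class LinkLister(HTMLParser):
    # Print the href of every <a> tag so the actual link targets are visible.
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    print(value)

with open("mirror/frontend1a2b.html", encoding="utf-8") as f:  # example file
    LinkLister().feed(f.read())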
Second, I have the impression that far more pages are being retrieved than
there should be, even allowing for some permutations in the path. Is there a
way to get a list of all crawled URLs, so I can check that each parameter
setting is crawled only once?
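
If it helps, this is roughly how I would check it myself, assuming
hts-cache/new.txt really logs one crawled file per line. I'm not sure of its
exact column layout, so I just pattern-match the URLs out; "my-mirror" stands
for my output directory.

import re
from collections import Counter

urls = Counter()
with open("my-mirror/hts-cache/new.txt", encoding="utf-8", errors="replace") as f:
    for line in f:
        # Grab anything URL-shaped; I only care about the frontend page.
        for url in re.findall(r"https?://\S+", line):
            if "frontend.aspx" in url:
                urls[url] += 1

for url, count in urls.most_common():
    if count > 1:
        print(count, url)  # any line printed here was fetched more than once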
I'm asking the two questions together because I suspect they may be related.
For instance, how is the 4-hex-digit ID that gets attached to each crawled
file generated? What happens once I exceed 65536 (= 16^4) versions of the
same page? And could two different URLs end up with the same ID even before
that limit is reached?
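
My back-of-envelope worry, assuming the suffix is a hash of the query string
rather than a running counter (I don't know HTTrack's internals): with only
16^4 = 65536 possible suffixes, the birthday bound makes collisions likely
long before 65536 distinct URLs.

import math

buckets = 16 ** 4  # 65536 possible 4-hex-digit suffixes
for n in (100, 300, 1000, 5000):
    # Standard birthday approximation: P(collision) ~ 1 - exp(-n(n-1)/2m)
    p = 1 - math.exp(-n * (n - 1) / (2 * buckets))
    print(f"{n:>5} URLs -> collision probability ~ {p:.1%}")

If that assumption is right, a few hundred variants of frontend.aspx would
already make a collision more likely than not, which would neatly explain
clicks landing on the wrong page.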
I'd appreciate any tips. Thanks in advance for your support, and for your
great product in general!
best wishes,
Nic