HTTrack Website Copier
Free software offline browser - FORUM
Subject: many versions of same main page
Author: Nicolas
Date: 12/23/2006 03:30
 
Hi,

I'm currently using HTTrack to create an offline version of a large dynamic
web site I've built. Nearly all the action takes place on a single page,
frontend.aspx. An important query parameter, path=..., makes the frontend
walk down a directory-like structure.

I now have two questions:

First, some way into the path, the structure seems to become inconsistent: I
click a link and end up somewhere else. How can this be? Is there a chance
HTTrack gets confused because I'm using the same page so often?

Second, I have the impression that I am getting many more retrieved pages
than I should, even allowing for certain permutations in the path. Is it
somehow possible to get a list of all crawled URLs, so I could check that
each parameter setting is crawled only once?
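
To make the check I have in mind concrete, here is a minimal Python sketch.
It assumes the crawled URLs are already available as plain strings, one per
URL (a hypothetical input, not an HTTrack file format I know of), and the
names canonical_key and find_duplicates are made up for illustration. It
treats two URLs as the same parameter setting when they differ only in the
order of their query parameters.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit


def canonical_key(url):
    """Build a key that ignores the order of query parameters."""
    parts = urlsplit(url)
    # Sort the (name, value) pairs so that ?a=1&b=2 and ?b=2&a=1 match.
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return (parts.netloc.lower(), parts.path, urlencode(params))


def find_duplicates(urls):
    """Group URLs by canonical key and keep groups with more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        url = url.strip()
        if url:  # skip empty lines
            groups[canonical_key(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Calling find_duplicates on the lines of such a list would return only the
parameter settings that were fetched under more than one URL spelling.
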

I'm asking the two questions together because I wonder whether they are
related. For example, how is the 4-hex-digit ID attached to each crawled file
created? What happens when I exceed 65536 versions of the same page? And is
there a chance that IDs get mixed up even before this number is reached?
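
The reason for the last question: four hex digits give 16^4 = 65536 possible
IDs. I don't know how HTTrack actually derives the ID, but IF it were
something like a hash of the URL spread evenly over those 65536 values (purely
my assumption; a sequential counter would behave differently), two files could
share an ID long before 65536 is reached. A small Python sketch of that
estimate, with a made-up function name:

```python
def collision_probability(n_files, id_space=16**4):
    """Chance that at least two of n_files share an ID.

    Assumes every ID is drawn uniformly and independently from id_space
    values. This is an assumption for illustration, not HTTrack's
    documented behaviour.
    """
    if n_files > id_space:
        return 1.0  # more files than IDs: a clash is unavoidable
    p_unique = 1.0
    for i in range(n_files):
        # The i-th file must avoid the i IDs that are already taken.
        p_unique *= (id_space - i) / id_space
    return 1.0 - p_unique
```

Under that assumption the odds of at least one clash would already be roughly
even at around 300 versions of the page.
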

I'd appreciate any tips. Thanks in advance for your support, and for your
great product in general!

best wishes,
Nic
 