Subject: mirroring vs
Author: Haudy Kazemi
Date: 05/12/2002 22:20
I'm looking for a solution to this problem:
Some sites, in their internal links, refer to themselves 
as "" in some locations, 
and "" in other locations.  The way their 
webserver is configured, you can access all 
their pages with either URL.

The problem is in mirroring these sites: one URL is 
considered different from the other, with the result 
that the website is copied twice, even though all the 
content is the same.

So what I'm asking is: is there a way to tell HTTrack 
that the two URLs are equivalent, so it can 
consolidate the copy into one directory, downloading 
the links only once.  Most of the time I've seen this 
problem I've simply told HTTrack to grab both URLs, 
but now I'm trying to get a large site (I've capped 
HTTrack at 10KB/sec with 1 connection) and telling it 
to use both URLs isn't going to be very feasible.

Is there a way to tell HTTrack's URL rewriting engine 
about things like this?  Is there a way to do this 
with an external program?
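
To illustrate the kind of rewriting I mean, here is a rough 
Python sketch. It is not anything HTTrack provides, just 
the idea of mapping one hostname onto the other before 
deciding whether a link has already been fetched. The 
hostnames "example.com" and "www.example.com" and the 
names HOST_ALIASES, canonicalize and unique_urls are 
placeholders I made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical alias table: each key is treated as the same site as its value.
HOST_ALIASES = {"example.com": "www.example.com"}


def canonicalize(url, aliases=HOST_ALIASES):
    """Return url with its hostname swapped for the canonical alias, if one exists."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    canonical = aliases.get(host, host)
    # Keep an explicit port if present; user:password@ info is dropped in this sketch.
    netloc = canonical if parts.port is None else f"{canonical}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def unique_urls(urls, aliases=HOST_ALIASES):
    """Canonicalize each URL and keep only the first occurrence, preserving order."""
    seen = set()
    result = []
    for url in urls:
        canonical = canonicalize(url, aliases)
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result
```

With something like this in place, links written against 
either hostname would collapse to a single entry, so each 
page would only need to be downloaded once.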

