> - httrack has an option to replace all external links
> with a link to an internal "error page", where the
> original URL is passed as a CGI parameter. Replace
> the default error page with your own CGI script. This
> script looks into a table/database, if the URL belongs to
> a page mirrored in another archive; if yes, it sends a
> redirect to the browser, otherwise it returns an error
> message.
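A minimal sketch of such an error-page script in Python. It assumes the original URL arrives in a query parameter named `url` (a hypothetical name; check what httrack actually passes) and uses a hypothetical in-memory `MIRRORS` table in place of a real database. It avoids the `cgi` module, which was removed in Python 3.13, and reads the query string straight from the CGI environment:

```python
import os
import sys
from urllib.parse import parse_qs, urlsplit

# Hypothetical lookup table: original host -> base URL of the local archive
# that mirrors it. A real setup might query a database instead.
MIRRORS = {
    "sunsolve.sun.com": "http://myserver.mydomain.com/sunsolve.sun.com",
}

# ASSUMPTION: name of the CGI parameter carrying the original URL.
URL_PARAM = "url"


def resolve(query_string, mirrors=MIRRORS):
    """Return (status, headers, body) for the CGI response."""
    params = parse_qs(query_string)  # also decodes %-escapes
    original = params.get(URL_PARAM, [""])[0]
    parts = urlsplit(original)
    base = mirrors.get(parts.hostname or "")
    if base:
        # Mirrored somewhere locally: redirect the browser there.
        target = base + (parts.path or "/")
        if parts.query:
            target += "?" + parts.query
        return "302 Found", [("Location", target)], ""
    # Not mirrored in any archive: return an error message instead.
    body = "This page (%s) is not available in any local archive.\n" % original
    return "404 Not Found", [("Content-Type", "text/plain")], body


def main():
    status, headers, body = resolve(os.environ.get("QUERY_STRING", ""))
    lines = ["Status: " + status] + ["%s: %s" % h for h in headers]
    sys.stdout.write("\r\n".join(lines) + "\r\n\r\n" + body)


if __name__ == "__main__":
    main()
```

The table is keyed by host because, as described further down, each archive keeps sites under their original server and domain, so the host alone identifies where a page lives.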
I hadn't looked at this closely! Thanks for the tip. I
thought the original URL was being completely replaced with
the path to an HTML error page.
> The only minor annoyance I see right now with this solution
> is this: You must of course identify the different archives
> somehow in the database, but httrack currently does not
> reveal the "project name" in any callback (or did I miss
> something?), and it does not allow to pass arbitrary
> parameters to the callback module.
Not a problem for my implementation. The individual
projects are stored by site structure, so each project
keeps the server and domain (e.g. www.sun.com) as a
directory. When I merge the projects together, a partial
mirror of one site is still stored under its original
server and domain. For instance, two mirrors are stored on
disk as follows:
Sunsolve Patches
  |- sunsolve.sun.com/...
Sunsolve Handbook
  |- sunsolve.sun.com/handbook_pub/...
Both are merged under sunsolve.sun.com:
/htdocs
  |- sunsolve.sun.com
       |- handbook_pub
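The merge step could be scripted roughly as follows. This is a sketch under assumptions: the function name is hypothetical, every top-level directory of a project except `hts-cache` is treated as a host directory (httrack's top-level files such as logs are skipped because they are not directories), and when two projects contain the same file, the one copied later wins. `dirs_exist_ok` needs Python 3.8 or newer.

```python
import shutil
from pathlib import Path


def merge_projects(project_dirs, htdocs):
    """Copy each project's per-host directories into one shared htdocs tree."""
    htdocs = Path(htdocs)
    htdocs.mkdir(parents=True, exist_ok=True)
    for project in map(Path, project_dirs):
        for entry in project.iterdir():
            # ASSUMPTION: top-level directories other than hts-cache are
            # host directories such as sunsolve.sun.com.
            if entry.is_dir() and entry.name != "hts-cache":
                # Merge into an existing host directory instead of failing;
                # files present in both projects are overwritten.
                shutil.copytree(entry, htdocs / entry.name, dirs_exist_ok=True)
```

Usage would be something like `merge_projects(["/mirrors/patches", "/mirrors/handbook"], "/htdocs")`, with those project paths standing in for wherever the httrack projects actually live.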
So most links work when I prepend my local web host to the
original URL (e.g. <http://sunsolve.sun.com/handbook_pub>
becomes
<http://myserver.mydomain.com/sunsolve.sun.com/handbook_pub>).
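That rewrite rule is small enough to state as code. A sketch in Python, where the function name `to_local` is hypothetical and any port in the original URL is dropped, because `urlsplit(...).hostname` excludes it:

```python
from urllib.parse import urlsplit, urlunsplit

LOCAL_HOST = "myserver.mydomain.com"  # the local web host from the post


def to_local(url, local_host=LOCAL_HOST):
    """Map http://host/path?query to http://local_host/host/path?query."""
    parts = urlsplit(url)
    # The original host becomes the first path segment on the local server.
    path = "/" + parts.hostname + (parts.path or "/")
    return urlunsplit((parts.scheme, local_host, path, parts.query, parts.fragment))
```

For example, `to_local("http://sunsolve.sun.com/handbook_pub")` gives `http://myserver.mydomain.com/sunsolve.sun.com/handbook_pub`. Note that the query string is carried over unchanged, which is exactly where httrack's hashed file names stop matching.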
Relative URLs also continue to work between projects, with
the exception of the MD5 hash method of storing query
strings, which is currently my only setback.
Great suggestions. I'll pursue the Python module and see
if I can work something out. Since I'm not familiar with
Python, it may take some time. If anyone has code snippets
that might work (with or without modification) and is
willing to share them, please feel free to post them. It
would be greatly appreciated.
The biggest problem will be working out how either to
recreate the MD5 hash of the query strings, or to use the
Python module to keep the original URL with minimal
changes (without the MD5 hash).
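Since the exact httrack naming rule isn't spelled out here, one way to attack the MD5 problem is to guess a rule and test it against file names from an existing mirror. The sketch below assumes names of the form `<stem><first few hex digits of md5(query)><ext>`; that shape, the 4-digit default, hashing only the raw query string, and the `.html` extension are all guesses to be checked against the httrack source, not its documented behavior. The names `candidate_name` and `check` are hypothetical.

```python
import hashlib
import posixpath
from urllib.parse import urlsplit


def candidate_name(path, query, digits=4, ext=".html"):
    """Guess a local file name as <stem><short md5 of query><ext>.

    ASSUMPTION: this only approximates the general shape of httrack's names
    for URLs with query strings; verify every detail with check().
    """
    stem = posixpath.splitext(posixpath.basename(path))[0] or "index"
    if not query:
        return stem + ext
    short = hashlib.md5(query.encode("utf-8")).hexdigest()[:digits]
    return stem + short + ext


def check(known, **kwargs):
    """Compare guesses with (url, actual_file_name) pairs from a real mirror.

    Returns the (url, expected, got) triples where the guess is wrong.
    """
    misses = []
    for url, expected in known:
        parts = urlsplit(url)
        got = candidate_name(parts.path, parts.query, **kwargs)
        if got != expected:
            misses.append((url, expected, got))
    return misses
```

Feed `check` a few pairs taken from an actual mirror; an empty list means the guess reproduces them, otherwise the returned triples show where it differs (and `digits` or `ext` can be varied to narrow it down).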
Thanks,
Gerald