Is anyone familiar with an Apache rewrite rule that might
allow creating the MD5 hash for pages with query strings?
(Read on, as using rewrite rules is only an idea and may
not be the solution).
After a lot of digging around the FAQ and forum, I find
that it is not possible to include a form of the query
string as the filename on a global scale. For example,
converting:
<http://sunsolve.sun.com/show.pl?target=home>
to:
<http://sunsolve.sun.com/show.pl@target=home>
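
To make the conversion concrete, here is a minimal Python sketch of
the mapping I have in mind. The function name `query_to_at_form` is
made up for illustration, and this is not an HTTrack feature; it only
shows the URL-to-filename transformation described above.

```python
from urllib.parse import urlsplit


def query_to_at_form(url: str) -> str:
    """Fold the query string into the path, using '@' in place of '?'.

    Hypothetical helper: illustrates the desired naming only.
    """
    parts = urlsplit(url)
    path = parts.path
    if parts.query:
        # Replace the '?' separator with '@' so the query becomes part of the name.
        path = f"{path}@{parts.query}"
    return f"{parts.scheme}://{parts.netloc}{path}"


if __name__ == "__main__":
    print(query_to_at_form("http://sunsolve.sun.com/show.pl?target=home"))
```

URLs without a query string pass through unchanged.
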
I use WinHTTrack to mirror several technical sites and
then host these using Apache on a network that is
disconnected from the Internet. This has been invaluable
to our developers, who usually have to fight over one
Internet terminal for 20-25 individuals!
Ok, here's the problem. Most of the technical sites are
quite large and I use extensive filters to tailor the
mirroring. Additionally, I split the mirrors for sites
into multiple projects to reduce the need to update
information that changes infrequently. However, sections
of these sites link to each other using query strings.
Sun Microsystems is a good example that can be tested
against:
<http://sunsolve.sun.com/show.pl?target=home>
<http://sunsolve.sun.com/handbook_pub>
sunsolve.sun.com is quite large, especially if you include
the patch database.
An example project set would be to mirror the handbook_pub
section once a quarter, as it seldom changes. A second
project can be set up to mirror the main pages of
sunsolve.sun.com weekly, excluding patch information
pages (or limiting the depth). However, when trying to
bring these two projects together on an Apache server,
links between the sections do not work (links outside each
project are absolute).
Can anyone offer suggestions for either of the following
options:
1) HTTrack settings that will allow links to work between
projects that are merged under the same root directory
structure.
2) A rewrite rule for Apache that would allow rewriting
the links for query strings to external links.
Currently, a script is used to parse and change all
absolute links to include the Apache server's address.
For instance, <http://sunsolve.sun.com> becomes
<http://intranet.ourdomain.com/sunsolve.sun.com>
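
For reference, the link-rewriting step can be sketched roughly as
follows in Python. This is a simplified illustration, not the actual
script: the function name, the `MIRRORED_HOSTS` list, and the regular
expression are assumptions, and only the intranet address comes from
the example above.

```python
import re

# Intranet address taken from the example above.
INTRANET = "http://intranet.ourdomain.com"
# Assumed list of hosts that have been mirrored locally.
MIRRORED_HOSTS = ["sunsolve.sun.com", "www.httrack.com"]

# Match "http://host" or "https://host" for mirrored hosts only.
# The negative lookahead stops a match on longer names such as
# "sunsolve.sun.com.example.org".
_HOSTS = "|".join(re.escape(h) for h in MIRRORED_HOSTS)
_ABSOLUTE = re.compile(r"https?://(" + _HOSTS + r")(?![\w.-])")


def rewrite_absolute_links(html: str) -> str:
    """Point absolute links to mirrored hosts at the intranet server."""
    return _ABSOLUTE.sub(lambda m: f"{INTRANET}/{m.group(1)}", html)


if __name__ == "__main__":
    page = '<a href="http://sunsolve.sun.com/handbook_pub">Handbook</a>'
    print(rewrite_absolute_links(page))
```

Links that already point at the intranet server are left alone, so
the sketch can be run over the same files more than once.
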
Also, all mirrored data is relocated to the same root
directory on the server. For instance:
/htdocs/sunsolve.sun.com
/htdocs/www.httrack.com
This has worked well for maintaining links between sites
in multiple HTTrack projects. However, the process fails
for links with query strings. Any help or suggestions
would be greatly appreciated!
Thanks a million,
Gerald