HTTrack Website Copier
Free software offline browser - FORUM
Subject: Query Strings and Web Server Rewrite Rules
Author: Gerald Wise
Date: 09/17/2004 04:43
 
Is anyone familiar with an Apache rewrite rule that could 
map pages with query strings to the MD5-hashed filenames?  
(Read on, as using rewrite rules is only an idea and may 
not be the solution).

After a lot of digging through the FAQ and forum, I found 
that it is not possible to keep a form of the query 
string in the filename on a global scale.  For example, 
converting:

   <http://sunsolve.sun.com/show.pl?target=home>

to:

   <http://sunsolve.sun.com/show.pl@target=home>
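To make the idea concrete, here is a minimal Python sketch of that naming scheme, assuming the mirror simply swaps the '?' for '@' and keeps the host name as the top-level directory. The function name query_url_to_filename is hypothetical; this is not an HTTrack option, and by default HTTrack does not name pages this way.

```python
from urllib.parse import urlsplit


def query_url_to_filename(url: str) -> str:
    """Map a URL to a local path, keeping the query after an '@' (hypothetical)."""
    parts = urlsplit(url)
    # The host name becomes the top-level directory, as in /htdocs/<host>/...
    local = parts.netloc + parts.path
    if parts.query:
        # '?' is not allowed in Windows filenames, so substitute '@'.
        local += "@" + parts.query
    return local


print(query_url_to_filename("http://sunsolve.sun.com/show.pl?target=home"))
```

Query values that themselves contain '/' or other characters that are illegal in Windows filenames would still need escaping; the sketch does not handle that.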

I use WinHTTrack to mirror several technical sites and 
then host these using Apache on a network that is 
disconnected from the Internet.  This has been invaluable 
to our developers, who usually have to share a single 
Internet terminal among 20-25 people!

Ok, here's the problem.  Most of the technical sites are 
quite large and I use extensive filters to tailor the 
mirroring.  Additionally, I split the mirrors for sites 
into multiple projects to reduce the need to update 
information that changes infrequently.  However, the sites 
reference sections between each other that use query 
strings. Sun Microsystems is a good example that I have 
found and that can be tested against:

   <http://sunsolve.sun.com/show.pl?target=home>
   <http://sunsolve.sun.com/handbook_pub>

sunsolve.sun.com is quite large, especially if you include 
the patch database.  

An example project set would be to mirror the handbook_pub 
section once a quarter, as it seldom changes.  A second 
project can be set up to mirror the main sunsolve.sun.com 
pages weekly, excluding patch information 
pages (or limiting the depth).  However, when trying to 
bring these two projects together on an Apache server, 
links between the sections do not work (links outside each 
project are absolute).

Can anyone offer suggestions for either of the following 
options:

1) HTTrack settings that will allow links to work between 
projects that are merged under the same root directory 
structure.

2) A rewrite rule for Apache that would allow rewriting 
the links for query strings to external links.
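For option 2, the server would have to reverse that naming at request time: take the requested path, append the query string with '@', and serve that file from the document root. The Python sketch below models only the mapping such a rewrite rule would perform (in mod_rewrite this would involve a RewriteCond on %{QUERY_STRING} and its %1 back-reference). resolve_request and the /htdocs root are assumptions for illustration, not a tested Apache configuration.

```python
import posixpath
from typing import Optional


def resolve_request(docroot: str, path: str, query: Optional[str]) -> str:
    """Return the file a request should be served from (hypothetical helper).

    The query string, if any, is appended to the path with '@' so that
    /sunsolve.sun.com/show.pl?target=home maps to .../show.pl@target=home.
    """
    root = posixpath.normpath(docroot)
    # Drop the leading '/' so join() keeps the document root as the base.
    relative = path.lstrip("/")
    if query:
        relative += "@" + query
    full = posixpath.normpath(posixpath.join(root, relative))
    # Refuse requests that climb out of the document root with '..'.
    if full != root and not full.startswith(root + "/"):
        raise ValueError("request escapes the document root")
    return full


print(resolve_request("/htdocs", "/sunsolve.sun.com/show.pl", "target=home"))
```

Apache hands over the query string in its URL-encoded form, so the saved filename would have to match that encoding exactly for the lookup to succeed.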

Currently, a script is used to parse and change all 
absolute links to include the Apache server's address.  
For instance, <http://sunsolve.sun.com> becomes 
<http://intranet.ourdomain.com/sunsolve.sun.com>
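The link-fixing step can be sketched in Python roughly as follows. This is only a guess at what such a script might look like, not the actual script in use; MIRRORED_HOSTS and rewrite_links are hypothetical names, and a regular expression over href/src attributes is a simplification compared with a real HTML parser.

```python
import re

# Intranet server address from the example above.
INTRANET = "http://intranet.ourdomain.com"

# Hypothetical list of hosts whose mirrors live under /htdocs/<host>.
MIRRORED_HOSTS = ["sunsolve.sun.com", "www.httrack.com"]

# Capture the attribute prefix (href=" or src=') and the host name; the
# lookahead requires '/' or a closing quote right after the host, so that
# e.g. sunsolve.sun.com.example.org is not matched.
LINK_RE = re.compile(
    r"((?:href|src)\s*=\s*[\"'])http://("
    + "|".join(re.escape(h) for h in MIRRORED_HOSTS)
    + r")(?=[/\"'])",
    re.IGNORECASE,
)


def rewrite_links(html: str) -> str:
    """Point absolute links to mirrored hosts at the intranet server."""
    return LINK_RE.sub(lambda m: f"{m.group(1)}{INTRANET}/{m.group(2)}", html)


print(rewrite_links('<a href="http://sunsolve.sun.com/show.pl?target=home">'))
```

The query string passes through unchanged (show.pl?target=home), so Apache still looks for the file show.pl rather than a file named after the query.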

Also, all mirrored data is relocated to the same root 
directory on the server.  For instance:

  /htdocs/sunsolve.sun.com
  /htdocs/www.httrack.com

This has worked well for maintaining links between sites 
in multiple HTTrack projects.  However, the process fails 
for links with query strings.  Any help or suggestions 
would be greatly appreciated!

Thanks a million,
Gerald
