Re: Query Strings and Web Server Rewrite Rules - HTTrack Website Copier Forum

Subject: Re: Query Strings and Web Server Rewrite Rules

Author: Gerald Wise

Date: 10/01/2004 14:08

> - httrack has an option to replace all external links
> with a link to an internal "error page", where the 
> original URL is passed as a CGI parameter. Replace
> the default error page with your own CGI script. This 
> script looks into a table/database, if the URL belongs to
> a page mirrored in another archive; if yes, it sends a
> redirect to the browser, otherwise it returns an error
> message.

I haven't looked at this closely!  Thanks for the tip.  I 
thought the original URL was being completely replaced with 
the path to an HTML error page.
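
For illustration, the lookup-and-redirect idea quoted above could be sketched roughly as follows. This is only a sketch under assumptions: the table contents, the function name resolve_external and the (status, value) return shape are hypothetical and not part of HTTrack; a real CGI script would read the original URL from its query parameter and emit the matching HTTP headers.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table: original host -> base URL of the local
# archive that mirrors it. In practice this could live in a database.
MIRRORED_HOSTS = {
    "sunsolve.sun.com": "http://myserver.mydomain.com/sunsolve.sun.com",
}


def resolve_external(original_url, table=MIRRORED_HOSTS):
    """Return (302, redirect_target) if the URL's host is mirrored,
    otherwise (404, error_message)."""
    parts = urlsplit(original_url)
    base = table.get(parts.hostname or "")
    if base is None:
        return 404, "Not mirrored: " + original_url
    target = base + parts.path
    if parts.query:
        # The query string is passed through unchanged in this sketch.
        target += "?" + parts.query
    return 302, target
```

The caller would then send a "Location:" header for the 302 case and an error page for the 404 case.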
 

> The only minor annoyance I see right now with this solution
> is this: You must of course identify the different archives
> somehow in the database, but httrack currently does not 
> reveal the "project name" in any callback (or did I miss 
> something?), and it does not allow to pass arbitrary 
> parameters to the callback module.

Not a problem for my implementation.  The individual 
projects are stored by site structure, so within each project 
the server and domain are stored (e.g. www.sun.com).  When 
I merge the projects together, a partial mirror of one site 
is still stored under its original server and domain.  For 
instance, two mirrors that would be stored as follows on 
disk:

   Sunsolve Patches  -
                     |-sunsolve.sun.com/...
   Sunsolve Handbook -
                     |-sunsolve.sun.com/handbook_pub/...

Would both be merged under sunsolve.sun.com:

   /htdocs -
          |- sunsolve.sun.com -
                              |- handbook_pub

So most links work when I prepend my local web host to the 
original URL (e.g. <http://sunsolve.sun.com/handbook_pub> 
becomes 
<http://myserver.mydomain.com/sunsolve.sun.com/handbook_pub>).
Additionally, relative URLs also continue to work 
between projects, with the exception of the MD5 hash 
method of storing query strings, which is currently my 
only setback.
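
As a rough sketch of the prepending rule described above (the function name to_local_url and its default host argument are assumptions made for illustration), the original host simply becomes the first path segment on the local server:

```python
from urllib.parse import urlsplit, urlunsplit


def to_local_url(original_url, local_host="myserver.mydomain.com"):
    """Map an original URL onto the merged local mirror by making the
    original host the first path segment under local_host."""
    parts = urlsplit(original_url)
    local_path = "/" + parts.netloc + parts.path
    # Query and fragment are kept as-is; whether the mirrored file name
    # matches a URL with a query string is a separate problem.
    return urlunsplit(
        (parts.scheme, local_host, local_path, parts.query, parts.fragment)
    )


print(to_local_url("http://sunsolve.sun.com/handbook_pub"))
```

This only rewrites the address; it does not account for the hashed file names used for pages that were fetched with a query string.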

Great suggestions.  I'll pursue the Python module and see 
if I can't work out something.  Since I'm not familiar 
with Python, it may take some time.  If anyone has some 
code snippets that might work (with or without 
modification) and would be willing to share, please 
feel free to post them.  It would be greatly appreciated.  
The largest problem I am going to have is working out how 
to recreate the MD5 hash of the query strings, or how to use 
the Python module to maintain the original URL with minimal 
changes (without the MD5 hash).
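
For anyone experimenting with the hash side of this, computing an MD5 digest of a query string in Python is straightforward with the standard hashlib module. The sketch below is NOT HTTrack's actual naming scheme: the digest length, the input encoding and where the digest is placed in the file name are all assumptions that would have to be checked against the names HTTrack really writes to disk.

```python
import hashlib
import posixpath
from urllib.parse import urlsplit


def query_digest(url, length=4):
    """Hex MD5 digest of the URL's query string, truncated to `length`
    characters. The truncation length is an assumption."""
    query = urlsplit(url).query
    return hashlib.md5(query.encode("utf-8")).hexdigest()[:length]


def candidate_name(url, length=4):
    """Hypothetical local file name: the digest is inserted before the
    file extension. Adjust to match the real on-disk names."""
    parts = urlsplit(url)
    if not parts.query:
        return parts.path
    stem, ext = posixpath.splitext(parts.path)
    return stem + query_digest(url, length) + ext
```

Comparing candidate_name() output against a handful of files in an existing mirror would show quickly whether these assumptions hold.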

Thanks,
Gerald
