HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Remove http arguments to avoid duplicate files
Author: Xavier Roche
Date: 12/02/2003 20:29
 
> I'm just having a problem downloading a huge dynamic web
> site. It's in JSP and I try to rename all the files to .html
> extensions with a 'user-defined structure' like
> %h%p/%n_%[node]_%[lang].html, where node and lang are HTTP
> params to the JSP.
> My problem is that sometimes some other params appear in
> the URLs, like 'refresh=...', to display the same content,
> and it seems that HTTrack duplicates the file with a -1, -2, ...
> extension.

That's perfectly normal: httrack finds a name collision, and
is obliged to rename the local file(s).

> Does someone have an idea how to either remove these HTTP
> params from the downloaded website, or refresh the whole
> downloaded website?

I'm not sure I understand what you want to do exactly:

1. Do you want to avoid downloading these specific pages? Then
use scan rules to forbid them, such as:
-*/*?*refresh=*
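
On the command line that would look something like the line
below (the URL and output folder are just placeholders for your
own project, and the filter is quoted so the shell does not
expand it):

httrack "http://www.example.com/" -O "mysite" -N "%h%p/%n_%[node]_%[lang].html" "-*/*?*refresh=*"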

2. Do you want to include the 'refresh' parameter in the
destination filename, only if this parameter is found? Then use
advanced variable extraction:
%h%p/%n_%[node]_%[lang]%[refresh:_:::].html
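
If I recall the %[param:before:after:empty:notfound] syntax
correctly, a link such as (made-up example)
page.jsp?node=12&lang=en&refresh=true would then be saved as
.../page_12_en_true.html, while page.jsp?node=12&lang=en would
give .../page_12_en.html, so the extra part only appears when
the parameter is present and collisions no longer occur.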

3. Do you want to consider links with 'refresh=..' identical to
those without this parameter (that is, the parameter is
useless)? Then this is not yet possible without some coding
(maybe by hacking the "check-link" callback using some C
coding).
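
Just to illustrate the kind of normalization such a hack would
have to do (this is NOT the httrack callback API, only a
standalone C sketch of the idea: strip the useless 'refresh'
parameter so that both forms of the URL compare equal):

#include <stdio.h>
#include <string.h>

/* Remove a "refresh=..." query parameter from the URL, in place,
   so that URLs differing only by this parameter become identical. */
static void strip_refresh(char *url) {
  char *p = strstr(url, "refresh=");
  /* make sure we matched a real parameter, not e.g. "norefresh=" */
  if (p == NULL || (p != url && p[-1] != '?' && p[-1] != '&'))
    return;
  {
    char *end = strchr(p, '&');
    if (end != NULL) {
      /* other parameters follow: shift them over the removed one */
      memmove(p, end + 1, strlen(end + 1) + 1);
    } else {
      /* last parameter: also drop the '?' or '&' that preceded it */
      if (p != url && (p[-1] == '?' || p[-1] == '&'))
        p--;
      *p = '\0';
    }
  }
}

int main(void) {
  char a[] = "page.jsp?node=12&lang=en&refresh=true";
  char b[] = "page.jsp?node=12&lang=en";
  strip_refresh(a);
  strip_refresh(b);
  /* both lines now print the same URL */
  printf("%s\n%s\n", a, b);
  return 0;
}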

 