HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Remove http arguments to avoid duplicate files
Author: enriquevagu
Date: 12/15/2006 17:15
 
> > I'm just having a problem downloading a huge dynamic web site.
> > It's in JSP and I try to rename all the files to html
> > extensions with a 'user-defined structure' like
> > %h%p/%n_%[node]_%[lang].html where node and lang are HTTP
> > params to the JSP.
> > My problem is that sometimes some other params appear in
> > the URLs, like 'refresh=...', to display the same content,
> > and it seems that HTTrack duplicates the file with a -1 -2 .. extension
> 
> That's perfectly normal: httrack finds a name collision, 
> and is obliged to rename the local file(s)
> 
> > Does someone have an idea either to remove these HTTP params
> > from the downloaded website or to refresh the whole
> > downloaded website?
>
> I'm not sure I understand what you want to do exactly:
>
> 3. do you want to consider the links with 'refresh=..'
> identical to those without this parameter (that is,
> this parameter is useless)?
> Then, this is not yet possible without some coding (maybe
> by hacking the "check-link" callback using some C coding)
>

Hi all,

    I am sorry to reply to an old post, but I have the same problem that was
discussed in it, and I don't know whether in the meantime (the original post
is from 2003, three years ago) it has been automated somehow.

    I would like to back up a dynamic website that has some significant
params, but there is also a "jsessionid" parameter that varies from request to
request and is useless. It is exactly the third case that Xavier described in
the old post. An example page from the website I am interested in (it is a
tourism site) is:
<http://www.turismocastillayleon.com/cm/turcyl/tkContent;jsessionid=7F8E7761C59E3832D7DE80DE09761AFC?idContent=111&locale=es_ES&textOnly=false>

    Does anyone know if there is a simple (i.e., without recompiling) way to
do this? I haven't found any "omit param" option or anything similar, but I
think it would be very useful for these "difficult" cases.
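
    To make the request concrete, the normalization meant here can be sketched
in Python. This is only an illustration of the idea, not an HTTrack option or
API: the function name canonical_url and the IGNORED_PARAMS set are
hypothetical, and it assumes the session id always appears as a
";jsessionid=..." path parameter, as in the example URL above.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of query parameters that do not change the page content.
IGNORED_PARAMS = {"refresh"}

# Matches a ";jsessionid=..." path parameter, case-insensitively.
_JSESSION_RE = re.compile(r";jsessionid=[^;/?#]*", re.IGNORECASE)


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Return the URL without the session id and without ignored params."""
    parts = urlsplit(url)
    # urlsplit keeps ";jsessionid=..." inside the path, so strip it there.
    path = _JSESSION_RE.sub("", parts.path)
    # Keep the remaining query parameters in their original order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(query), parts.fragment)
    )


if __name__ == "__main__":
    print(canonical_url(
        "http://www.turismocastillayleon.com/cm/turcyl/tkContent"
        ";jsessionid=7F8E7761C59E3832D7DE80DE09761AFC"
        "?idContent=111&locale=es_ES&textOnly=false"
    ))
```

    Two links that differ only in their session id (or in an ignored
parameter) then map to the same canonical URL, so they would be saved once
instead of as -1, -2 duplicates.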

     Thanks in advance,

Enriquevagu
 