> I'm having a problem downloading a huge dynamic web site.
> It's in JSP, and I try to rename all the files to .html
> extensions with a 'user-defined structure' like
> %h%p/%n_%[node]_%[lang].html where node and lang are HTTP
> params to the JSP.
> My problem is that sometimes other params appear in
> the URLs, like 'refresh=...', that display the same content,
> and it seems that HTTrack duplicates the file with a -1, -2, ...
> extension.
That's perfectly normal: HTTrack finds a name collision
and has to rename the local file(s).
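
To illustrate why the collision happens, here is a small Python sketch. It is only a toy model, not HTTrack's actual code: `local_name` and `assign_names` are hypothetical helpers, and the example.com URLs are made up. It assumes the name is built from the file stem plus the node and lang parameters only, so two URLs that differ just by 'refresh' map to the same name and the second one needs a numeric suffix.

```python
from urllib.parse import urlsplit, parse_qs
import posixpath


def local_name(url):
    """Toy expansion of %n_%[node]_%[lang].html (hypothetical, not HTTrack code)."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    # %n: file name without its extension
    stem = posixpath.splitext(posixpath.basename(parts.path))[0]
    node = params.get("node", [""])[0]
    lang = params.get("lang", [""])[0]
    return f"{stem}_{node}_{lang}.html"


def assign_names(urls):
    """Map each URL to a local name, appending -1, -2, ... on a collision."""
    taken = set()
    result = {}
    for url in urls:
        name = local_name(url)
        base, ext = posixpath.splitext(name)
        candidate, n = name, 0
        while candidate in taken:
            n += 1
            candidate = f"{base}-{n}{ext}"
        taken.add(candidate)
        result[url] = candidate
    return result


if __name__ == "__main__":
    urls = [
        "http://www.example.com/page.jsp?node=12&lang=en",
        "http://www.example.com/page.jsp?node=12&lang=en&refresh=1",
    ]
    for url, name in assign_names(urls).items():
        print(url, "->", name)
```

Both URLs expand to the same base name, so the second one is renamed with a suffix.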
> Does someone have an idea either to remove these HTTP params
> from the downloaded website or to refresh the whole downloaded
> website?
I'm not sure I understand exactly what you want to do:
1. Do you want to avoid downloading these specific pages?
Then use scan rules to forbid them, such as:
-*/*?*refresh=*
2. Do you want to include the 'refresh' parameter in the
destination filename, only if this parameter is found?
Then use advanced variable extraction:
%h%p/%n_%[node]_%[lang]%[refresh:_:::].html
3. Do you want to consider the links with 'refresh=..'
identical to those without this parameter (that is, this
parameter is useless)?
Then this is not yet possible without some coding (maybe
by hacking the "check-link" callback using some C coding).
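
For the third case, the underlying idea is URL normalization: drop the useless parameter so that both forms of a link compare equal. The Python sketch below only shows that idea in isolation. It is not an HTTrack callback and does not use the HTTrack API; `strip_params` is a hypothetical helper and the example.com URLs are made up.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_params(url, ignored=("refresh",)):
    """Return url without the query parameters listed in `ignored`."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    a = "http://www.example.com/page.jsp?node=12&lang=en"
    b = "http://www.example.com/page.jsp?node=12&lang=en&refresh=1"
    # After normalization both links refer to the same page.
    print(strip_params(a) == strip_params(b))
```

A real solution would have to apply the same kind of normalization inside the crawler, before a link is compared with the ones already seen.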