HTTrack Website Copier
Free software offline browser - FORUM
Subject: url with comma incorrectly handled
Author: jcv
Date: 12/15/2012 10:38
 
Hello,

I just noticed that web pages whose names contain a comma are handled incorrectly by httrack.

The spider correctly follows the links and correctly renames the pages, but it
does not update the links to those pages in the HTML files.

For example, I have a page named "-Benin,27-.html", which httrack renames to
"_BENIN_27_.HTM". However, once the website has been completely retrieved,
the link that appears in the pages is still "-Benin,27-.html". As a result, a
404 Not Found error is displayed when the link is followed.
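
To illustrate the mismatch, here is a minimal Python sketch that lists the links in a page whose target does not match any saved file name. It is not part of HTTrack: the function name dangling_links, the regular expression, and the sample page and file set are all made up for illustration, and it only compares file names, not full paths.

```python
import re
from urllib.parse import unquote

# Captures the target of each href attribute, stopping at a quote, '#' or '?'.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\'#?]+)', re.IGNORECASE)

def dangling_links(html, saved_names):
    """Return relative link targets in `html` that match no saved file name."""
    missing = []
    for target in HREF_RE.findall(html):
        if "://" in target or target.startswith("mailto:"):
            continue  # skip absolute/external links
        name = unquote(target).rsplit("/", 1)[-1]  # keep only the file name
        if name and name not in saved_names:
            missing.append(name)
    return missing

# Hypothetical data mirroring the case described above.
page = '<a href="-Benin,27-.html">Benin</a> <a href="INDEX.HTM">Home</a>'
saved = {"_BENIN_27_.HTM", "INDEX.HTM"}
print(dangling_links(page, saved))
```

With the sample data, the link to "-Benin,27-.html" is reported because only "_BENIN_27_.HTM" exists on disk.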

I am using HTTrack version 3.46+libhtsjava.so.2, with the following options:

httrack -q -%i -iC2 http://www.upr-info.org -O "/Datas/cle
usb/.UPR_Info_mirror-mirrordata/uprinfo.org" -n -%P -N0 -L2 -s2 -p7 -D -a -K0
-c8 -%s -%u -%I  -F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" -%F
"<!-- Mirrored for UPR Info offline -->" +*.png +*.gif +*.jpg +*.css +*.js
+*.upr-info.org/* -*.upr-info.org/database/index.php*
-*.upr-info.org/spip.php?page=backend*
-*.upr-info.org/spip.php?page=rubrique-print* -ad.doubleclick.net/*
-sflogo.sourceforge.net/* -tincan.co.uk/* -www.gnu.org/* -www.google.com/*
-www.google-analytics.com/* -zulu.tweetmeme.com/* -ajax.googleapis.com/*
-platform.twitter.com/* -*.addthis.com/* -*.phplist.com

Any idea what is happening? (Sorry for reposting, but I did not understand
WHRoeder's response.)
 