HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Word database (RC13)
Author: Xavier Roche
Date: 08/28/2003 20:29
 
> 1) Having looked at index.txt, I see that it 
> is not Unicode. In fact, all the characters are 
> ISO-8859-1. 
> Is this a bug, or is it a known design feature?
This is a limitation. The word database is really basic, and 
htsindex.c contains these definitions:

#define KEYW_ACCEPT          "abcdefghijklmnopqrstuvwxyz0123456789-_."
// Convert A to a, and so on.. to avoid case problems in indexing
// This can be a generic table, containing characters that are in fact not accepted by KEYW_ACCEPT
// MUST HAVE SAME SIZES!!
#define KEYW_TRANSCODE_FROM  ( \
                               "ABCDEFGHIJKLMNOPQRSTUVWXYZ" \
                               "àâä" \
                               "ÀÂÄ" \
                               "éèêë" \
                               "ÈÈÊË" \
                               "ìîï" \
                               "ÌÎÏ" \
                               "òôö" \
                               "ÒÔÖ" \
                               "ùûü" \
                               "ÙÛÜ" \
                               "ÿ" \
                             )
#define KEYW_TRANSCODE_TO    ( \
                               "abcdefghijklmnopqrstuvwxyz" \
                               "aaa" \
                               "aaa" \
                               "eeee" \
                               "eeee" \
                               "iii" \
                               "iii" \
                               "ooo" \
                               "ooo" \
                               "uuu" \
                               "uuu" \
                               "y" \
                             )

To lift this limit, KEYW_ACCEPT should be set to all valid characters 
for a keyword (that is, adding characters 128-255), and 
KEYW_TRANSCODE_FROM and KEYW_TRANSCODE_TO should both be set to "".
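
For illustration only, here is a minimal sketch of how byte-wise tables like these could be applied. It is not the actual htsindex.c code: the helper names keyw_normalize and keyw_extract and the lowercase array names are made up for this example, the tables are shortened, and the accented letters are written as ISO-8859-1 byte escapes so the result does not depend on the encoding of the source file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shortened copies of the tables quoted above (assumption: one byte per
   character, ISO-8859-1 values for the accented letters). */
static const char keyw_accept[] = "abcdefghijklmnopqrstuvwxyz0123456789-_.";
static const char keyw_from[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                                "\xE0\xE2\xE4"      /* a-grave, a-circumflex, a-diaeresis */
                                "\xE9\xE8\xEA\xEB"; /* e-acute, e-grave, e-circumflex, e-diaeresis */
static const char keyw_to[]   = "abcdefghijklmnopqrstuvwxyz"
                                "aaa"
                                "eeee";

/* Compile-time form of the "MUST HAVE SAME SIZES" rule (needs C11). */
_Static_assert(sizeof(keyw_from) == sizeof(keyw_to),
               "transcode tables must have the same length");

/* Hypothetical helper: map one byte to its indexed form, or return 0
   when the byte cannot be part of a keyword. */
static unsigned char keyw_normalize(unsigned char c) {
  const char *p;
  if (c == '\0')
    return 0; /* strchr() would otherwise match the terminator */
  p = strchr(keyw_from, c);
  if (p != NULL)
    c = (unsigned char) keyw_to[p - keyw_from];
  return strchr(keyw_accept, c) != NULL ? c : 0;
}

/* Hypothetical helper: copy the leading keyword of `in` into `out`,
   normalized, stopping at the first rejected byte. Returns its length. */
static size_t keyw_extract(const char *in, char *out, size_t outsz) {
  size_t n = 0;
  assert(in != NULL && out != NULL && outsz > 0);
  while (n + 1 < outsz) {
    unsigned char c = keyw_normalize((unsigned char) in[n]);
    if (c == 0)
      break;
    out[n++] = (char) c;
  }
  out[n] = '\0';
  return n;
}
```

In this sketch every byte is looked up on its own, so any byte that is in neither table ends the keyword. That would include all bytes of a multi-byte encoding, which is consistent with the index only holding single-byte characters.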

> 2) Is there a command-line setting to apply 
> 'Word database' to other previously mirrored 
> sites? 

No. But you can enable the option and then run "continue an 
interrupted mirror"; that operation should be fast.

> Some of which were mirrored with HTTrack and 
> some with others (before HTTrack)?
For those mirrored without HTTrack: no.

 