Oops, I should have guessed: index.txt is only
created *after* the mirroring is *finished*; it
does not happen in the paused state...
And, I have 3 questions now :-)
1) Having looked at index.txt, I see that it
is not Unicode. In fact, all the characters are
ISO-8859-1.
ISO-8859-1 might be useful for a search engine
(I am not so sure though), but it definitely
cannot be used for linguistic analysis.
Is this a bug, or is it a known design feature?
2) Is there a command-line setting to apply
'Word database' to other previously mirrored
sites?
Some of them were mirrored with HTTrack and
some with other tools (before HTTrack).
3) This definitely seems to be a bug: the 'Word database'
option only works with HTML files that contain only
chars in the ASCII charset. It does not seem to work
with ISO-8859-X (where X is greater than 1). When X
is greater than 1, HTTrack splits a word wherever
it sees a non-ASCII char.
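To illustrate what I mean in 3), here is a small Python sketch. It is not HTTrack's code, only my guess at the kind of tokenizing that would produce this splitting. The sample text, the choice of ISO-8859-9, and both function names are made up for the example.

```python
import re

# Hypothetical sample text, encoded as ISO-8859-9 (an assumption for this example).
raw = "çalışma sayfası".encode("iso-8859-9")

def ascii_only_words(data):
    # Treat only ASCII letters as word characters, so any other byte
    # ends the current word. This reproduces the splitting described above.
    return [m.decode("ascii") for m in re.findall(rb"[A-Za-z]+", data)]

def charset_aware_words(data, charset):
    # Decode with the page's charset first, then collect runs of
    # Unicode letters (word characters minus digits and underscore).
    return re.findall(r"[^\W\d_]+", data.decode(charset))

print(ascii_only_words(raw))
print(charset_aware_words(raw, "iso-8859-9"))
```

The first call breaks each word at every non-ASCII byte, while the second keeps the words whole because the bytes are decoded before the text is split.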
Cheers,
Adem