> > When I download Turkish website I am getting question
> > marks '?' instead of Turkish characters such as ğüçş
Problem detected: many (most) of the pages are UCS-2 Unicode (that is,
raw 16-bit data), which is strongly discouraged on the internet. You'd
better use UTF-8, which is more compatible and more portable: many
newer characters cannot be represented in UCS-2 at all, and UTF-8 is
now the de facto standard for XML and HTML.

Currently httrack does not properly recognize all UCS-2 pages. I will
improve the UCS-2 detection in the next release, but this won't fix the
problem: non-ASCII characters will be lost anyway.

Most tools let you save the pages as "UTF-8" easily; in that case the
"charset" has to be declared like this:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
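
If the pages are already saved as UCS-2, a small script can do the
re-encoding. The following is only a minimal sketch in Python and is not
part of httrack; the function name ucs2_to_utf8 and the little-endian
fallback for data without a byte-order mark are assumptions made for
illustration. Python's UTF-16 decoder is used because UTF-16 is a
superset of UCS-2.

```python
import codecs


def ucs2_to_utf8(raw: bytes) -> bytes:
    """Re-encode UCS-2/UTF-16 bytes as UTF-8 (hypothetical helper)."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the byte-order mark, picks the matching
        # byte order, and drops the mark from the decoded text.
        text = raw.decode("utf-16")
    else:
        # Assumption: data without a byte-order mark is little-endian.
        text = raw.decode("utf-16-le")
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = "ğüçş".encode("utf-16")  # stand-in for a UCS-2 page body
    print(ucs2_to_utf8(sample).decode("utf-8"))
```

After converting a page this way, its charset declaration still has to
say utf-8, as in the meta tag above.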