Re: Turkish characters - HTTrack Website Copier Forum

Subject: Re: Turkish characters

Author: Xavier Roche

Date: 04/08/2003 23:15

> > When I download a Turkish website I get question
> > marks '?' instead of Turkish characters such as ğüçş

Problem detected: many (most) of the pages are UCS-2 Unicode (that is, raw 16-bit data), which is strongly discouraged on the internet. You'd better use UTF-8, which is more compatible and more portable: many newer characters can no longer be represented as UCS-2 characters, and besides, UTF-8 is now the de facto standard for XML and HTML.
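To illustrate the point about characters that no longer fit in 16 bits, here is a small Python sketch. The helper name fits_in_ucs2 is made up for this example; it only checks whether a code point lies at or below U+FFFF, which is the range a single 16-bit UCS-2 unit can hold.

```python
def fits_in_ucs2(ch: str) -> bool:
    """True if the single character ch has a code point of at most U+FFFF."""
    return ord(ch) <= 0xFFFF

# The Turkish letters from the question all fit in one 16-bit unit.
turkish = "ğüçş"
all_fit = all(fits_in_ucs2(c) for c in turkish)

# U+1D11E (musical symbol G clef) does not fit: UTF-16 needs a surrogate
# pair (two 16-bit units) for it, while UTF-8 encodes it directly.
clef = "\U0001D11E"
utf16_len = len(clef.encode("utf-16-le"))  # length in bytes
utf8_len = len(clef.encode("utf-8"))       # length in bytes
```

So the Turkish characters themselves are not the limitation of UCS-2; the limitation shows up for code points above U+FFFF.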

Currently httrack does not properly recognize all UCS-2 pages. I will improve the UCS-2 detection in the next release, but this won't fix the problem: non-ASCII characters will be lost anyway.
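For readers who want to check their own pages, a rough way to spot 16-bit encoded files might look like the Python sketch below. This is not how httrack detects UCS-2; the function name looks_like_utf16 and the 80%/20% thresholds are assumptions chosen for illustration.

```python
import codecs

def looks_like_utf16(data: bytes) -> bool:
    """Heuristic guess: is this page stored as 16-bit Unicode?"""
    # A byte order mark at the start is the clearest hint.
    # (A UTF-32 LE file also starts with these two bytes; ignored here.)
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return True
    sample = data[:256]
    if len(sample) < 4:
        return False
    # ASCII-heavy markup stored in 16 bits has a zero in every other byte.
    even_zeros = sample[0::2].count(0)
    odd_zeros = sample[1::2].count(0)
    half = len(sample) // 2
    return (max(even_zeros, odd_zeros) > 0.8 * half
            and min(even_zeros, odd_zeros) < 0.2 * half)
```

The zero-byte check only works for pages that are mostly ASCII markup, which is usually the case for HTML.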

You can generally save the pages as "UTF-8" easily with most tools; in this case the "charset" has to be declared like this:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
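If you would rather script the conversion than re-save each page by hand, something along these lines may work. It is a minimal sketch, assuming the input starts with a byte order mark so that Python's "utf-16" codec can pick the right byte order; the function name to_utf8_html and the regular expressions are made up for this example and are not part of httrack.

```python
import re

META = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
# Matches an existing <meta ...charset...> declaration, if any.
CHARSET_RE = re.compile(r"<meta[^>]*charset[^>]*>", re.IGNORECASE)
# Matches the opening <head> tag (but not <header>).
HEAD_RE = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)

def to_utf8_html(raw: bytes) -> bytes:
    """Re-encode a 16-bit Unicode HTML page as UTF-8 and declare the charset."""
    text = raw.decode("utf-16")  # the BOM selects the byte order and is dropped
    if CHARSET_RE.search(text):
        # Replace the first existing charset declaration.
        text = CHARSET_RE.sub(lambda m: META, text, count=1)
    else:
        # Otherwise add the declaration right after <head>.
        # If the page has no <head>, the text is left without a meta tag.
        text = HEAD_RE.sub(lambda m: m.group(0) + META, text, count=1)
    return text.encode("utf-8")
```

Usage would be reading the file in binary mode, passing the bytes to to_utf8_html, and writing the result back in binary mode.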
