From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Alexander Kotelnikov <sacha@myxomop.com>
Newsgroups: gmane.emacs.devel
Subject: Re: converting between charsets
Date: Sat, 13 May 2006 22:42:04 +0400
Organization: Global disintoxication
Message-ID: <84bqu17on7.fsf@vinci.loc>
References: <87lktejh6f.fsf@myxomop.com> <87u082109z.fsf-monnier+emacs@gnu.org>
	<84veshaajc.fsf@vinci.loc> <87d5ep1a2c.fsf-monnier+emacs@gnu.org>
	<84hd40am8t.fsf@vinci.loc> <jwvfyjkzjv9.fsf-monnier+emacs@gnu.org>
	<84ac9rah6z.fsf@vinci.loc> <87k68v82q6.fsf-monnier+emacs@gnu.org>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
X-Trace: sea.gmane.org 1147545956 6755 80.91.229.2 (13 May 2006 18:45:56 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Sat, 13 May 2006 18:45:56 +0000 (UTC)
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat May 13 20:45:47 2006
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by ciao.gmane.org with esmtp (Exim 4.43)
	id 1Fez7R-0008IH-Qh
	for ged-emacs-devel@m.gmane.org; Sat, 13 May 2006 20:45:38 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1Fez7R-0002Xe-AE
	for ged-emacs-devel@m.gmane.org; Sat, 13 May 2006 14:45:37 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1Fez7D-0002XE-0w
	for emacs-devel@gnu.org; Sat, 13 May 2006 14:45:23 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1Fez7A-0002Wq-HO
	for emacs-devel@gnu.org; Sat, 13 May 2006 14:45:22 -0400
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Fez7A-0002Wn-Dw
	for emacs-devel@gnu.org; Sat, 13 May 2006 14:45:20 -0400
Original-Received: from [80.91.229.2] (helo=ciao.gmane.org)
	by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA:32)
	(Exim 4.52) id 1Fez9B-00005o-C1
	for emacs-devel@gnu.org; Sat, 13 May 2006 14:47:25 -0400
Original-Received: from list by ciao.gmane.org with local (Exim 4.43)
	id 1Fez70-0008EW-P0
	for emacs-devel@gnu.org; Sat, 13 May 2006 20:45:10 +0200
Original-Received: from 81.211.124.120.adsl-spb.net.rol.ru ([81.211.124.120])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <emacs-devel@gnu.org>; Sat, 13 May 2006 20:45:10 +0200
Original-Received: from sacha by 81.211.124.120.adsl-spb.net.rol.ru with local (Gmexim
	0.1 (Debian)) id 1AlnuQ-0007hv-00
	for <emacs-devel@gnu.org>; Sat, 13 May 2006 20:45:10 +0200
X-Injected-Via-Gmane: http://gmane.org/
Mail-Followup-To: emacs-devel@gnu.org
Original-To: emacs-devel@gnu.org
Original-Lines: 113
Original-X-Complaints-To: usenet@sea.gmane.org
X-Gmane-NNTP-Posting-Host: 81.211.124.120.adsl-spb.net.rol.ru
Mail-Copies-To: never
User-Agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)
Cancel-Lock: sha1:iHPcWRY1ZzJTaJmsXBMOH04xRBo=
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:54394
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/54394>

>>>>> On Tue, 09 May 2006 14:42:01 -0400
>>>>> "SM" == Stefan Monnier <monnier@iro.umontreal.ca> wrote:
SM> 
>> I started this thread with a note about problems with
>> encode-coding-region:
SM> 
>>>>> On Sun, 07 May 2006 13:52:08 +0400
>>>>> "AK" == Alexander Kotelnikov <sacha@myxomop.com> wrote:
AK> 
AK> I checked three different ways in which the characters to be
AK> converted can appear in an Emacs buffer:
AK> a. when I open a file containing them;
AK> b. when I type them in while my X keyboard layout is something other
AK> than 'us' (normally 'ru' in my case);
AK> c. when I type them in after using toggle-input-method.
AK> 
AK> The trouble is that encode-coding-region converts correctly only in
AK> case (c). In cases (a) and (b), characters that need conversion are
AK> replaced with question marks. And even in case (c), where the
AK> conversion is performed (for instance, if I save the file afterwards,
AK> it turns out to be in koi8-r), the converted characters are shown in
AK> the buffer as octal escapes such as \321.
AK> 
AK> So I would appreciate some help with this. Thanks.
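A rough Python analogue of the two symptoms described above, for illustration only (Python codecs are not Emacs coding systems): a character the target encoding cannot represent comes out as '?', and once text has been encoded to KOI8-R what remains is raw bytes, which Emacs shows as octal escapes if they are not decoded again. The euro sign here is only a stand-in for an unencodable character; in cases (a) and (b) the unencodable character is presumably a Cyrillic letter held in a charset that koi8-r does not map. The helper `emacs_style` is hypothetical and only approximates how Emacs displays undecoded bytes.

```python
def emacs_style(data: bytes) -> str:
    """Hypothetical helper: render bytes roughly the way Emacs shows
    undecoded raw bytes, ASCII unchanged and bytes >= 0x80 as octal escapes."""
    return "".join(chr(b) if b < 0x80 else "\\" + format(b, "o") for b in data)


# Latin T, Cyrillic small letter ya, and a euro sign (which KOI8-R lacks).
text = "T\u044f\u20ac"

# errors="replace" substitutes '?' for characters the codec cannot encode.
encoded = text.encode("koi8_r", errors="replace")

print(encoded)               # b'T\xd1?'
print(emacs_style(encoded))  # T\321?
```

In KOI8-R the letter ya is byte 0xD1, which is 321 in octal, matching the kind of escape reported above.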
SM> 
SM> Please explain why you think there is a relation between those things and
SM> encode-coding-region.  And of course, that will involve describing where, how,
SM> and when you call encode-coding-region.

I do not understand the question. I use encode-coding-region to encode
a region into a charset, and some characters are not encoded but are
replaced with question marks.

SM> Oh, I see.  I don't know enough about how this works to help you much further.
SM> If you hit C-u C-x = on the various chars (especially on two similar chars
SM> displayed with different fonts), you'll see that they come from different
SM> charsets (one is probably something like iso-8859-5 and the other may be
SM> unicode).  Emacs-22 doesn't unify them by default.  You can try to put
SM> (unify-8859-on-decoding-mode 1) in your .emacs.  And you can also try to
SM> play with utf-fragment-on-decoding.  And ask someone more knowledgeable
SM> about such problems.
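For comparison, Python (used here purely as an illustration) decodes every legacy encoding into a single Unicode character, which is roughly the effect unification aims for inside Emacs, where by default the same visible letter can end up as different characters depending on the charset it came from. The sketch decodes the byte sequences for CYRILLIC CAPITAL LETTER TE from four encodings and checks that they all yield U+0422. The byte values are standard mappings, not data from the thread.

```python
# Byte sequences for CYRILLIC CAPITAL LETTER TE (U+0422) in several encodings.
samples = {
    "koi8_r": b"\xf4",
    "iso8859_5": b"\xc2",
    "euc_jp": b"\xa7\xb4",  # JIS X 0208 row 7, cell 20, with the high bit set
    "utf_8": b"\xd0\xa2",
}

# Decode each sequence; Python maps all of them to one Unicode code point.
decoded = {enc: raw.decode(enc) for enc, raw in samples.items()}

for enc, ch in decoded.items():
    print(f"{enc:<10} -> U+{ord(ch):04X}")
```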
SM> 
>> On the first character, which looks like a Latin T:
>> character: <Cyrillic character removed> (01212102, 332866, 0x51442)-A
>> charset: mule-unicode-0100-24ff
>> (Unicode characters of the range U+0100..U+24FF.)
>> code point: 40 66
>> syntax: word
>> category: y:Cyrillic  
>> buffer code: 0x9C 0xF4 0xA8 0xC2
>> file code: 0xD0 0xA2 (encoded by coding system mule-utf-8)
>> font: -monotype-courier new-medium-r-normal--13-94-99-99-m-80-iso10646-1
SM> 
>> And on the same-looking character in the next line:
>> character: <Cyrillic character removed; it was shown in the wrong font> (0151664, 54196, 0xd3b4)
>> charset: japanese-jisx0208 (JISX0208.1983/1990 Japanese Kanji: ISO-IR-87)
>> code point: 39 52
>> syntax: word
>> category: Y:Cyrillic characters of 2-byte character sets   j:Japanese  
>> |:While filling, we can break a line at this character.  
>> buffer code: 0x92 0xA7 0xB4
>> file code: not encodable by coding system mule-utf-8
>> font: -Misc-Fixed-Medium-R-Normal--14-130-75-75-C-140-JISX0208.1983-0
SM> 
>> Something is not ok here...
SM> 
SM> Same kind of issue as the one I mentioned.
SM> Have you tried unify-8859-on-decoding-mode?

Just tried. Nothing changes.
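The numbers in the two C-u C-x = reports can be cross-checked. The Python sketch below assumes that mule-unicode-0100-24ff lays out U+0100..U+24FF as a 96x96 table whose coordinates start at 32 (an assumption, though it reproduces the "40 66" reported above), and it uses Python's EUC-JP codec to get the JIS X 0208 row/cell bytes. Both helpers are hypothetical. Under these assumptions both reports describe U+0422, CYRILLIC CAPITAL LETTER TE, stored under two different Emacs charsets, and only the first is the one reported as encodable in mule-utf-8 (file code 0xD0 0xA2).

```python
TE = 0x0422  # CYRILLIC CAPITAL LETTER TE


def mule_unicode_0100_24ff_position(cp: int) -> tuple:
    """Hypothetical helper: (row, column) of cp in an assumed 96x96 table
    covering U+0100..U+24FF, with both coordinates starting at 32."""
    offset = cp - 0x0100
    return offset // 96 + 32, offset % 96 + 32


def jisx0208_position(ch: str) -> tuple:
    """Hypothetical helper: JIS X 0208 (row byte, cell byte) for ch,
    obtained by stripping the high bit from its two EUC-JP bytes."""
    hi, lo = ch.encode("euc_jp")
    return hi & 0x7F, lo & 0x7F


print(mule_unicode_0100_24ff_position(TE))  # (40, 66), as in the first report
print(jisx0208_position(chr(TE)))           # (39, 52), as in the second report
print(chr(TE).encode("utf-8"))              # b'\xd0\xa2', the first report's file code

# The octal/decimal/hex triples in both reports are self-consistent.
print(oct(332866), hex(332866))             # 0o1212102 0x51442
print(oct(54196), hex(54196))               # 0o151664 0xd3b4
```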

SM> In any case, please report this via M-x report-emacs-bug with as many
SM> painful details as you can come up with (i.e. describe how to reproduce the
SM> problem starting from "emacs -Q", showing your locale, etc...).
SM> 
SM> You could even M-x report-emacs-bug about it, since maybe the default config
SM> in a cyrillic locale should already take care of it.
SM> 
>>>>> Cyrillic input in emacs -nw in an xterm still does not work if I just
>>>>> change the X keyboard layout.
SM> 
SM> That doesn't give us much to go on, does it?  What does it do, other than
SM> "not work"?
SM> 
>>>> It beeps.
SM> 
SM> What does C-h l show after hitting a particular key?
SM> 
>> M-P M-0 C-h l
SM> 
SM> So when you hit that key, Emacs received M-P M-0 rather than the char you
SM> think you sent to it.  What is your locale?

22:37 pts/28 sacha@vinci:~ 1> locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC=C
LC_TIME=C
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES=C
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
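The C-h l output is consistent with this locale: with LC_CTYPE=ru_RU.UTF-8 the xterm sends UTF-8, and the bytes 0xD0 0xB0 are exactly the UTF-8 encoding of the Cyrillic letter а (U+0430). If Emacs treats the eighth bit of each input byte as the Meta modifier (an assumption about how this terminal session was set up, for example through set-input-mode), those two bytes read as M-P and M-0. A small Python sketch of that interpretation; `as_meta_keys` is a hypothetical helper that only handles bytes whose low seven bits are printable ASCII.

```python
def as_meta_keys(data: bytes) -> str:
    """Hypothetical helper: describe bytes as Emacs key names, treating
    the 8th bit as Meta. Only printable low-7-bit values are handled;
    control characters would need Emacs's C-x style names, omitted here."""
    keys = []
    for b in data:
        low = chr(b & 0x7F)
        keys.append("M-" + low if b & 0x80 else low)
    return " ".join(keys)


typed = "\u0430"             # CYRILLIC SMALL LETTER A
raw = typed.encode("utf-8")  # what a UTF-8 xterm sends for it

print(raw)                   # b'\xd0\xb0'
print(as_meta_keys(raw))     # M-P M-0
```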


-- 
Alexander Kotelnikov
Saint-Petersburg, Russia