From: Alexander Kotelnikov
Newsgroups: gmane.emacs.devel
Subject: Re: converting between charsets
Date: Sat, 13 May 2006 22:42:04 +0400
Organization: Global disintoxication
Message-ID: <84bqu17on7.fsf@vinci.loc>
References: <87lktejh6f.fsf@myxomop.com> <87u082109z.fsf-monnier+emacs@gnu.org> <84veshaajc.fsf@vinci.loc> <87d5ep1a2c.fsf-monnier+emacs@gnu.org> <84hd40am8t.fsf@vinci.loc> <84ac9rah6z.fsf@vinci.loc> <87k68v82q6.fsf-monnier+emacs@gnu.org>
User-Agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)

>>>>> On Tue, 09 May 2006 14:42:01 -0400
>>>>> "SM" == Stefan Monnier wrote:
SM>
>> I started this thread from note about problems with
>> encode-coding-region:
SM>
>>>>> On Sun, 07 May 2006 13:52:08 +0400
>>>>> "AK" == Alexander Kotelnikov wrote:
AK>
AK> There could be three different ways, which I checked, how characters
AK> to be converted can appear in emacs buffer:
AK> a. when I open such file.
AK> b. when I type in characters and my keyboard layout in X is different
AK> from 'us', for me it is normally 'ru' then.
AK> c. when I type in after I used toggle-input-method.
AK>
AK>
AK> And the trouble is that encode-coding-region converts only in case
AK> (c).
AK> In (a) and (b) characters that need conversion are substituted
AK> with question marks. And even in (c) conversion is performed (if, for
AK> instance, I save a file after it appears to be in koi8-r) in the
AK> converted buffer converted characters are shown in \321 manner.
AK>
AK> So, it will be nice to get some help on this, thanks.
SM>
SM> Please explain why you think there is relation between those things and
SM> encode-coding-region. And of course, that will involve describing where how
SM> and when you call encode-coding-region.

I do not understand the question. I use encode-coding-region to encode
a region into a charset, and some characters are not encoded but are
substituted with question marks.

SM> Oh, I see. I don't know enough of how this works to help you much further.
SM> If you hit C-u C-x = on the various chars (especially on two similar chars
SM> displayed with different fonts), you'll see that they come from different
SM> charsets (one is probably something like iso-8859-5 and the other may be
SM> unicode). Emacs-22 doesn't unify them by default. You can try to put
SM> (unify-8859-on-decoding-mode 1) in your .emacs. And you can also try to
SM> play with utf-fragment-on-decoding. And ask someone more knowledgeable
SM> about such problems.
SM>
>> On first character like latin T:
>> character: (01212102, 332866, 0x51442)-A
>> charset: mule-unicode-0100-24ff
>> (Unicode characters of the range U+0100..U+24FF.)
>> code point: 40 66
>> syntax: word
>> category: y:Cyrillic
>> buffer code: 0x9C 0xF4 0xA8 0xC2
>> file code: 0xD0 0xA2 (encoded by coding system mule-utf-8)
>> font: -monotype-courier new-medium-r-normal--13-94-99-99-m-80-iso10646-1
SM>
>> After the same character in the next line:
>> character: (0151664, 54196, 0xd3b4)
>> charset: japanese-jisx0208 (JISX0208.1983/1990 Japanese Kanji: ISO-IR-87)
>> code point: 39 52
>> syntax: word
>> category: Y:Cyrillic characters of 2-byte character sets j:Japanese
>> |:While filling, we can break a line at this character.
>> buffer code: 0x92 0xA7 0xB4
>> file code: not encodable by coding system mule-utf-8
>> font: -Misc-Fixed-Medium-R-Normal--14-130-75-75-C-140-JISX0208.1983-0
SM>
>> Something is not ok here...
SM>
SM> Same kind of issue as the one I mentioned.
SM> Have you tried unify-8859-on-decoding-mode?

Just tried. Nothing changes.

SM> In any case, please report this via M-x report-emacs-bug with as many
SM> painful details as you can come up with (i.e. describe how to reproduce the
SM> problem starting from "emacs -Q", showing your locale, etc...).
SM>
SM> You could even M-x report-emacs-bug about it, since maybe the default config
SM> in a cyrillic locale should already take care of it.
SM>
>>>>> Cyrillic input in emacs -nw in xterm still does not work, if I just
>>>>> change X keyboard layout.
SM>
SM> That doesn't give us much to go on, does it? What does it do, other than
SM> "not work"?
SM>
>>>> It beeps.
SM>
SM> What does C-h l show after hitting a particular key?
SM>
>> M-P M-0 C-h l
SM>
SM> So when you hit that key, Emacs received M-P M-0 rather than the char you
SM> think you sent to it. What is your locale?
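
One possible reading of the M-P M-0 sequence, offered as an assumption and not as something confirmed in this thread: if Emacs on the terminal reads every byte with the high bit set as Meta plus the 7-bit character, then M-P M-0 stands for the bytes 0xD0 0xB0, which form a valid UTF-8 sequence. The Python sketch below only checks that arithmetic. `meta_keys_to_bytes` is a hypothetical helper written for this illustration, not an Emacs function.

```python
import unicodedata

# Sketch: treat keys reported as "M-<char>" as raw bytes with the high bit
# set, then try to decode those bytes as UTF-8.
# Assumption (not confirmed in the thread): the terminal sent UTF-8 and
# Emacs read each byte >= 0x80 as Meta + (byte - 0x80).


def meta_keys_to_bytes(keys):
    """Hypothetical helper: map keys such as 'M-P' to bytes with bit 8 set."""
    out = bytearray()
    for key in keys:
        if not key.startswith("M-") or len(key) != 3:
            raise ValueError(f"expected a key like 'M-P', got {key!r}")
        out.append(ord(key[2]) | 0x80)  # set the 8th bit on the 7-bit char
    return bytes(out)


raw = meta_keys_to_bytes(["M-P", "M-0"])
decoded = raw.decode("utf-8")

# Print the bytes, the code point, and the Unicode name of the result.
print(raw.hex(), f"U+{ord(decoded):04X}", unicodedata.name(decoded))
```

If that reading is right, the key press did reach Emacs as UTF-8 bytes, and the problem would be in how the terminal input is decoded, not in the X keyboard layout itself.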
22:37 pts/28 sacha@vinci:~ 1> locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC=C
LC_TIME=C
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES=C
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=

-- 
Alexander Kotelnikov
Saint-Petersburg, Russia
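
The two C-u C-x = reports quoted in this message can be cross-checked by hand. What follows is a minimal Python sketch, assuming the letter in question is U+0422, which is what the reported file code 0xD0 0xA2 decodes to as UTF-8. The 96x96 layout assumed for mule-unicode-0100-24ff is my understanding of that charset and is not stated anywhere in the thread. `mule_unicode_code_point` is a hypothetical helper, not an Emacs function.

```python
import unicodedata


def mule_unicode_code_point(ch, base=0x0100):
    """Hypothetical helper: position of ch in an assumed 96x96 grid.

    Assumption: mule-unicode-0100-24ff starts at U+0100 and both
    coordinates are offset by 32.
    """
    offset = ord(ch) - base
    return (offset // 96 + 32, offset % 96 + 32)


# First report: file code 0xD0 0xA2 under mule-utf-8.
first = bytes([0xD0, 0xA2]).decode("utf-8")

# Second report: code point "39 52" in japanese-jisx0208. EUC-JP stores a
# JIS X 0208 position with the high bit set on both bytes (0xA7 0xB4 here,
# the same two bytes that end the reported buffer code).
second = bytes([39 | 0x80, 52 | 0x80]).decode("euc_jp")

print(unicodedata.name(first))          # Unicode name of the first character
print(mule_unicode_code_point(first))   # compare with the reported "40 66"
print(first == second)                  # do both reports name one letter?
print(first.encode("koi8_r").hex())     # the byte koi8-r uses for it
```

If these assumptions hold, both reports describe the same Cyrillic letter stored under two different Emacs charsets, which is consistent with Stefan's remark that the characters come from charsets that are not unified by default.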