From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Alexander Kotelnikov
Newsgroups: gmane.emacs.devel
Subject: Re: converting between charsets
Date: Tue, 09 May 2006 09:41:08 +0400
Organization: Global disintoxication
Message-ID: <84ac9rah6z.fsf@vinci.loc>
References: <87lktejh6f.fsf@myxomop.com> <87u082109z.fsf-monnier+emacs@gnu.org> <84veshaajc.fsf@vinci.loc> <87d5ep1a2c.fsf-monnier+emacs@gnu.org> <84hd40am8t.fsf@vinci.loc>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
X-Trace: sea.gmane.org 1147154044 26674 80.91.229.2 (9 May 2006 05:54:04 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Tue, 9 May 2006 05:54:04 +0000 (UTC)
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue May 09 07:54:03 2006
Return-path:
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165]) by ciao.gmane.org with esmtp (Exim 4.43) id 1FdLA3-0003pq-Dc for ged-emacs-devel@m.gmane.org; Tue, 09 May 2006 07:53:47 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1FdL9x-0000pc-EP for ged-emacs-devel@m.gmane.org; Tue, 09 May 2006 01:53:25 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1FdL8m-0007tl-0z for emacs-devel@gnu.org; Tue, 09 May 2006 01:52:12 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1FdL2H-0001u6-RU for emacs-devel@gnu.org; Tue, 09 May 2006 01:45:34 -0400
Original-Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1FdL2H-0001sp-3A for emacs-devel@gnu.org; Tue, 09 May 2006 01:45:29 -0400
Original-Received: from [80.91.229.2] (helo=ciao.gmane.org) by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA:32) (Exim 4.52) id 1FdL3G-00052X-An for emacs-devel@gnu.org; Tue, 09 May 2006 01:46:30 -0400
Original-Received: from list by ciao.gmane.org with local (Exim 4.43) id 1FdL21-0002kl-86 for emacs-devel@gnu.org; Tue, 09 May 2006 07:45:13 +0200
Original-Received: from 81.211.124.120.adsl-spb.net.rol.ru ([81.211.124.120]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 09 May 2006 07:45:13 +0200
Original-Received: from sacha by 81.211.124.120.adsl-spb.net.rol.ru with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Tue, 09 May 2006 07:45:13 +0200
X-Injected-Via-Gmane: http://gmane.org/
Mail-Followup-To: emacs-devel@gnu.org
Original-To: emacs-devel@gnu.org
Original-Lines: 86
Original-X-Complaints-To: usenet@sea.gmane.org
X-Gmane-NNTP-Posting-Host: 81.211.124.120.adsl-spb.net.rol.ru
Mail-Copies-To: never
User-Agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)
Cancel-Lock: sha1:3hkQ8JqZJ65cMzzH2E5MD+0NajQ=
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:54131
Archived-At:

>>>>> On Mon, 08 May 2006 10:30:48 -0400
>>>>> "SM" == Stefan Monnier wrote:
SM>
>> Let's first talk about encoding regions. Why does not it work with
>> encode-coding-region?
SM>
SM> It works. Any evidence that it doesn't?

I started this thread with a note about problems with
encode-coding-region:

>>>>> On Sun, 07 May 2006 13:52:08 +0400
>>>>> "AK" == Alexander Kotelnikov wrote:
AK>
AK> There could be three different ways, which I checked, how characters
AK> to be converted can appear in emacs buffer:
AK> a. when I open such file.
AK> b. when I type in characters and my keyboard layout in X is different
AK> from 'us', for me it is normally 'ru' then.
AK> c. when I type in after I used toggle-input-method.
AK>
AK>
AK> And the trouble is that encode-coding-region converts only in case
AK> (c). In (a) and (b) characters that need conversion are substituted
AK> with question marks. And even in (c) conversion is performed (if, for
AK> instance, I save a file after it appears to be in koi8-r) in the
AK> converted buffer converted characters are shown in \321 manner.
AK>
AK> So, it will be nice to get some help on this, thanks.

>>>> 1. Paste into Emacs frame works strange:
SM> What text did you paste? Where does it come from?
>> I type some Russian text in xterm and paste it into Emacs, have a look
>> at the attached screenshot.
SM>
SM> Oh, I see. I don't know enough of how this works to help you much further.
SM> If you hit C-u C-x = on the various chars (especially on two similar chars
SM> displayed with different fonts), you'll see that they come from different
SM> charsets (one is probably something like iso-8859-5 and the other may be
SM> unicode). Emacs-22 doesn't unify them by default. You can try to put
SM> (unify-8859-on-decoding-mode 1) in your .emacs. And you can also try to
SM> play with utf-fragment-on-decoding. And ask someone more knowledgeable
SM> about such problems.

On the first character, which looks like a Latin T:

  character: (01212102, 332866, 0x51442)-A
    charset: mule-unicode-0100-24ff
             (Unicode characters of the range U+0100..U+24FF.)
 code point: 40 66
     syntax: word
   category: y:Cyrillic
buffer code: 0x9C 0xF4 0xA8 0xC2
  file code: 0xD0 0xA2 (encoded by coding system mule-utf-8)
       font: -monotype-courier new-medium-r-normal--13-94-99-99-m-80-iso10646-1

On the same character in the next line:

  character: (0151664, 54196, 0xd3b4)
    charset: japanese-jisx0208
             (JISX0208.1983/1990 Japanese Kanji: ISO-IR-87)
 code point: 39 52
     syntax: word
   category: Y:Cyrillic characters of 2-byte character sets
             j:Japanese
             |:While filling, we can break a line at this character.
buffer code: 0x92 0xA7 0xB4
  file code: not encodable by coding system mule-utf-8
       font: -Misc-Fixed-Medium-R-Normal--14-130-75-75-C-140-JISX0208.1983-0

Something is not ok here...

SM> You could even M-x report-emacs-bug about it, since maybe the default config
SM> in a cyrillic locale should already take care of it.
SM>
>>>> Cyrillic input in emacs -nw in xterm still does not work, if I just
>>>> change X keyboard layout.
SM>
SM> That doesn't give us much to go on, does it? What does it do, other than
SM> "not work"?
SM>
>> It beeps.
SM>
SM> What does C-h l show after hitting a particular key?

M-P M-0 C-h l

-- 
Alexander Kotelnikov
Saint-Petersburg, Russia
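
A rough cross-check of the two C-u C-x = reports above, done outside
Emacs. This is only a sketch in Python, not anything from the Emacs
sources. It assumes that the "file code" bytes of the first character are
plain UTF-8, and that the "code point" pair of the second character is a
JIS X 0208 row/cell pair that becomes EUC-JP once the high bit of each
byte is set (which would match the trailing 0xA7 0xB4 of its buffer
code). The variable names are made up for illustration.

```python
# Byte values copied from the two describe-char reports in this message.
utf8_file_code = bytes([0xD0, 0xA2])   # "file code" of the first character
jis_code_point = (39, 52)              # "code point" of the second character

# Assumption: EUC-JP stores a JIS X 0208 character as the same two bytes
# with the high bit set on each of them.
euc_jp_bytes = bytes(b | 0x80 for b in jis_code_point)

first = utf8_file_code.decode("utf-8")
second = euc_jp_bytes.decode("euc_jp")

# Compare the two characters at the Unicode level.
print(f"first:  U+{ord(first):04X}")
print(f"second: U+{ord(second):04X}")
print("same Unicode character:", first == second)

# The byte this character would have in a koi8-r file.
koi8_byte = first.encode("koi8_r")
print("koi8-r byte:", koi8_byte.hex())
```

If both lines report the same code point, the two buffer characters are
the same letter held in two different Emacs charsets, which would fit
Stefan's remark that Emacs-22 does not unify them by default.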