From: Alexander Kotelnikov <sacha@myxomop.com>
Subject: Re: converting between charsets
Date: Sat, 13 May 2006 22:42:04 +0400
Message-ID: <84bqu17on7.fsf@vinci.loc>
In-Reply-To: 87k68v82q6.fsf-monnier+emacs@gnu.org
>>>>> On Tue, 09 May 2006 14:42:01 -0400
>>>>> "SM" == Stefan Monnier <monnier@iro.umontreal.ca> wrote:
SM>
>> I started this thread with a note about problems with
>> encode-coding-region:
SM>
>>>>> On Sun, 07 May 2006 13:52:08 +0400
>>>>> "AK" == Alexander Kotelnikov <sacha@myxomop.com> wrote:
AK>
AK> I checked three different ways in which the characters to be
AK> converted can end up in an Emacs buffer:
AK> a. when I open a file that contains them.
AK> b. when I type them in with an X keyboard layout other than 'us';
AK> for me it is normally 'ru'.
AK> c. when I type them in after using toggle-input-method.
AK>
AK> The trouble is that encode-coding-region converts only in case
AK> (c). In (a) and (b) the characters that need conversion are replaced
AK> with question marks. And even in (c), although the conversion is
AK> performed (if, for instance, I save the file afterwards, it turns out
AK> to be in koi8-r), the converted characters are shown in the buffer
AK> as octal escapes such as \321.
AK>
AK> So it would be nice to get some help with this, thanks.
SM>
SM> Please explain why you think there is a relation between those things
SM> and encode-coding-region. And of course, that will involve describing
SM> where, how, and when you call encode-coding-region.
I do not understand the question. I use encode-coding-region to encode
a region into a charset, and some characters are not encoded but are
replaced with question marks.
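To make the two symptoms concrete outside Emacs, here is a small Python
sketch. It is only an analogy and says nothing about how Emacs implements
encode-coding-region: it shows that the KOI8-R byte for the Cyrillic
letter "я" is what an octal escape like \321 denotes, and that an encoder
asked to replace what it cannot represent emits "?". The sample
characters are my own choice, not taken from the buffer in question.

```python
# Illustration only: Python's codecs, not Emacs internals.

# "я" (U+044F) occupies a single byte in KOI8-R.
koi8 = "я".encode("koi8_r")

# Render that byte the way Emacs shows a raw byte: backslash plus octal.
escape = "\\%o" % koi8[0]
print(koi8, escape)

# A character with no KOI8-R equivalent (sample chosen for illustration),
# encoded with the "replace" error handler.
unencodable = "あ".encode("koi8_r", errors="replace")
print(unencodable)
```

So a buffer showing \321 is showing the encoded byte itself rather than
the character it stands for, and a "?" means the encoder found no
mapping for that character in the target charset.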
SM> Oh, I see. I don't know enough of how this works to help you much further.
SM> If you hit C-u C-x = on the various chars (especially on two similar chars
SM> displayed with different fonts), you'll see that they come from different
SM> charsets (one is probably something like iso-8859-5 and the other may be
SM> unicode). Emacs-22 doesn't unify them by default. You can try to put
SM> (unify-8859-on-decoding-mode 1) in your .emacs. And you can also try to
SM> play with utf-fragment-on-decoding. And ask someone more knowledgeable
SM> about such problems.
SM>
>> On the first character, which looks like a Latin T:
>> character: <I removed cyrillic character> (01212102, 332866, 0x51442)^[-A
>> charset: mule-unicode-0100-24ff
>> (Unicode characters of the range U+0100..U+24FF.)
>> code point: 40 66
>> syntax: word
>> category: y:Cyrillic
>> buffer code: 0x9C 0xF4 0xA8 0xC2
>> file code: 0xD0 0xA2 (encoded by coding system mule-utf-8)
>> font: -monotype-courier new-medium-r-normal--13-94-99-99-m-80-iso10646-1
SM>
>> On the same character in the next line:
>> character: <I removed cyrillic character shown with wrong font> (0151664, 54196, 0xd3b4)
>> charset: japanese-jisx0208 (JISX0208.1983/1990 Japanese Kanji: ISO-IR-87)
>> code point: 39 52
>> syntax: word
>> category: Y:Cyrillic characters of 2-byte character sets j:Japanese
>> |:While filling, we can break a line at this character.
>> buffer code: 0x92 0xA7 0xB4
>> file code: not encodable by coding system mule-utf-8
>> font: -Misc-Fixed-Medium-R-Normal--14-130-75-75-C-140-JISX0208.1983-0
SM>
>> Something is not ok here...
SM>
SM> Same kind of issue as the one I mentioned.
SM> Have you tried unify-8859-on-decoding-mode?
Just tried. Nothing changes.
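As a cross-check of the two C-u C-x = reports quoted above, the Python
sketch below decodes both code points. It rests on two assumptions of
mine that are not stated in this thread: that mule-unicode-0100-24ff is
laid out as a 96x96 charset whose first cell (32, 32) is U+0100, and that
a japanese-jisx0208 code point can be decoded by setting the high bit of
each byte and reading the result as EUC-JP.

```python
# Values copied from the two `C-u C-x =` reports quoted above.
MULE_UNICODE_POINT = (40, 66)      # charset mule-unicode-0100-24ff
FILE_CODE = bytes([0xD0, 0xA2])    # "file code" under mule-utf-8
JISX0208_POINT = (39, 52)          # charset japanese-jisx0208


def mule_unicode_0100_24ff_to_ucs(c1: int, c2: int) -> int:
    """Assumed layout: 96x96 cells, with cell (32, 32) being U+0100."""
    return (c1 - 32) * 96 + (c2 - 32) + 0x100


def jisx0208_to_str(c1: int, c2: int) -> str:
    """Decode a 7-bit JIS X 0208 code point via its EUC-JP byte form."""
    return bytes([c1 | 0x80, c2 | 0x80]).decode("euc_jp")


from_mule = chr(mule_unicode_0100_24ff_to_ucs(*MULE_UNICODE_POINT))
from_file = FILE_CODE.decode("utf-8")
from_jis = jisx0208_to_str(*JISX0208_POINT)

# Print the Unicode code point each description resolves to.
print(hex(ord(from_mule)), hex(ord(from_file)), hex(ord(from_jis)))
print(from_mule == from_file == from_jis)
```

Under those assumptions all three values resolve to U+0422, CYRILLIC
CAPITAL LETTER TE. That would mean the buffer holds the same letter twice
under two different internal charsets, which fits the observation that
one copy is encodable by mule-utf-8 and the other is not.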
SM> In any case, please report this via M-x report-emacs-bug with as many
SM> painful details as you can come up with (i.e. describe how to reproduce the
SM> problem starting from "emacs -Q", showing your locale, etc...).
SM>
SM> You could even M-x report-emacs-bug about it, since maybe the default config
SM> in a cyrillic locale should already take care of it.
SM>
>>>>> Cyrillic input in emacs -nw in xterm still does not work if I just
>>>>> change the X keyboard layout.
SM>
SM> That doesn't give us much to go on, does it? What does it do, other than
SM> "not work"?
SM>
>>>> It beeps.
SM>
SM> What does C-h l show after hitting a particular key?
SM>
>> M-P M-0 C-h l
SM>
SM> So when you hit that key, Emacs received M-P M-0 rather than the char you
SM> think you sent to it. What is your locale?
22:37 pts/28 sacha@vinci:~ 1> locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC=C
LC_TIME=C
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES=C
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
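One possible reading of the "M-P M-0" shown by C-h l, given the UTF-8
locale above, is sketched below in Python. It assumes that Emacs reported
each incoming byte with the high bit set as a Meta-modified ASCII key;
that is my guess and has not been confirmed in this thread. The helper
name meta_keys_to_bytes is made up for the illustration.

```python
def meta_keys_to_bytes(keys):
    """Turn key names like 'M-P' into bytes by setting bit 7 of the ASCII code.

    Hypothetical helper: it models the assumption that a byte >= 0x80
    arriving from the terminal was displayed as Meta plus an ASCII key.
    """
    out = bytearray()
    for key in keys:
        prefix, _, char = key.partition("-")
        if prefix != "M" or len(char) != 1:
            raise ValueError(f"unsupported key name: {key!r}")
        out.append(ord(char) | 0x80)
    return bytes(out)


# The key sequence reported by C-h l above.
received = meta_keys_to_bytes(["M-P", "M-0"])
decoded = received.decode("utf-8")

print(received.hex(), hex(ord(decoded)))
```

Under that assumption the two "keys" are the bytes 0xD0 0xB0, which are
the UTF-8 encoding of U+0430, CYRILLIC SMALL LETTER A. That would suggest
xterm is sending correct UTF-8 and Emacs is not decoding the keyboard
input as UTF-8.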
--
Alexander Kotelnikov
Saint-Petersburg, Russia