From: Kenichi Handa <handa@m17n.org>
To: James Cloos <cloos@jhcloos.com>
Cc: david@harpegolden.net, emacs-devel@gnu.org
Subject: Re: X11 Compound Text vs ISO 2022
Date: Thu, 29 Jul 2010 21:36:31 +0900
Message-ID: <tl7wrsesbw0.fsf@m17n.org>
In-Reply-To: <m3zky4wpgw.fsf@carbon.jhcloos.org> (message from James Cloos on Tue, 06 Jul 2010 12:21:11 -0400)

Very sorry for the late response on this matter.

In article <m3zky4wpgw.fsf@carbon.jhcloos.org>, James Cloos <cloos@jhcloos.com> writes:

> While testing my recently applied patch, I've discovered that Emacs will
> produce ISO-2022 output for COMPOUND_TEXT which other libs and apps --
> notably including libX11 -- cannot decode.

> As an example, (encode-coding-string "•" 'compound-text) ; U+2022 BULLET
> produces "^[$(O#@^[(B".  '$(O' is ISO-IR 228¹, JIS X 0213:2000.  But
> libX11 only knows about the $( charsets:  0, 1, A-D and G-M.

> A number of characters are output in '^[$-1'; such as:

> (encode-coding-string "ℜ" 'compound-text) ; U+211C BLACK-LETTER CAPITAL R
> "^[$-1\365\334^[-A"
> (encode-coding-string "ʻ" 'compound-text) ; U+02BB MODIFIER LETTER TURNED COMMA
> "^[$-1\244\333^[-A"

> That is encoded in mule-unicode-0100-24ff, essentially unknown outside
> Emacs.

I admit that this behaviour is not good now.  When I first
implemented ctext in Emacs, there was neither UTF8_STRING
nor CTEXT_with_UTF8_extended_segment.  So, I added more
character sets to it for cut&paste between two running
Emacses.  As Emacs was the only application that supported
many character sets at that time, no one complained about
that behaviour of ctext.  Other applications couldn't
handle that many characters anyway.
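
To make the escape sequences in the quoted examples easier to read, here
is a minimal Python sketch that lists the ISO 2022 charset designations
in a byte string.  The function names and the table of libX11-known
final bytes are hypothetical; the table is copied from the quoted
message ("0, 1, A-D and G-M") and has not been checked against the
libX11 source.  The older short form "ESC $ F" is not handled.

```python
import re

# ISO 2022 designation: ESC, optional "$" (multi-byte set), an intermediate
# byte selecting the target (G0..G3) and the set size, then a final byte
# in the range 0x30..0x7E that names the charset.
DESIGNATION = re.compile(
    rb"\x1b(?P<multi>\$)?(?P<inter>[()*+\-./])(?P<final>[0-~])"
)

# Intermediate byte -> (target element, characters per byte position).
TARGETS = {
    b"(": ("G0", 94), b")": ("G1", 94), b"*": ("G2", 94), b"+": ("G3", 94),
    b"-": ("G1", 96), b".": ("G2", 96), b"/": ("G3", 96),
}

# Assumption: final bytes of the "$(" charsets libX11 knows, taken from
# the quoted message, not from the libX11 source.
LIBX11_MULTIBYTE_G0_FINALS = set("01ABCD") | set("GHIJKLM")


def designations(data: bytes):
    """Return (target, set_size, is_multibyte, final) for each designation."""
    found = []
    for m in DESIGNATION.finditer(data):
        target, size = TARGETS[m.group("inter")]
        found.append((target, size, bool(m.group("multi")),
                      m.group("final").decode("ascii")))
    return found


def unsupported_multibyte_g0(data: bytes):
    """Multi-byte G0 designations whose final byte is outside the table."""
    return [d for d in designations(data)
            if d[0] == "G0" and d[2] and d[3] not in LIBX11_MULTIBYTE_G0_FINALS]


# The three encodings quoted above, written as Python byte strings.
bullet = b"\x1b$(O#@\x1b(B"
black_letter_r = b"\x1b$-1\365\334\x1b-A"
turned_comma = b"\x1b$-1\244\333\x1b-A"

print(designations(bullet))          # [('G0', 94, True, 'O'), ('G0', 94, False, 'B')]
print(designations(black_letter_r))  # [('G1', 96, True, '1'), ('G1', 96, False, 'A')]
print(unsupported_multibyte_g0(bullet))  # [('G0', 94, True, 'O')]
```

Read this way, the first example designates a multi-byte 94-set with
final byte "O" to G0 and then returns to ASCII, while the other two
designate a multi-byte 96-set with final byte "1" to G1 and then return
to the right half of ISO 8859-1.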

> Other libs/apps prefer to use utf-8³ in compound_text for such chars.

> I understand *why* this happens, given that Emacs used to use 2022
> internally, but it confuses other X11 apps.

Actually the latest Emacs (Emacs 23 and later) uses
Unicode internally.

> I am not fully fluent in Emacs' internal charset conversion routines;
> is there an easy way to tell it to limit which 2022 charsets it will
> use when converting a string into a 2022 encoding?  A better way?

It's fairly easy to limit the charsets of ctext.  But I care
about backward compatibility.  As ctext is the only coding
system that is compatible with iso-8859-1 and can encode
many other character sets, there will be old users who still
use it for file/process encodings.

And since ctext is not used for selections anyway, I'd rather
just document that ctext is not fully compatible with X's
COMPOUND_TEXT spec, but is an extended version.

For WM_NAME, etc, yes, we should use ctext-with-extensions,
and as ctext-with-extensions is not intended to be used
directly by users, I think it won't cause actual problems
even if we change it so that more characters are encoded
using UTF8-extended-segment.  So, I'll work on it soon.
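
As a rough illustration of what encoding more characters in a UTF-8
segment could look like on the wire, here is a minimal Python sketch.
It assumes that the UTF-8 segment is delimited by the ISO 2022 "other
coding system" escapes ESC % G (switch to UTF-8) and ESC % @ (return to
ISO 2022), and that the stream starts in the compound text initial
state (ASCII in GL, the right half of ISO 8859-1 in GR).  The function
name is hypothetical, this is not how Emacs implements
ctext-with-extensions, and control characters get no special handling.

```python
from itertools import groupby

UTF8_BEGIN = b"\x1b%G"  # ESC % G: switch to UTF-8
UTF8_END = b"\x1b%@"    # ESC % @: return to ISO 2022


def _is_latin1(ch: str) -> bool:
    """True if the character fits in a single ISO 8859-1 byte."""
    return ord(ch) < 0x100


def encode_ctext_utf8_fallback(text: str) -> bytes:
    """Emit Latin-1 runs directly and wrap all other runs in a UTF-8 segment."""
    out = bytearray()
    for latin1, run in groupby(text, key=_is_latin1):
        chunk = "".join(run)
        if latin1:
            out += chunk.encode("latin-1")
        else:
            out += UTF8_BEGIN + chunk.encode("utf-8") + UTF8_END
    return bytes(out)


print(encode_ctext_utf8_fallback("a\u2022b"))  # b'a\x1b%G\xe2\x80\xa2\x1b%@b'
```

With this approach U+2022 BULLET would no longer need a JIS X 0213
designation; a receiver only has to understand the two switch
sequences and UTF-8.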

The only problem with ctext-with-extensions is that it is
currently implemented in Elisp, and thus it may cause GC.  I'm
not sure it is safe to call Lisp at the point where we convert
WM_NAME etc.  If it is not safe, I'll implement
ctext-with-extensions in C.

---
Kenichi Handa
handa@m17n.org



