From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: X11 Compound Text vs ISO 2022
Date: Thu, 29 Jul 2010 21:36:31 +0900
To: James Cloos
Cc: david@harpegolden.net, emacs-devel@gnu.org
In-Reply-To: (message from James Cloos on Tue, 06 Jul 2010 12:21:11 -0400)

Very sorry for the late response on this matter.

James Cloos writes:

> While testing my recently applied patch, I've discovered that Emacs will
> produce ISO-2022 output for COMPOUND_TEXT which other libs and apps --
> notably including libX11 -- cannot decode.

> As an example,
>   (encode-coding-string "•" 'compound-text) ; U+2022 BULLET
> produces "^[$(O#@^[(B".  '$(O' is ISO-IR 228¹, JIS X 0213:2000.  But
> libX11 only knows about the $( charsets: 0, 1, A-D and G-M.

> A number of characters are output in '^[$-1'; such as:

>   (encode-coding-string "ℜ" 'compound-text) ; U+211C BLACK-LETTER CAPITAL R
>   "^[$-1\365\334^[-A"

>   (encode-coding-string "ʻ" 'compound-text) ; U+02BB MODIFIER LETTER TURNED COMMA
>   "^[$-1\244\333^[-A"

> That is encoded in mule-unicode-0100-24ff, essentially unknown outside
> Emacs.

I admit that this behaviour is not good now.  When I first
implemented ctext in Emacs, there was neither UTF8_STRING nor
CTEXT_with_UTF8_extended_segment.  So I added more character sets to
it for cut&paste between two running Emacses.
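
To check which multibyte designations a given string triggers, a rough
sketch along the following lines might help.  It only looks at the
`ESC $ ( F' form shown in the first quoted example (not `ESC $ - F'),
it takes the set of finals libX11 knows from the report quoted above
without checking the libX11 sources, and the `my-ctext-' names are
hypothetical, not part of Emacs.

```elisp
;; Sketch only: the `my-ctext-' names are made up for illustration.

(defconst my-ctext-libx11-finals (string-to-list "01ABCDGHIJKLM")
  "Final bytes of `ESC $ ( F' charsets libX11 is said to know.
Assumption taken from the quoted report (0, 1, A-D and G-M).")

(defun my-ctext-multibyte-finals (encoded)
  "Return the final bytes of all `ESC $ ( F' designations in ENCODED.
ENCODED is a string that is already encoded as compound text."
  (let ((start 0)
        (finals nil))
    ;; Find each ESC $ ( F sequence and remember its final byte F.
    (while (string-match "\e\\$(\\(.\\)" encoded start)
      (push (aref encoded (match-beginning 1)) finals)
      (setq start (match-end 0)))
    (nreverse finals)))

(defun my-ctext-unknown-finals (string)
  "Encode STRING as `compound-text'; list finals outside the assumed set."
  (let ((encoded (encode-coding-string string 'compound-text))
        (unknown nil))
    (dolist (final (my-ctext-multibyte-finals encoded))
      (unless (memq final my-ctext-libx11-finals)
        (push final unknown)))
    (nreverse unknown)))
```

Evaluating (my-ctext-unknown-finals "•") then lists the finals that
fall outside the assumed set; the result depends on the Emacs version
and on how ctext is defined there.
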
As Emacs was the only application that supported many character sets
at that time, no one complained about that behaviour of ctext.  The
other applications couldn't handle that many characters anyway.

> Other libs/apps prefer to use utf-8³ in compound_text for such chars.

> I understand *why* this happens, given that Emacs used to use 2022
> internally, but it confuses other X11 apps.

Actually, the latest Emacs (Emacs 23 and later) uses Unicode
internally.

> I am not fully fluent in Emacs' internal charset conversion routines;
> is there an easy way to tell it to limit which 2022 charsets it will
> use when converting a string into a 2022 encoding?  A better way?

It's fairly easy to limit the charsets of ctext.  But I care about
backward compatibility.  As ctext is the only coding system that is
compatible with iso-8859-1 and can encode many other character sets,
there will be old users who still use it for file/process encodings.
And since ctext is not used for selections anyway, I'd rather just
document that ctext is not fully compatible with X's COMPOUND_TEXT
spec, but is an extended version.

For WM_NAME etc., yes, we should use ctext-with-extensions.  As
ctext-with-extensions is not intended to be used directly by users, I
think it won't cause actual problems even if we change it so that more
characters are encoded using UTF8-extended-segment.  So, I'll work on
it soon.

The only problem with ctext-with-extensions is that it is currently
implemented in Elisp, and thus it may cause GC.  I'm not sure it is
safe to call Lisp at the place where we convert WM_NAME etc.  If it is
not safe, I'll implement ctext-with-extensions in C.

---
Kenichi Handa
handa@m17n.org