From: James Cloos
Newsgroups: gmane.emacs.devel
To: emacs-devel@gnu.org
Subject: X11 Compound Text vs ISO 2022
Date: Tue, 06 Jul 2010 12:21:11 -0400

While testing my recently applied patch, I've discovered that Emacs will
produce ISO-2022 output for COMPOUND_TEXT which other libs and apps --
notably including libX11 -- cannot decode.

As an example,

  (encode-coding-string "•" 'compound-text) ; U+2022 BULLET

produces "^[$(O#@^[(B".

'$(O' is ISO-IR 228¹, JIS X 0213:2000. But libX11 only knows about
the $( charsets: 0, 1, A-D and G-M.
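
A rough way to spot such designations in a COMPOUND_TEXT byte string from
outside Emacs, sketched in Python. The function and constant names are
made up for illustration, the set of finals is simply the list given
above (not re-checked against lcCT.c), and only the ESC $ ( F form is
matched:

```python
import re

# Final bytes of the ESC $ ( F designations that, per the list above,
# libX11 recognizes: 0, 1, A-D and G-M. This is an assumption copied from
# the text, not derived from the libX11 source.
LIBX11_KNOWN_FINALS = set(b"01ABCD" + b"GHIJKLM")

# ESC $ ( F designates a multibyte 94^n character set to G0.
_DESIGNATION = re.compile(rb"\x1b\$\(([\x30-\x7e])")


def unknown_multibyte_designations(data: bytes) -> list:
    """Return each ESC $ ( F sequence in data whose final byte F is
    not in LIBX11_KNOWN_FINALS, in order of appearance."""
    return [
        m.group(0)
        for m in _DESIGNATION.finditer(data)
        if m.group(1)[0] not in LIBX11_KNOWN_FINALS
    ]


# The bullet example above: ESC $ ( O is flagged, ESC ( B is ignored
# because it is a single-byte designation.
print(unknown_multibyte_designations(b"\x1b$(O#@\x1b(B"))
```

Other designation forms (ESC $ ) F, ESC $ - F and so on) would need
their own patterns and their own lists of known finals.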

A number of characters are output in '^[$-1'; such as:

  (encode-coding-string "ℜ" 'compound-text) ; U+211C BLACK-LETTER CAPITAL R
  "^[$-1\365\334^[-A"

  (encode-coding-string "ʻ" 'compound-text) ; U+02BB MODIFIER LETTER TURNED COMMA
  "^[$-1\244\333^[-A"

That is encoded in mule-unicode-0100-24ff, essentially unknown outside
Emacs. Other libs/apps prefer to use utf-8³ in compound_text for such
chars.

I understand *why* this happens, given that Emacs used to use 2022
internally, but it confuses other X11 apps.

I am not fully fluent in Emacs' internal charset conversion routines;
is there an easy way to tell it to limit which 2022 charsets it will
use when converting a string into a 2022 encoding? A better way?

I will be adding at least some of the charsets to libX11, provided I
can find the relevant mappings with X11-compatible licensing, but that
will not help current installations, nor those who, like Emacs, rolled
their own compound_text decoders.

-JimC

P.S. The libX11 src, in libX11/src/xlibi18n/lcCT.c, is the best
resource to know which 2022 charsets libX11 supports.

1] http://www.itscj.ipsj.or.jp/ISO-IR/228.pdf
2] http://www.itscj.ipsj.or.jp/ISO-IR/143.pdf
3] http://www.itscj.ipsj.or.jp/ISO-IR/196.pdf

-- 
James Cloos OpenPGP: 1024D/ED7DAEA6