From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs Lisp's future
Date: Sat, 27 Sep 2014 12:32:56 +0300
Message-ID: <837g0ptnlj.fsf@gnu.org>
To: "Stephen J. Turnbull", Kenichi Handa
Cc: dmantipov@yandex.ru, dak@gnu.org, emacs-devel@gnu.org
In-Reply-To: <87wq8pwjen.fsf@uwakimon.sk.tsukuba.ac.jp>

> From: "Stephen J. Turnbull"
> Date: Sat, 27 Sep 2014 17:35:12 +0900
> Cc: Dmitry Antipov, dak@gnu.org, emacs-devel@gnu.org
>
> Eli Zaretskii writes:
> > > Date: Fri, 26 Sep 2014 18:45:54 +0400
> > > From: Dmitry Antipov
> > > Cc: emacs-devel@gnu.org
> > >
> > > Why not just use ICU?
> >
> > Emacs needs to be able to extend the Unicode code-point space for raw
> > 8-bit bytes and for a couple of character sets that are not unified.
>
> No, you don't. There's plenty of private space for those purposes
> (unless you know of private character sets that use more than two
> whole planes?)

I take it that you have studied the charsets for which we use
codepoints above 0x10FFFF, and concluded that they all fit in the
2*64K+6.4K PUA space provided by Unicode? We have several quite large
character sets which need that (grep mule-conf.el for ":unify-map" to
see the list, and see etc/charsets/ for the map files). I'm not sure
the PUA space is large enough, but I didn't sum all the numbers.

In any case, the question of why we don't use PUA for this is best
addressed to Handa-san (CC'ed).
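The "2*64K+6.4K" figure is the combined size of Unicode's three Private Use Area ranges: U+E000..U+F8FF in the BMP (6,400 code points) plus the supplementary private-use planes 15 and 16 (65,534 code points each, because the last two code points of every plane are noncharacters). A minimal Python sketch of the summing mentioned above follows. The function names are hypothetical, and the charset sizes in the usage are made-up placeholders (multiples of a generic 94x94 double-byte set), not the real figures; those would have to be counted from the map files in etc/charsets/.

```python
# Private Use Area ranges defined by the Unicode Standard (inclusive bounds).
PUA_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area: 6,400 code points
    (0xF0000, 0xFFFFD),    # Supplementary PUA-A (plane 15): 65,534 code points
    (0x100000, 0x10FFFD),  # Supplementary PUA-B (plane 16): 65,534 code points
]


def pua_capacity() -> int:
    """Total number of Unicode private-use code points."""
    return sum(hi - lo + 1 for lo, hi in PUA_RANGES)


def fits_in_pua(charset_sizes: dict[str, int]) -> tuple[bool, int]:
    """Return (fits, total) for a mapping of charset name -> code points needed.

    Hypothetical helper: the names and sizes are supplied by the caller.
    """
    total = sum(charset_sizes.values())
    return total <= pua_capacity(), total


if __name__ == "__main__":
    print(pua_capacity())  # 137468
    # Placeholder data only: N generic 94x94 sets of 8,836 positions each.
    fifteen = {f"placeholder-94x94-{i}": 94 * 94 for i in range(15)}
    sixteen = {f"placeholder-94x94-{i}": 94 * 94 for i in range(16)}
    print(fits_in_pua(fifteen))  # (True, 132540)
    print(fits_in_pua(sixteen))  # (False, 141376)
```

With these placeholder numbers, fifteen 94x94 sets fit and sixteen do not, which gives a feel for how quickly the private space is used up by large legacy charsets.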
> Emacs would simply use an indirect representation for
> private space. (That is, code points in private space are not
> necessarily identical to the input code points, but rather are indexes
> into an auxiliary table which implements the disjoint sum of the
> private code spaces in use.)

IIUC, this is a non-trivial complication. Currently, our mapping is
set up so that we can keep the non-unified characters in our buffers
directly, while you propose indirection via tables. This means, for
example, that direct access to char-tables will become slower.

> Since this is private space, you need to build a table of attributes
> for these characters (I/O representation, UCD properties, glyphs, etc)
> anyway. For Unicode input using private space, you just record that
> as the I/O representation.

Yes, and the question is how well ICU supports setting these up. I
don't know the answer to that.

It is also not clear to me whether what you suggest will support the
internal representation of raw bytes and their conversion to and from
their external (a.k.a. "encoded") 8-bit values.

In any case, I agree that using ICU in Guile would be a huge step
forward, because currently they simply rely on the underlying libc,
which is only a more-or-less safe bet when libc is glibc; if not, the
results fall very short of what the user needs and Emacs expects.
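The indirect representation Stephen proposes can be pictured as an allocator that hands out PUA code points in order and keeps a two-way table back to the (charset, code) pair each one stands for. This is a minimal Python sketch assuming first-come, first-served allocation; the class and method names are hypothetical and are not part of Emacs or ICU.

```python
from itertools import chain
from typing import Iterator

# Unicode private-use ranges (inclusive); repeated here so the sketch is self-contained.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


class PrivateSpaceTable:
    """Hypothetical auxiliary table implementing the disjoint sum of several
    private code spaces inside Unicode's PUA.

    Each (charset, code) pair receives the next free PUA code point, so the
    PUA value is an index into this table, not the charset's own code.
    """

    def __init__(self) -> None:
        # Lazily walk the PUA ranges in order, skipping the gaps between them.
        self._free: Iterator[int] = chain.from_iterable(
            range(lo, hi + 1) for lo, hi in PUA_RANGES)
        self._to_pua: dict[tuple[str, int], int] = {}
        self._from_pua: dict[int, tuple[str, int]] = {}

    def encode(self, charset: str, code: int) -> int:
        """Return the PUA code point for (charset, code), allocating one if needed."""
        key = (charset, code)
        if key not in self._to_pua:
            try:
                cp = next(self._free)
            except StopIteration:
                raise OverflowError("Unicode private-use space exhausted") from None
            self._to_pua[key] = cp
            self._from_pua[cp] = key
        return self._to_pua[key]

    def decode(self, cp: int) -> tuple[str, int]:
        """Recover the original (charset, code) pair; KeyError if never allocated."""
        return self._from_pua[cp]


if __name__ == "__main__":
    table = PrivateSpaceTable()
    a = table.encode("charset-A", 0x2121)
    b = table.encode("charset-B", 0x2121)  # same code, different charset
    print(hex(a), hex(b))   # 0xe000 0xe001
    print(table.decode(b))  # ('charset-B', 8481)
```

In this sketch every conversion between a character and its charset code is a dictionary lookup, which is the kind of extra indirection Eli expects to slow down char-table access. Because the PUA values depend on allocation order, they are also only meaningful together with the table that produced them.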
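For comparison, Emacs currently represents raw bytes without any table: based on the macros in its src/character.h, the 128 bytes 0x80..0xFF map arithmetically onto the top of the extended code space, 0x3FFF80..0x3FFFFF, while 0x110000..0x3FFF7F is left for characters of non-unified charsets. The Python below is a simplified transliteration of BYTE8_TO_CHAR, CHAR_BYTE8_P and CHAR_TO_BYTE8 (the C CHAR_TO_BYTE8 also handles characters that are not raw bytes, which this sketch simply rejects); classify is a hypothetical helper added for illustration.

```python
# Limits of Emacs's internal character code space (see src/character.h).
MAX_UNICODE_CHAR = 0x10FFFF  # last Unicode code point
MAX_5_BYTE_CHAR = 0x3FFF7F   # last code point below the raw-byte range
MAX_CHAR = 0x3FFFFF          # last Emacs code point; 0x3FFF80..0x3FFFFF are raw bytes


def byte8_to_char(byte: int) -> int:
    """Map a raw byte 0x80..0xFF to its Emacs character (like BYTE8_TO_CHAR)."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError(f"not a raw 8-bit byte: {byte:#x}")
    return byte + 0x3FFF00


def char_byte8_p(c: int) -> bool:
    """True if c is one of the 128 raw-byte characters (like CHAR_BYTE8_P)."""
    return MAX_5_BYTE_CHAR < c <= MAX_CHAR


def char_to_byte8(c: int) -> int:
    """Recover the external byte of a raw-byte character (simplified CHAR_TO_BYTE8)."""
    if not char_byte8_p(c):
        raise ValueError(f"not a raw-byte character: {c:#x}")
    return c - 0x3FFF00


def classify(c: int) -> str:
    """Hypothetical helper: report which part of Emacs's code space c is in."""
    if c < 0 or c > MAX_CHAR:
        raise ValueError(f"outside Emacs's code space: {c:#x}")
    if c <= MAX_UNICODE_CHAR:
        return "unicode"
    if c <= MAX_5_BYTE_CHAR:
        return "non-unified"
    return "raw-byte"


if __name__ == "__main__":
    c = byte8_to_char(0xE9)
    print(hex(c), classify(c), hex(char_to_byte8(c)))  # 0x3fffe9 raw-byte 0xe9
```

Since the mapping is pure arithmetic, converting raw bytes to and from their external 8-bit values needs no lookup at all, which is the property a PUA-based scheme would have to preserve.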