From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: idn.el and confusables.txt
Date: Sat, 14 May 2011 17:38:11 +0300
Message-ID: <83aaepfiuk.fsf@gnu.org>
In-reply-to: <87y629ien3.fsf@lifelogs.com>
To: Ted Zlatanov
Cc: emacs-devel@gnu.org

> From: Ted Zlatanov
> Date: Sat, 14 May 2011 08:40:48 -0500
>
> EZ> You see, the uni-*.el files we create out of the Unicode DB are not
> EZ> used anywhere in application code, AFAIK.  We use them to display
> EZ> character properties in the likes of "C-u C-x =", and that's it.  I'm
> EZ> not even sure they are organized in a way that makes them useful.
>
> markchars.el could use other Unicode properties if people ask.

I'm talking about the details.  The way we currently set the tables in
uni-*.el is that many of the values are symbols.  For example:

  (get-char-code-property ?1 'general-category) => Nd
  (get-char-code-property ?א 'bidi-class) => R
  (get-char-code-property ?\( 'mirrored) => Y

The `Nd', `R', and `Y' are symbols.
Now, suppose you wanted to use these values in some code that needs to
be fast -- how would you feel about having to write multi-branch
`cond' forms to compare the value against all the possibilities?  For
bidi reordering, which runs in the innermost loop of the display
engine, using the `bidi-class' or `mirrored' properties that are
symbols would be prohibitively expensive.

For now, with markchars.el, all you need is a boolean value for each
character.  However, in other use cases, some other Lisp code will
want the paired character.  Yet another application will want to
compare characters such that confusable pairs will compare equal.  Can
a single table satisfy all these needs efficiently?  Maybe it can, but
we need to design that table carefully.

> But specifically regarding the ones I'm proposing for inclusion,
> since we've started using the GNU ELPA more and markchars.el lives
> in it, we can put uni-confusables.el and uni-idn.el in the GNU ELPA
> instead of the Emacs trunk.

I'm not arguing about where to put them.  I'm saying that for such
basic infrastructure, we should consider the possible uses before we
rush into implementation.  Otherwise, we will again repeat the same
mistake, whose result is that the only real user of bidirectional
properties cannot use uni-bidi.el!

> EZ> So I'd really like to avoid introducing yet another huge table whose
> EZ> only effects are to show one more property in "C-u C-x =" and bloat
> EZ> the ELisp manual some more.
>
> IMO it's not a huge table ???

It's a char-table that can be indexed by any character supported by
Emacs.  Even if you count only the characters mentioned in
confusables.txt, there are 20 thousand of them.  char-tables are
memory-efficient, but their footprint is not negligible.

The bloat may be insignificant by comparison, but if the _only_ useful
effect is the bloat, why should we do that?

> Also the char-table doesn't have to
> cover the Asian confusables--I'm not sure anyone would need those.

Well, the Unicode consortium definitely thought they were needed. Either we follow established standards, or we don't.
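
To make the single-table question above concrete, here is a minimal
sketch of one possible shape, not a proposal for the actual design.
Each entry stores the character that a confusable folds to, so one
lookup serves the boolean test, the paired character, and the
folded comparison.  Everything named `my-confusable-*' is hypothetical
and does not exist in Emacs; the two entries are hand-picked samples,
not parsed from confusables.txt; and the sketch assumes every
prototype is a single character, which I believe the real file does
not guarantee.

```elisp
;; Hypothetical names throughout: nothing called `my-confusable-*'
;; exists in Emacs.  The data below is a hand-picked sample.

(defvar my-confusables-table (make-char-table 'my-confusables)
  "Char-table mapping a confusable character to its prototype, or nil.")

;; Sample entries only (assumption: single-character prototypes).
(dolist (pair '((#x0430 . ?a)     ; CYRILLIC SMALL LETTER A
                (#x03BF . ?o)))   ; GREEK SMALL LETTER OMICRON
  (aset my-confusables-table (car pair) (cdr pair)))

(defun my-confusable-p (char)
  "Return t if CHAR has an entry in the table, nil otherwise."
  (and (aref my-confusables-table char) t))

(defun my-confusable-prototype (char)
  "Return the character CHAR folds to, or CHAR itself if it has no entry."
  (or (aref my-confusables-table char) char))

(defun my-confusable-char-equal (c1 c2)
  "Return non-nil if C1 and C2 are the same character after folding."
  (= (my-confusable-prototype c1) (my-confusable-prototype c2)))
```

With these definitions, `(my-confusable-char-equal #x0430 ?a)' is
non-nil while `(my-confusable-p ?a)' is nil, because the prototype
itself has no entry.  Whether a marking application wants the
prototype flagged as well is exactly the kind of question the table
design would have to settle first.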