From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: idn.el and confusables.txt
Date: Sun, 15 May 2011 20:34:55 +0300
Message-ID: <83wrhreukg.fsf@gnu.org>
References: <87ei40un8w.fsf@m17n.org>
In-reply-to: <87ei40un8w.fsf@m17n.org>
Reply-To: Eli Zaretskii
To: Kenichi Handa
Cc: tzz@lifelogs.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org

> From: Kenichi Handa
> Cc: monnier@iro.umontreal.ca, tzz@lifelogs.com, emacs-devel@gnu.org
> Date: Sun, 15 May 2011 22:06:23 +0900
>
> In article <83iptdg0yr.fsf@gnu.org>, Eli Zaretskii writes:
>
> > You see, the uni-*.el files we create out of the Unicode DB are not
> > used anywhere in application code, AFAIK.  We use them to display
> > character properties in the likes of "C-u C-x =", and that's it.
>
> composite.el uses `general-category' and `canonical-combining-class'.
> ucs-normalize.el uses `decomposition' and `canonical-combining-class'.
> mule-cmds.el uses `name' and `old-name' for read-char-by-name.

Are functions defined by ucs-normalize.el used anywhere?

> Why did you have to create another table?  Was it because
> get-char-code-property is defined by Lisp and not efficient
> to call from C?

Yes, calling a Lisp function (one that calls `load' at that!) in the
lowest level of the display engine was out of the question.  But there
were several other reasons as well:

 . get-char-code-property returns a property list in which bidi
   types are recorded as symbols, while I needed them as small
   numeric values of a C enumerated type (see bidi_type_t), to fit
   in a small number of bits in `struct glyph'.

 . The data structures manipulated by get-char-code-property include
   complications (e.g., a function in the extra slot) for which I
   could find no documentation, so I couldn't figure out whether it
   would be possible to replace get-char-code-property by a simple
   call to CHAR_TABLE_REF.

 . Even if I could use CHAR_TABLE_REF, the additional call to
   plist-get means more overhead.  bidi_get_type, the function which
   needs to look up the bidirectional type of an arbitrary character,
   runs in the innermost loop of the display engine, and is called at
   least once (sometimes more) for every character in the displayed
   portion of the buffer, so it must be very efficient.

 . For the bidi-mirrored property, the data in the `mirrored'
   property recorded by uni-mirrored.el is simply inadequate: the
   value is a boolean (albeit in the form of the symbols `Y' and
   `N').  What I needed was, for each character, its mirrored
   character, if there is one; this data was simply not available in
   uni-mirrored.el.  The corresponding function bidi_mirror_char is
   also called for a large percentage of displayed characters, and
   must be efficient.

It was extremely frustrating to have all that data at my fingertips
and not be able to use it for the purposes of bidi.c, which at first
seems like a first-class client of the Unicode DB.  What I wanted was
something similar to C ctype macros in simplicity and efficiency, but
nothing quite like that was available.  A char-table comes close, but
it must be a simple table with numerical values -- and that is what
bidi.c currently uses, leaving uni-bidi.el unused.
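
As an illustration of the requirements listed above, here is a minimal
C sketch of a ctype-like lookup: bidi types stored as small enum values
that fit in a bit-field, and a mirroring table that maps each character
to its mirrored counterpart rather than to a boolean.  This is not the
code in bidi.c.  All sk_* names are hypothetical, the tables hold only
a handful of entries, characters missing from the type table are
treated as strong left-to-right (a simplification), and a binary search
over a sorted array stands in for a real char-table lookup.

```c
#include <stddef.h>

/* Hypothetical stand-in for a bidi type enumeration; the real
   bidi_type_t has more members.  */
typedef enum {
  SK_STRONG_L,    /* left-to-right letter */
  SK_STRONG_R,    /* right-to-left letter */
  SK_WEAK_EN,     /* European digit */
  SK_NEUTRAL_ON   /* other neutral */
} sk_bidi_type;

/* Hypothetical glyph record: the type occupies only a few bits.  */
struct sk_glyph {
  unsigned int bidi_type : 3;
  unsigned int mirrored : 1;
};

struct sk_range { int from, to; sk_bidi_type type; };

/* Sorted, non-overlapping ranges; a tiny illustrative subset.  */
static const struct sk_range sk_types[] = {
  { 0x0028, 0x0029, SK_NEUTRAL_ON },  /* ( ) */
  { 0x0030, 0x0039, SK_WEAK_EN },     /* 0-9 */
  { 0x0041, 0x005A, SK_STRONG_L },    /* A-Z */
  { 0x0061, 0x007A, SK_STRONG_L },    /* a-z */
  { 0x05D0, 0x05EA, SK_STRONG_R }     /* Hebrew letters */
};

/* Return the bidi type of CH by binary search over sk_types.
   Unlisted characters default to SK_STRONG_L (an assumption made
   only to keep the sketch short).  */
static sk_bidi_type
sk_get_type (int ch)
{
  size_t lo = 0, hi = sizeof sk_types / sizeof sk_types[0];

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (ch < sk_types[mid].from)
        hi = mid;
      else if (ch > sk_types[mid].to)
        lo = mid + 1;
      else
        return sk_types[mid].type;
    }
  return SK_STRONG_L;
}

struct sk_mirror { int from, to; };

/* Sorted by FROM; each entry names the mirrored character itself.  */
static const struct sk_mirror sk_mirrors[] = {
  { 0x0028, 0x0029 }, { 0x0029, 0x0028 },  /* ( ) */
  { 0x003C, 0x003E }, { 0x003E, 0x003C },  /* < > */
  { 0x005B, 0x005D }, { 0x005D, 0x005B },  /* [ ] */
  { 0x007B, 0x007D }, { 0x007D, 0x007B }   /* { } */
};

/* Return the mirrored character of CH, or CH itself if it has none.  */
static int
sk_mirror_char (int ch)
{
  size_t lo = 0, hi = sizeof sk_mirrors / sizeof sk_mirrors[0];

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (ch < sk_mirrors[mid].from)
        hi = mid;
      else if (ch > sk_mirrors[mid].from)
        lo = mid + 1;
      else
        return sk_mirrors[mid].to;
    }
  return ch;
}

/* Record the looked-up properties of CH in a compact glyph.  */
static struct sk_glyph
sk_make_glyph (int ch)
{
  struct sk_glyph g;

  g.bidi_type = sk_get_type (ch);
  g.mirrored = sk_mirror_char (ch) != ch;
  return g;
}
```

With these definitions, sk_get_type (0x05D0) yields SK_STRONG_R and
sk_mirror_char ('(') yields ')'.  Neither call involves Lisp objects or
a property-list search, which is the property the list above asks for.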