From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ted Zlatanov
Newsgroups: gmane.emacs.devel
Subject: Re: idn.el and confusables.txt
Date: Sat, 14 May 2011 10:30:37 -0500
Organization: Теодор Златанов @ Cienfuegos
Message-ID: <87aaepi9k2.fsf@lifelogs.com>
References: <874o5uie42.fsf@lifelogs.com> <87y635dll9.fsf@lifelogs.com>
	<87r58vbj7o.fsf@lifelogs.com> <87fwpba03q.fsf@lifelogs.com>
	<874o5rqr5z.fsf@lifelogs.com> <87mxjjpal4.fsf@lifelogs.com>
	<87vcy6nzan.fsf@lifelogs.com> <87tydl4sjj.fsf_-_@lifelogs.com>
	<87r58pghh7.fsf_-_@lifelogs.com> <83iptdg0yr.fsf@gnu.org>
	<87y629ien3.fsf@lifelogs.com> <83aaepfiuk.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
To: emacs-devel@gnu.org
User-Agent: Gnus/5.110018 (No Gnus v0.18) Emacs/24.0.50 (gnu/linux)
Xref: news.gmane.org gmane.emacs.devel:139399

On Sat, 14 May 2011 17:38:11 +0300 Eli Zaretskii wrote:

>> From: Ted Zlatanov
>> Date: Sat, 14 May 2011 08:40:48 -0500
>>
EZ> You see, the uni-*.el files we create out of the Unicode DB are not
EZ> used anywhere in application code, AFAIK. We use them to display
EZ> character properties in the likes of "C-u C-x =", and that's it. I'm
EZ> not even sure they are organized in a way that makes them useful.
>>
>> markchars.el could use other Unicode properties if people ask.

EZ> I'm talking about the details. The way we currently set the tables in
EZ> uni-*.el is that many of the values are symbols. For example:

EZ>   (get-char-code-property ?1 'general-category) => Nd
EZ>   (get-char-code-property ?א 'bidi-class) => R
EZ>   (get-char-code-property ?\( 'mirrored) => Y

EZ> The `Nd', `R', and `Y' are symbols.

EZ> Now, suppose you wanted to use these values in some code that needs to
EZ> be fast -- how would you feel about having to write multi-branch
EZ> `cond' forms to compare the value against all the possibilities?

It wouldn't be ideal, surely, but most glyphs are not confusable so the
lookup would fail. I might write some of it in C if performance was an
issue, or try to inline the conditions with macros, or cache the
lookups.

But I don't know if markchars.el needs to be terribly fast. It runs at
the font-lock level and IIUC that's opportunistic and not time-critical
like the display code. For instance, unmodified text is not rechecked,
right?

EZ> For now, with markchars.el, all you need is a boolean value for each
EZ> character. However, in other use cases, some other Lisp code will
EZ> want the paired character. Yet another application will want to
EZ> compare characters such that confusable pairs will compare equal. Can
EZ> a single table satisfy all these needs efficiently? Maybe it can, but
EZ> we need to design that table carefully.

Two char-tables would be enough: one small table for the
confusable -> target mapping, and one even smaller for the reverse
target -> (confusable list) mapping. The reverse lookup table could be
stored in an extra slot of the primary lookup table.

markchars.el could use this mapping to show more information than just
underlining the characters. A tooltip could show why the glyph is
confusable, for instance.

>> Also the char-table doesn't have to
>> cover the Asian confusables--I'm not sure anyone would need those.
EZ> Well, the Unicode consortium definitely thought they were needed.
EZ> Either we follow established standards, or we don't.

You're right. Also there are Asian characters that could be confused
for Latin characters so it's not safe to exclude them.

Ted
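
For illustration only, here is a minimal sketch of the two-char-table
layout described above: a forward table (confusable -> target) with the
reverse table (target -> list of confusables) kept in its extra slot.
Every `my-confusables-*' name is hypothetical and appears nowhere in
Emacs or markchars.el. The two sample entries are entered by hand as
assumptions, not parsed from confusables.txt. The three accessors
correspond to the three use cases raised in the thread: a boolean test,
the paired character, and confusable-aware comparison.

```elisp
;; Hypothetical sketch; none of these names exist in Emacs.

;; A char-table subtype must declare how many extra slots it uses
;; before any table of that subtype is created.
(put 'my-confusables 'char-table-extra-slots 1)

(defvar my-confusables-table
  (let ((forward (make-char-table 'my-confusables))       ; confusable -> target
        (rev (make-char-table 'my-confusables-reverse)))  ; target -> (confusables)
    ;; Keep the reverse table in extra slot 0 of the forward table.
    (set-char-table-extra-slot forward 0 rev)
    forward)
  "Char-table mapping a confusable character to its target character.
Extra slot 0 holds the reverse table: target -> list of confusables.")

(defun my-confusables-add (confusable target)
  "Record that CONFUSABLE can be mistaken for TARGET."
  (let ((rev (char-table-extra-slot my-confusables-table 0)))
    (aset my-confusables-table confusable target)
    (aset rev target (cons confusable (aref rev target)))))

(defun my-confusables-target (char)
  "Return the character CHAR is confusable with, or nil if none."
  (aref my-confusables-table char))

(defun my-confusables-sources (target)
  "Return the list of characters recorded as confusable with TARGET."
  (aref (char-table-extra-slot my-confusables-table 0) target))

(defun my-confusables-equal-p (c1 c2)
  "Return non-nil if C1 and C2 are equal after mapping to their targets."
  (= (or (my-confusables-target c1) c1)
     (or (my-confusables-target c2) c2)))

;; Hand-entered sample data (assumption): Cyrillic small a (U+0430)
;; and Greek small omicron (U+03BF) resemble Latin a and o.
(my-confusables-add #x0430 ?a)
(my-confusables-add #x03BF ?o)
```

With this layout, the boolean check that markchars.el needs is just a
non-nil test on `(my-confusables-target CHAR)', and a tooltip could be
built from the same value. Unset entries in a char-table default to
nil, so the common case of a non-confusable character is a single
`aref'.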