From: Ted Zlatanov
Newsgroups: gmane.emacs.devel
Subject: Re: idn.el and confusables.txt
Date: Sat, 14 May 2011 12:06:04 -0500
Organization: Теодор Златанов @ Cienfuegos
Message-ID: <8739khi54z.fsf@lifelogs.com>
To: emacs-devel@gnu.org

On Sat, 14 May 2011 19:42:39 +0300 Eli Zaretskii wrote:

EZ> Isn't it better to design the table for efficient use to begin with?

Yes, and I ask you and the other experts on char-tables to help with
that design. I am far from an expert on that topic.

>> But I don't know if markchars.el needs to be terribly fast.

EZ> I hope we are not introducing another character property for a
EZ> single use. Some use, some day might need to do it fast.

This is premature optimization.
I only have a single use in hand. Let's make sure markchars.el is fast,
and we can optimize for other uses when they are needed.

>> Two char-tables would be enough: one small table for the confusable ->
>> target mapping, and one even smaller for the reverse target ->
>> (confusable list) mapping. The reverse lookup table could be stored in
>> an extra slot of the primary lookup table.

EZ> Doesn't confusables.txt include both mappings already? If so, you
EZ> don't need the reverse table.

I thought the lookups would be faster with a reverse mapping in one of
the scenarios you listed (looking up all the characters that might be
confused with a given one). But I realized a reverse table isn't
needed.

Let's say C1, C2, and C3 are confusables mapped to C1. Then the mapping
is C1 -> (C2, C3); C2 -> C1; and C3 -> C1. The algorithm is "if a
character maps to an atom, it's confusable with that atom; if it maps
to a list, the whole list is confusable with this character." So to
find all the confusables mapped to a character you need at most two
lookups.

In addition to the character mapping we also need a confusable data
type, which can be SL/SA (single-script) or ML/MA (mixed-script). I
don't know where to store that. Maybe we can just have two char-tables,
one for each data type. There aren't going to be more data types AFAIK.
But markchars.el can definitely use the knowledge of whether the
confusable is within a single script or not.

Does all of that make sense?

Ted
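
A minimal sketch of the atom-or-list lookup described above, written in
Python only to make the idea concrete. Plain dicts stand in for Emacs
char-tables, and the names build_table, confusables_of, and the
"single-script"/"mixed-script" keys are hypothetical, not part of
markchars.el or any existing API. The placeholder strings "C1", "C2",
"C3" follow the example in the message rather than real characters.

```python
from typing import Dict, List, Union

# Assumption: a dict stands in for a char-table. An entry is either an
# "atom" (the target character) or a list (the confusables of a target).
Entry = Union[str, List[str]]


def build_table(groups: Dict[str, List[str]]) -> Dict[str, Entry]:
    """Store target -> [confusables] and each confusable -> target."""
    table: Dict[str, Entry] = {}
    for target, confusables in groups.items():
        table[target] = list(confusables)
        for ch in confusables:
            table[ch] = target
    return table


def confusables_of(table: Dict[str, Entry], ch: str) -> List[str]:
    """Return everything confusable with CH using at most two lookups."""
    entry = table.get(ch)                 # first lookup
    if entry is None:
        return []                         # not a confusable at all
    if isinstance(entry, list):
        return list(entry)                # CH is a target: one lookup suffices
    siblings = table.get(entry, [])       # second lookup, on the target
    return [entry] + [c for c in siblings if c != ch]


# One table per data type, as suggested in the message (hypothetical keys).
tables = {
    "single-script": build_table({"C1": ["C2", "C3"]}),
    "mixed-script": build_table({}),
}

single = tables["single-script"]
print(confusables_of(single, "C1"))   # ['C2', 'C3']
print(confusables_of(single, "C2"))   # ['C1', 'C3']
```

Keeping one table per data type means the caller learns whether a
confusable is single-script or mixed-script from which table answered,
at the cost of a second table to consult.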