From: Ted Zlatanov
Newsgroups: gmane.emacs.devel
Subject: Re: highlighting non-ASCII characters
Date: Fri, 26 Mar 2010 12:35:36 -0500
Message-ID: <87pr2rj89j.fsf@lifelogs.com>
To: emacs-devel@gnu.org

On Wed, 24 Mar 2010 20:34:41 +0100 Lennart Borgman wrote:

LB> The attached file sets up IDN chars as above. How about defining a
LB> character class [:idnchars:]?

The IDN character class could be useful. The list changes so rarely that
it can be hard-coded like the POSIX classes IMO. I think this would be
done in src/regex.c by defining RECC_IDNCHARS, for instance. This could
highlight when non-IDN characters are used in a domain name.

But IDN characters are separate from the "confusables" (homoglyphs) we
should discuss, which are much more problematic and more complex because
they are not just a character class.

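To make that difference concrete, here is a rough sketch, in Python only
for brevity. Everything in it is an assumption for illustration: the
names SAMPLE_GROUPS, build_lookalikes and find_confusables are made up,
the two sample groups are hand-picked rather than taken from any
standard data set, and limiting the scan to non-ASCII characters is my
own choice. The point is that a character class only answers "is this
character in the set?", while homoglyphs need a mapping from a character
to the characters it resembles.

```python
# Illustrative sample only: hand-picked groups, not a standard data set.
# Each group lists characters that can render alike in common fonts.
SAMPLE_GROUPS = [
    ["a", "\u0430"],            # LATIN SMALL A, CYRILLIC SMALL A
    ["o", "\u03bf", "\u043e"],  # LATIN SMALL O, GREEK OMICRON, CYRILLIC SMALL O
]


def build_lookalikes(groups):
    """Map each character to the list of the other members of its group."""
    table = {}
    for group in groups:
        for ch in group:
            table[ch] = [other for other in group if other != ch]
    return table


LOOKALIKES = build_lookalikes(SAMPLE_GROUPS)


def find_confusables(text, table=LOOKALIKES):
    """Return (index, char, lookalikes) for each non-ASCII char in the table."""
    return [
        (i, ch, table[ch])
        for i, ch in enumerate(text)
        if ord(ch) > 127 and ch in table
    ]
```

With this table, find_confusables("p\u0430ypal") reports the Cyrillic
letter at index 1 together with the Latin "a" it resembles, which is
more than a plain membership test could tell the user.
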
On Thu, 25 Mar 2010 09:11:35 +0200 Juri Linkov wrote:

JL> I think it would be more useful to implement this spec:
JL> http://www.unicode.org/reports/tr39/data/confusables.txt
JL> "Visually Confusable Characters: Provides a mapping for visual
JL> confusables for use in further restricting identifiers for security".

JL> It's very large, but it seems it's still incomplete. I can't find
JL> a "confusable" mapping for the problem I reported:

JL> BOX DRAWINGS DOUBLE HORIZONTAL -> EQUALS SIGN

We can have a [:confusable:] character class defined in src/regex.c.
That lets us find these characters. It could be generated from the TXT
database and augmented with our own mappings. But there's grouping
information, so maybe that should be available too. For highlighting we
don't need grouping information, but the user would find it useful to
look at a glyph and find out that it looks like 3 other glyphs. So this
can be in a Lisp-level data structure like a hashtable with list values.

I looked at whitespace.el and it looks generally suitable for this kind
of highlighting. I can't decide if the work should augment whitespace.el
or if it should be a new library called visible.el (because the name
whitespace.el is so specific).

On Thu, 25 Mar 2010 15:07:04 +0100 Lennart Borgman wrote:

LB> To me it looks like IDN is the most important. Is not this a
LB> derivative work from "confusables"?

I think they are separate logically. TR39 cares about "confusables" in
the context of IDN but Emacs has a wider view as a general text editor,
IIUC.

Ted