From: Ted Zlatanov
Newsgroups: gmane.emacs.devel
To: emacs-devel@gnu.org
Subject: Re: highlighting non-ASCII characters
Date: Tue, 30 Mar 2010 08:22:06 -0500
Message-ID: <87r5n2gd1d.fsf@lifelogs.com>
References: <87aatyuj9s.fsf@lifelogs.com> <87pr2rj89j.fsf@lifelogs.com> <87ljdeke5k.fsf@lifelogs.com> <87eij3ht2c.fsf@lifelogs.com>

On Mon, 29 Mar 2010 22:51:02 +0200 Lennart Borgman wrote:

LB> However just hilighting non-IDN chars seems useful enough. I think it
LB> should be done everywhere (because it is simple and probably does not
LB> hurt, IDN seems mostly useful for variables to for examples), or
LB> optionally only in strings (the only URL context we can actually
LB> guess).

LB> For the moment I have implemented this as fontification. Having it as
LB> a char class that is flexibly initialized would be better. Perhaps my
LB> routines for reading the chars can be used there too.

Look at Categories in the ELisp manual (what Stefan referred to when he
mentioned category-table).
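
To make the category idea concrete, here is a minimal sketch, not a
finished design.  It assumes the set of suspect characters is already
available as a list of characters and (FROM . TO) ranges.  Every
`my-suspect-' name is made up for illustration; only
`standard-category-table', `get-unused-category', `define-category',
`modify-category-entry' and the `\cX' regexp construct are existing
Emacs facilities.

```elisp
;; Sketch only.  Every `my-suspect-' name is hypothetical.

(defvar my-suspect-category nil
  "Category letter used for suspect characters, or nil if not set up yet.")

(defun my-suspect-setup (chars)
  "Add each element of CHARS to a freshly defined category.
Each element is a character or a (FROM . TO) range of characters."
  (let ((table (standard-category-table)))
    (unless my-suspect-category
      ;; Pick a free letter rather than hard-coding one.  This returns
      ;; nil when every letter is taken; that case is not handled here.
      (setq my-suspect-category (get-unused-category table))
      (define-category my-suspect-category
        "Suspect character (sketch)" table))
    (dolist (c chars)
      (modify-category-entry c my-suspect-category table))))

(defun my-suspect-next (&optional bound)
  "Move point just after the next suspect character.
Return nil without moving if there is none before BOUND."
  ;; \cX matches any character that belongs to category X in the
  ;; current buffer's category table.
  (re-search-forward (format "\\c%c" my-suspect-category) bound t))
```

Something like (my-suspect-setup '(#x0441 (#x0391 . #x03A9))) would
fill the category, and the same `\cX' regexp could then serve as a
font-lock matcher.  The sketch relies on the buffer using the standard
category table; a mode that installs its own table would need the
entries added there as well.
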
If you can implement your reader that way it would be great.
It's much better than modifying regexp.c :)

LB> The homoglyph context thing is maybe more difficult. I did not try to
LB> read carefully so I do not know much. I guess there is something like
LB> char value ranges to use, or? Someone knows which document that where
LB> those ranges can be read (by some elisp code)?

The confusables text file will give you all of them for the category
table.  But you also need to group them by homoglyph (probably with a
hashtable), so I'd write a custom reader.  If you don't get to it, I
will eventually :)

The two text files (IDN and confusables) would have to live inside Emacs
somewhere and the reader will load them when it's loaded.

LB> My impression is that IDN is a work in progress so it might be good
LB> idea to read in the characters from a file if possible (and let the
LB> user reread that file later if necessary).

Re-reading the file is a really, really rare occurrence for the user so
I would make it an internal function.  You can always call it directly
while developing, but end users will never need to.

Ted
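
P.S. A rough sketch of the custom reader described above, to show the
hashtable grouping.  It assumes each data line of the confusables file
looks like "SOURCE ; TARGET ; TYPE # comment" with hexadecimal code
points, and it ignores the TYPE column, so a character listed under
several types simply ends up in whichever group was stored last.  All
`my-confusables-' names are hypothetical, and so is the default file
location under `data-directory'.

```elisp
;; Sketch only.  Every `my-confusables-' name is hypothetical.

(defun my-confusables-parse-line (line)
  "Return (SOURCE . TARGET) for LINE, or nil for comments and blanks.
SOURCE is a character and TARGET is a string."
  (let* ((data (car (split-string line "#")))   ; drop trailing comment
         (fields (split-string data ";"))
         (source (split-string (car fields))))
    (when (and (>= (length fields) 2) (= (length source) 1))
      (cons (string-to-number (car source) 16)
            (mapconcat (lambda (hex) (string (string-to-number hex 16)))
                       (split-string (nth 1 fields))
                       "")))))

(defun my-confusables-read (file)
  "Read FILE and return a hash table mapping a character to its group.
A group is the list of characters that share one TARGET string."
  (let ((by-target (make-hash-table :test 'equal))
        (by-char (make-hash-table :test 'eql)))
    (with-temp-buffer
      (insert-file-contents file)
      (dolist (line (split-string (buffer-string) "\n"))
        (let ((pair (my-confusables-parse-line line)))
          (when pair
            (puthash (cdr pair)
                     (cons (car pair) (gethash (cdr pair) by-target))
                     by-target)))))
    (maphash (lambda (target sources)
               ;; A one-character target is itself part of the group.
               (let ((group (delete-dups
                             (if (= (length target) 1)
                                 (cons (aref target 0) sources)
                               sources))))
                 (dolist (c group)
                   (puthash c group by-char))))
             by-target)
    by-char))

(defvar my-confusables-table nil
  "Character to homoglyph group, filled by `my-confusables--load'.")

(defun my-confusables--load (&optional file)
  "Internal: (re)read FILE into `my-confusables-table'.
The default location under `data-directory' is an assumption."
  (setq my-confusables-table
        (my-confusables-read
         (or file (expand-file-name "confusables.txt" data-directory)))))
```

The keys of the resulting table are also exactly the characters that
would go into the category table, so one pass over the file could feed
both.  Keeping the loader as a double-dash internal function matches
the point about re-reading: it can be called by hand while developing
without being offered to users as a command.
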