From: Ted Zlatanov
Newsgroups: gmane.emacs.devel
Subject: Re: highlighting non-ASCII characters
Date: Mon, 29 Mar 2010 16:05:29 -0500
Organization: Теодор Златанов @ Cienfuegos
Message-ID: <87aatqj0ti.fsf@lifelogs.com>
To: emacs-devel@gnu.org

On Mon, 29 Mar 2010 16:19:07 -0400 Stefan Monnier wrote:

>> Yes, probably. But that's accidental. I still think the character
>> classes [:idn:] (revised name from before) and [:confusable:] (or
>> [:homoglyph:]) would make sense as a first step, then we can decide how
>> to highlight them.

SM> The homoglyph data would be a useful starting point for the feature
SM> I imagine, indeed.
SM> But from the message that started this thread, "K"
SM> is a homoglyph, yet highlighting it everywhere doesn't sound like a good
SM> idea, so basically we need to associate with each homoglyph char
SM> a context where it is expected and only highlight it when it appears in
SM> a different context (or maybe rather when it appears in the context of
SM> its peer).

(I had a "lightbulb moment" I should have had long ago: "confusable" is
a character property, while "homoglyph" is a glyph property; thus the
character class should be [:confusable:] and "homoglyph" should be used
in the face name as long as it's not, er, confusing.)

I know the goal is to match in context and I may take whitespace.el as a
guide in this regard, but I have to start with a [:confusable:]
character class. I'll also add a [:idn:] class as discussed. Is that
OK or are you concerned about code bloat in regexp.c?

Afterwards we can set up the map between each confusable character and
the set of characters it can match; this is also in the data file. That
lets us look in context and apply the rules I proposed. So for example
if Cyrillic K is confusable with Roman K and we see Roman characters
around, that's suspicious. But Cyrillic "zhe" is not confusable with
any Roman characters so it wouldn't be as suspicious.

On Mon, 29 Mar 2010 11:48:28 -0700 "Drew Adams" wrote:

DA> But it occurred to me that besides different categories of such critters there
DA> might be different levels of fontification details that users might want to see.
DA> For example, for some users or for some purposes, it might be useful to see
DA> different kinds of quote marks distinguished (e.g. different kinds of curly
DA> quotes that might be homoglyphs or curly vs straight quotes, which are not
DA> homoglyphs). For other users or for other purposes such highlighting would be a
DA> distraction.

I'll set up a flexible mechanism, probably patterned after
whitespace.el, to do this kind of highlighting.
So the users will be able to extend it if needed.

I don't know about curly vs. straight quotes. I don't think that's a
significant problem, whereas a Cyrillic K in Roman text can actually
cause problems and security compromises. I'm not against the idea, I
have just never seen it become an issue, and there's a million ways to
combine quotation marks depending on the context. What's the specific
case that you're thinking of?

Ted
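
A minimal sketch of the context rule described above (a confusable
character is suspicious when letters from its peer's script appear
nearby), written in Python purely for illustration rather than as Emacs
code. The CONFUSABLES table is a hypothetical hand-written excerpt, not
the data file mentioned above, and script_of, suspicious_positions, and
the window size are assumed names and parameters:

```python
import unicodedata

# Hypothetical excerpt: confusable character -> characters it can be
# mistaken for.  A real table would be generated from the data file.
CONFUSABLES = {
    "\u041a": {"K"},  # CYRILLIC CAPITAL LETTER KA resembles Latin K
    "\u0430": {"a"},  # CYRILLIC SMALL LETTER A resembles Latin a
}


def script_of(ch):
    """Crude script guess: the first word of the Unicode character name."""
    return unicodedata.name(ch, "").split(" ", 1)[0]


def suspicious_positions(text, window=3):
    """Return indexes of confusable characters that have, within `window`
    characters on either side, a letter in the script of one of their peers."""
    hits = []
    for i, ch in enumerate(text):
        peers = CONFUSABLES.get(ch)
        if not peers:
            continue  # not confusable with anything, e.g. Cyrillic "zhe"
        peer_scripts = {script_of(p) for p in peers}
        neighbours = text[max(0, i - window):i] + text[i + 1:i + 1 + window]
        if any(n.isalpha() and script_of(n) in peer_scripts for n in neighbours):
            hits.append(i)
    return hits
```

With this table, a Cyrillic K surrounded by Roman letters is reported,
while the same character inside Cyrillic text is not, and Cyrillic "zhe"
is never reported because it has no entry in the map.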