From: Ted Zlatanov
Newsgroups: gmane.emacs.devel
Subject: Re: Bidirectional text and URLs
Date: Sat, 29 Nov 2014 12:14:38 -0500
Message-ID: <87h9xi2akh.fsf@lifelogs.com>
To: emacs-devel@gnu.org

On Sat, 29 Nov 2014 10:22:45 +0200 Eli Zaretskii wrote:

EZ> Once we decide which cases we want to avoid or flag, we could be smart
EZ> there, by comparing the original and reordered strings, perhaps aided
EZ> by some dictionary lookup. The infrastructure is either already there
EZ> or easy to add. It's "just" a matter of deciding what to do and when.
EZ> Someone(TM) should present a list of well-thought requirements, and we
EZ> can take it from there.

Well, here are the pieces I think will be useful for SHR and EWW.
I don't claim they are well-thought :)

Items 1-3 could be done through font-lock and would just set some
special text properties in the buffer, in text modes that request it
(so this will be an optional piece that is always available). Then
themes and packages can add special highlighting or handling for those
properties.

1) Bring uni-confusables into the core. In regular expressions, support
either a new syntax char class \s~ to mean "confusable" or a new
character class [:confusable:] (or some other way to easily search for
such characters, especially if they are used outside of their native
script).

Possible text property: 'uni-confusable

2) In regular expressions, support a new character class
[:unicodemeta:] for any characters that have meta meaning in Unicode
and no printable representation, from bidi markers to composition. I'm
not sure if that's already possible. That will allow packages to detect
these characters in places where they are not expected, e.g. inside URL
buttons.

Possible text property: 'uni-meta

3) Make it easy in the core to scan the buffer for places where scripts
are mixed in a single sentence, string, word, symbol, or other
syntactic unit. markchars.el does that, but only inside words.

Possible text property: 'uni-mixedscripts

4) Modify `browse-url' to intercept suspicious URLs where any of the
above happened in the source buffer. I think the calling package will
have to help set the context. I don't know if it can be automated...
maybe the function could look for those special text properties around
point in the buffer where it was invoked?

5) Modify SHR/EWW to highlight these text properties and interrupt the
user when the text or content of the URL button has them.

Does that seem useful?

Ted
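
To make the checks in items 2-4 concrete, here is a rough sketch of
what "look at a URL and report why it is suspicious" could mean. It is
written in Python only because its standard library exposes the Unicode
character database; it is not Emacs code and not a proposed
implementation. The function names are made up for illustration, the
strings 'uni-meta' and 'uni-mixedscripts' are reused from the list
above purely as labels, and the script guess (first word of the Unicode
character name) is an assumption that stands in for real script data.
Confusables (item 1) are left out because they need the Unicode
confusables table, which the standard library does not ship.

```python
import unicodedata


def format_chars(text):
    """Return (index, name) pairs for invisible format characters.

    Assumption: "meta" characters are approximated by general category
    Cf, which covers bidi marks and overrides and the zero-width
    joiners, but not combining marks.
    """
    return [(i, unicodedata.name(ch, "U+%04X" % ord(ch)))
            for i, ch in enumerate(text)
            if unicodedata.category(ch) == "Cf"]


def scripts_used(text):
    """Guess the set of scripts used by the letters in TEXT.

    Heuristic only: the first word of the Unicode character name
    ("LATIN", "CYRILLIC", "GREEK", ...) is taken as the script. Names
    with a prefix such as "FULLWIDTH LATIN ..." are misclassified.
    """
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                found.add(name.split()[0])
    return found


def suspicious(url):
    """Return the list of labels that apply to URL (empty if none)."""
    reasons = []
    if format_chars(url):
        reasons.append("uni-meta")
    if len(scripts_used(url)) > 1:
        reasons.append("uni-mixedscripts")
    return reasons


if __name__ == "__main__":
    for url in ("http://example.com/",
                "http://exa\u202emple.com/",    # right-to-left override
                "http://p\u0430ypal.com/"):     # Cyrillic a among Latin
        print(ascii(url), suspicious(url))
```

A caller in the role of `browse-url' would ask the user for
confirmation whenever the returned list is not empty. Whether the check
runs on the URL string itself or on text properties already set in the
source buffer is the open question from item 4, and this sketch does
not settle it.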