From: Juri Linkov
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Thu, 04 Nov 2021 10:44:03 +0200
Organization: LINKOV.NET
Message-ID: <865yt8qpik.fsf@mail.linkov.net>
In-Reply-To: <11d5fecb44ffbf6b7dd1@heytings.org> (Gregory Heytings's message of "Wed, 03 Nov 2021 19:02:19 +0000")
To: Gregory Heytings
Cc: cpitclaudel@gmail.com, stefan@marxist.se, emacs-devel@gnu.org, db48x@db48x.net, monnier@iro.umontreal.ca, Eli Zaretskii, Yuri Khan

>>> Anyway, if one wants to be able to highlight certain characters on
>>> display, one could also use highlight-regexp, I think.
>>
>> Or markchars.el with markchars-what customized to markchars-confusables.
>
> Neither would work AFAICS, because these characters are
> glyphless.  Highlighting a glyphless character will not make it more
> visible.

Eli pointed out that instead of highlighting glyphless characters,
only suspicious text between glyphless characters should be highlighted:

  For example, when a character with a strong left-to-right
  directionality has its directionality overridden to behave like
  right-to-left character, that is highly suspicious, because it makes
  no sense to do that in 99.99% of valid use cases.

markchars.el has a rule that highlights adjacent characters from
different scripts, so a new rule could be added that highlights the
text between directionality-switching characters when that text
contains no right-to-left characters.
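
To make the proposed check concrete, here is a minimal sketch in Python
rather than Emacs Lisp.  It is not markchars.el code, and the names
RTL_OPENERS, BIDI_CONTROLS, has_rtl_char and suspicious_spans are
hypothetical.  It assumes that only the controls forcing right-to-left
direction (RLE, RLO, RLI) open a span worth checking, that a span ends
at the next explicit bidi control or at a newline, and that nesting can
be ignored:

```python
import unicodedata

# Explicit bidi controls that force or embed right-to-left direction.
# Restricting the check to these openers is an assumption of this sketch.
RTL_OPENERS = {"\u202b", "\u202e", "\u2067"}  # RLE, RLO, RLI

# Any explicit bidi control ends the stretch under inspection
# (LRE, LRO, LRI, FSI, PDF, PDI plus the openers above).
BIDI_CONTROLS = RTL_OPENERS | {
    "\u202a", "\u202d", "\u2066", "\u2068", "\u202c", "\u2069",
}


def has_rtl_char(s):
    """Return True if s contains a strong right-to-left character."""
    return any(unicodedata.bidirectional(c) in ("R", "AL") for c in s)


def suspicious_spans(text):
    """Return (start, end) index pairs of stretches that follow an RTL
    opener, run to the next bidi control or newline, and contain no
    right-to-left characters."""
    spans = []
    i = 0
    while i < len(text):
        if text[i] in RTL_OPENERS:
            start = end = i + 1
            while (end < len(text)
                   and text[end] not in BIDI_CONTROLS
                   and text[end] != "\n"):
                end += 1
            # Flag non-empty stretches made only of non-RTL characters.
            if end > start and not has_rtl_char(text[start:end]):
                spans.append((start, end))
            i = end  # a control found at `end` is examined next
        else:
            i += 1
    return spans
```

A highlighting rule would then put a face on each returned span, so the
visible text is marked instead of the glyphless control characters
themselves.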