From: Gregory Heytings
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Thu, 04 Nov 2021 10:41:41 +0000
Message-ID: <7699dbfaff0348867b72@heytings.org>
References: <875ytag0hb.fsf@yahoo.com> <87zgqmd5np.fsf@mat.ucm.es> <83wnlqk3rn.fsf@gnu.org> <72dd5c2a-42c7-b12e-05ed-e93adbd89727@gmail.com> <83ilxajyhw.fsf@gnu.org> <83fssejxf8.fsf@gnu.org> <835ytajsv2.fsf@gnu.org> <831r3yjqo9.fsf@gnu.org> <83v91aibe7.fsf@gnu.org> <87o872s0wf.fsf_-_@db48x.net> <83lf25gm1j.fsf@gnu.org> <83ee7xgio2.fsf@gnu.org> <87fssdrp54.fsf@db48x.net> <831r3xgfz3.fsf@gnu.org> <87v918qx37.fsf@db48x.net> <83o870fjqg.fsf@gnu.org> <7699dbfaffc44df293f3@heytings.org> <83ee7wfe4p.fsf@gnu.org>
To: Eli Zaretskii
Cc: cpitclaudel@gmail.com, stefan@marxist.se, yuri.v.khan@gmail.com, db48x@db48x.net, monnier@iro.umontreal.ca, emacs-devel@gnu.org
In-Reply-To: <83ee7wfe4p.fsf@gnu.org>

>> If you could find an actual source code file in an actual project in
>> which these characters are used with their intended purpose, it would
>> be a pertinent example.
>
> Why do you need me to find an actual source code which uses those
> controls?
> Isn't it clear that any human-readable text in comments and
> strings in a program's source code can and will use these controls?  How
> does the tutorial text that explains technical stuff related to a
> computer program differ from what a programmer could wish to write in a
> comment or a string in his/her program?
>

From a theoretical point of view, that's correct.  From a practical point of view, if these control characters appear in only 0.01% of the files hosted on, say, GitLab, and given that these controls can have a dangerous effect, it is reasonable for an editor to make them stand out.  Emacs already does this for no-break spaces, for example, which it displays with a thin brown line, although AFAIK they are not dangerous in any way.

>> Otherwise it is safe and reasonable to assume (as the Rust developers
>> did) that the mere presence of these characters in source code files is
>> a potential problem and must be flagged as such.
>
> It's easy, that's sure.  Reasonable it isn't.  Neither is it safe,
> because any user who does want these characters used legitimately will
> quickly turn off that warning for good.
>
> So it works for the Rust developers to tick a checkbox, but it isn't a
> solution for the problem.
>

AFAIU the solutions you propose are:

1. Customize glyphless-char-display-control to display all control characters in a different way.  This is a much cruder solution: it would also affect, for example, ZWNJ, which might be undesirable, and it is not buffer-local.  Users who want to use these characters legitimately are unlikely to adopt it.

2. Improve bidi-find-overridden-directionality to detect such non-legitimate cases.  This has to be done.

In comparison, the minor mode already exists, it's a small patch, and it's orthogonal to the two solutions you propose.

Anyway, I think it is time to abandon all hope.
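The thread contrasts two policies: flagging the mere presence of bidi control characters (the approach attributed to the Rust developers above) versus detecting only non-legitimate uses. The following is a minimal Python sketch of both, for illustration only. It is not Emacs code and not how Emacs, rustc, or the proposed minor mode actually work; the helper names flag_all and flag_unbalanced are hypothetical. It assumes the characters of interest are the nine explicit directional formatting characters of Unicode UAX #9, and the "non-legitimate" test is an assumed, simplified heuristic: a line that leaves an embedding, override, or isolate open.

```python
# Hypothetical helpers for illustration only; not Emacs, rustc, or the
# proposed minor mode.

# The nine explicit directional formatting characters of UAX #9.
BIDI_CONTROLS = {
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}
EMBED_OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = {"\u2066", "\u2067", "\u2068"}          # LRI, RLI, FSI


def flag_all(text):
    """Strict policy: report every bidi control as (line, column, name)."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, BIDI_CONTROLS[ch]))
    return hits


def flag_unbalanced(text):
    """Heuristic policy: report only the lines that leave an embedding,
    override, or isolate open at end of line.

    Simplification (assumption): each line is treated as its own
    paragraph, and a PDI does not implicitly close embeddings opened
    inside its isolate, as the full UAX #9 algorithm would.
    """
    suspicious = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        open_embeds = open_isolates = 0
        for ch in line:
            if ch in EMBED_OPENERS:
                open_embeds += 1
            elif ch == "\u202c" and open_embeds:  # PDF closes an embedding
                open_embeds -= 1
            elif ch in ISOLATE_OPENERS:
                open_isolates += 1
            elif ch == "\u2069" and open_isolates:  # PDI closes an isolate
                open_isolates -= 1
        if open_embeds or open_isolates:
            suspicious.append(lineno)
    return suspicious


if __name__ == "__main__":
    attack = 'x = "\u202e legit"\n'    # RLO left open to the end of the line
    benign = 'y = "\u2067abc\u2069"\n'  # RLI ... PDI, balanced
    print(flag_all(attack), flag_unbalanced(attack))  # [(1, 5, 'RLO')] [1]
    print(flag_all(benign), flag_unbalanced(benign))  # [(1, 5, 'RLI'), (1, 9, 'PDI')] []
```

The strict policy warns on the balanced string too, which is the false positive Eli objects to; the heuristic keeps quiet about it. A real detector would need to follow UAX #9 more closely, in particular how reordering interacts with string and comment boundaries.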