From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Thu, 04 Nov 2021 15:05:13 -0400
To: Daniel Brooks
Cc: Eli Zaretskii, cpitclaudel@gmail.com, emacs-devel@gnu.org, stefan@marxist.se, yuri.v.khan@gmail.com
In-Reply-To: <87v918qx37.fsf@db48x.net> (Daniel Brooks's message of "Wed, 03 Nov 2021 23:00:28 -0700")

> However, your suggestion of highlighting the text affected by the bidi
> override characters while not actually showing those characters visibly
> is not something that I would care to use. It shows that there may be a
> problem without showing what the cause is. The cause is the presence of
> certain characters, and I must be able to see those characters in order
> to fix the problem, or even to judge whether there is a problem at
> all.

I don't think that's the case.  AFAIK there are 3 steps:

1- Become aware of the presence of something suspicious, i.e. a chunk of
   text that may not mean what you think.
2- Be able to confirm whether this is what it looks like or not.
3- Find the root cause.

Making the special control chars more visible can help at step 3 (tho
not in all cases, since the problem can occur without using any of those
chars, as shown in my example code), but it's definitely not necessary
for step 1 (where highlighting the text as Eli suggests might be more
useful) nor for step 2 (where moving the cursor across the text is all
it takes to figure out what it really means).

Really, this is just another case of the "confusables": situations where
different sequences of bytes can result in the exact same display (or
maybe not 100% identical, but sufficiently similar that the untrained
eye won't notice the difference) yet be treated differently by our
tools.

The main problem I see is that the definition of "normal" and "abnormal"
depends on the programming language and potentially even on the human
reading the text.
For example, imagine that the uppercase text below is written in
a script and language that's RTL.

My previous example had

    myfun (ARG1, ARG2)

where the rendering displayed ARG2 to the left of ARG1, making it
(presumably) confusing to the reader.  But if the code says:

    days = [MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY]

Which would be more confusing?  To have the first element displayed on
the left or to have it displayed on the right?
I think the answer strongly depends on the past experience of the
reader, so there's a human factor at play.


        Stefan
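
As an illustration only (nothing like this was proposed in the thread), here is a minimal Python sketch of why scanning for control characters helps with step 3 but cannot be the whole answer. The names `BIDI_CONTROLS`, `find_bidi_controls` and `has_strong_rtl`, and the two sample strings, are hypothetical. The first function lists explicit bidi formatting controls; the second only reports whether a string contains strongly right-to-left characters, which is the situation where reordering can happen with no control character at all.

```python
import unicodedata

# Explicit bidi formatting controls: embeddings, overrides, isolates,
# and their terminators (U+202A..U+202E and U+2066..U+2069).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def find_bidi_controls(text):
    """Return (index, 'U+XXXX', name) for each explicit bidi control in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


def has_strong_rtl(text):
    """Return True if any character has a strong right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


# Hypothetical inputs: one line hiding an override, one with plain
# Hebrew letters as arguments and no control characters.
override_line = "access = 'user\u202e'"
rtl_args_line = "myfun (\u05d0, \u05d1)"

print(find_bidi_controls(override_line))  # reports the hidden override
print(find_bidi_controls(rtl_args_line))  # finds nothing to report
print(has_strong_rtl(rtl_args_line))      # yet reordering is still possible
```

The second line passes the control-character scan even though its arguments may be displayed in the opposite order, which is the case the message describes. Whether such a line counts as "abnormal" is the language-dependent and reader-dependent question that a check like this cannot settle.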