From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Thu, 04 Nov 2021 18:50:04 +0200
Message-ID: <837ddng91f.fsf@gnu.org>
To: Gregory Heytings
Cc: cpitclaudel@gmail.com, stefan@marxist.se, yuri.v.khan@gmail.com, db48x@db48x.net, monnier@iro.umontreal.ca, emacs-devel@gnu.org

> Date: Thu, 04 Nov 2021 14:10:01 +0000
> From: Gregory Heytings
> cc: cpitclaudel@gmail.com, stefan@marxist.se, emacs-devel@gnu.org,
>     db48x@db48x.net, monnier@iro.umontreal.ca, yuri.v.khan@gmail.com
>
> >> The answer is above: "given that these controls can have a dangerous
> >> effect".
> >
> > But they don't.  Not more than just using RTL characters within LTR text
> > would.  Just revisit the example posted by Stefan (which I slightly
> > modified to be more realistic):
> >
> >   myfun("שָׁלוֹם" ,"السّلامعليكم");
> >
> > Which string does this function call pass as the first argument, and
> > which as the second one?
>
> There is no danger in that example, and in particular nothing invisible.

Ha-ha, very funny.

> The programmer must just be aware that compilers read source code files in
> byte order, which might be different from the order in which the string is
> displayed on screen, but is identical to the order in which one
> forward-char's through the string.

If we are going to assume users forward-char through every piece of code
they look at, then the examples we were discussing present no problem,
either.

> There is a danger when, because the source code contains invisible control
> characters, the programmer sees something on their screen, and the
> compiler sees something completely different.

That's exactly what happens in the above example.  Except that
reordering happens automatically without any invisible characters,
i.e. also "invisibly".
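A minimal sketch of the logical-versus-visual mismatch in Stefan's example, assuming Python's standard `ast` and `unicodedata` modules as a stand-in for a compiler's tokenizer (the original line is C-like; `myfun` is only the placeholder name from the quoted example, and the helper names below are hypothetical).  The parser consumes the characters in logical (storage) order, so the Hebrew literal is the first argument.  In a left-to-right paragraph, the UAX #9 bidi algorithm resolves the neutral `" ,"` between the two right-to-left strings as right-to-left, so a conforming display typically shows the Arabic string on the left, i.e. apparently first.

```python
import ast
import unicodedata
from typing import List

# The call from the quoted example, written here in logical (storage) order.
# The trailing ';' is dropped so that Python's parser can stand in for a
# C compiler; both read the characters in the same logical order.
SOURCE = 'myfun("שָׁלוֹם" ,"السّلامعليكم")'


def call_arguments(source: str) -> List[str]:
    """Return the string-literal arguments of one call, in logical order."""
    call = ast.parse(source, mode="eval").body  # an ast.Call node
    return [arg.value for arg in call.args if isinstance(arg, ast.Constant)]


def strong_direction(text: str) -> str:
    """Return the bidi class of the first strong character: 'L', 'R' or 'AL'."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in ("L", "R", "AL"):
            return cls
    return ""  # no strong character at all


if __name__ == "__main__":
    # The parser sees the Hebrew literal first, whatever the screen shows.
    for position, arg in enumerate(call_arguments(SOURCE), start=1):
        print(position, strong_direction(arg), arg)
```

Run as a script, it reports bidi class R (Hebrew) for argument 1 and AL (Arabic) for argument 2, which is the order the compiler uses, not the order most displays suggest.  No invisible control characters are involved anywhere in `SOURCE`.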