From: Gregory Heytings
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful, a simple solution
Date: Thu, 04 Nov 2021 17:04:36 +0000
Message-ID: <7699dbfaff05e98d6338@heytings.org>
In-Reply-To: <837ddng91f.fsf@gnu.org>
To: Eli Zaretskii
Cc: cpitclaudel@gmail.com, stefan@marxist.se, emacs-devel@gnu.org, db48x@db48x.net, monnier@iro.umontreal.ca, yuri.v.khan@gmail.com

>>> But they don't. Not more than just using RTL characters within LTR
>>> text would. Just revisit the example posted by Stefan (which I
>>> slightly modified to be more realistic):
>>>
>>> myfun("שָׁלוֹם" ,"السّلامعليكم");
>>>
>>> Which string does this function call pass as the first argument, and
>>> which as the second one?
>>
>> There is no danger in that example, and in particular nothing
>> invisible.
>
> Ha-ha, very funny.
>

It wasn't supposed to be funny.

>> The programmer must just be aware that compilers read source code files
>> in byte order, which might be different from the order in which the
>> string is displayed on screen, but is identical to the order in which
>> one forward-char's through the string.
>
> If we are going to assume users forward-char through every piece of code
> they look at, then the examples we were discussing present no problem,
> either.
>

I'm not assuming any of this. There are programmers who read Hebrew and
Arabic, and those who don't. Those who do know them know that they are
entered and read RTL, and don't even need to check the argument order.
Those who don't may not know this, and can easily check if they have some
doubt about what string is passed in which argument.

>> There is a danger when, because the source code contains invisible
>> control characters, the programmer sees something on their screen, and
>> the compiler sees something completely different.
>
> That's exactly what happens in the above example. Except that
> reordering happens automatically without any invisible characters, i.e.
> also "invisibly".
>

There are no invisible characters doing weird things with the text, no.
And it's those invisible characters that the "Trojan Source" paper is
about. Not potential interpretation problems by those who would discover
RTL languages.
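
For readers who want to check the distinction drawn above, here is a minimal sketch in Python. It is an illustration only, not code from this thread or from Emacs: the helper names `find_bidi_controls` and `has_rtl_letters` are hypothetical, and the set of characters treated as "invisible controls" is an assumption based on the bidi formatting characters defined by Unicode. The `myfun` call is rebuilt from escapes so that its logical (byte) order is unambiguous regardless of how a mail reader displays it.

```python
import unicodedata

# Bidi formatting characters: embeddings, overrides, isolates, and the
# implicit marks LRM, RLM, ALM.  (Assumed list, chosen for illustration.)
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
    "\u200e", "\u200f", "\u061c",
}


def find_bidi_controls(text):
    """Return (index, Unicode name) for every bidi control character in text."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


def has_rtl_letters(text):
    """True if text contains visible characters with a right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


# The call from the message, in logical order: Hebrew string first, Arabic second.
hebrew = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
arabic = "\u0627\u0644\u0633\u0651\u0644\u0627\u0645\u0639\u0644\u064a\u0643\u0645"
call = 'myfun("' + hebrew + '" ,"' + arabic + '");'

# A string in the style of the "Trojan Source" examples: it hides an override.
suspicious = 'level = "user\u202e admin"'

if __name__ == "__main__":
    print(has_rtl_letters(call))            # RTL letters are present ...
    print(find_bidi_controls(call))         # ... but no invisible controls
    print(call.index(hebrew) < call.index(arabic))  # Hebrew is the first argument
    print(find_bidi_controls(suspicious))   # the override is reported here
```

The first case is reordered on screen purely because of the visible RTL letters, and walking the string index by index (the equivalent of forward-char) gives the order the compiler sees. Only the second case contains a character that a reader cannot see at all.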