From mboxrd@z Thu Jan 1 00:00:00 1970
Newsgroups: gmane.emacs.devel
Subject: Re: Unicode confusables and reordering characters considered harmful
Date: Wed, 3 Nov 2021 13:41:19 +0100
Message-ID: <20211103124119.GB22552@tuxteam.de>
References: <83fssejxf8.fsf@gnu.org> <835ytajsv2.fsf@gnu.org>
 <11d5fecb44af1d388b7f@heytings.org> <11d5fecb449846dc0851@heytings.org>
 <11d5fecb443892de13b1@heytings.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="Yylu36WmvOXNoKYn"
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
To: emacs-devel@gnu.org
List-Id: "Emacs development discussions."

--Yylu36WmvOXNoKYn
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On Wed, Nov 03, 2021 at 08:20:01AM -0400, Stefan Monnier wrote:
> > AFAIK, these specific characters are not necessary to write comments and
> > strings in these languages.
> > Here are two random file which use RTL strings
> > and comments, and in which these characters are not used:
>
> I was more worried about the fact that, while highlighting those chars
> might be helpful to warn about accidental uses of them, if attackers
> want to trick the reader, I'm pretty sure they can get similar results
> without having to use those special LTR/RTL override chars:
>
>     int hi = 5;
>     int שָׁלוֹם = hi;
>     int hello = 10;
>     int السّلامعليك = hello;
>     myfun(שָׁלוֹם ,السّلامعليكم)
>
> There's no override here, but did I call `myfun` with args 5 and 10 or
> did I call it with args 10 and 5?
>
> [ OK, admittedly, for a bidi-idiot like me, it looks like neither since
>   the Arabic shaping of the two occurrences of the identifier actually look
>   different (and I truly have no clue why that is here), so I'm lead to
>   believe that the second is a reference to a non-existing
>   variable ;-) ]

Most probably, yes. The second instance had one letter more, the "mim" (م)
at the end (which, for some funny reason, seems to have evaporated
when my mailer quoted your message: in the above quote, they now /look/
equal, although when I copy/paste them, the mim re-appears. Go figure).

As a full disclosure, I have to admit that I'm using mutt with vim as
an editor (gah! :), so I chalk that up to differences between the
viewer and the editor: it seems vim just hides that one).

But you raise an interesting point: in an R to L stretch, is the order
of the arguments also R to L, or L to R?
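
To look at the extra letter instead of guessing at it, the code points
can be listed directly. What follows is only a minimal sketch in Python,
not something taken from this thread: the helper names `describe` and
`first_difference` are made up for illustration, and the identifiers
from the quoted snippet are spelled with \u escapes so that no viewer or
editor gets a chance to hide or reorder anything.

```python
import unicodedata

# The Arabic identifier as declared in the quoted snippet, in logical
# (memory) order, written with escapes so nothing is reshaped or hidden.
DECLARED = "\u0627\u0644\u0633\u0651\u0644\u0627\u0645\u0639\u0644\u064a\u0643"
# The identifier as passed to myfun(): one extra code point at the end.
USED = DECLARED + "\u0645"
# The Hebrew identifier from the same snippet.
HEBREW = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"


def describe(text):
    """Return (code point, Unicode name, bidi class) for each character."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.bidirectional(ch),
        )
        for ch in text
    ]


def first_difference(a, b):
    """Return the index where a and b first differ, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One string is a prefix of the other.
        return min(len(a), len(b))
    return None


if __name__ == "__main__":
    i = first_difference(DECLARED, USED)
    print("identical" if i is None else f"differ at index {i}")
    if i is not None:
        for row in describe(USED[i:]):
            print(*row)
    # The argument list of myfun(), character by character, in logical order.
    for row in describe(HEBREW + " ," + USED):
        print(*row)
```

In logical order, which is what a compiler reads, the Hebrew name comes
first in the argument list. How that stretch is drawn on screen is a
separate matter, decided from the bidi classes in the last column.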

Cheers
 - t

--Yylu36WmvOXNoKYn
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAmGCg28ACgkQBcgs9XrR2kZuDQCfbvT1YLTMWd7Di4vl9bWxgFOZ
P9QAn3TR2B03sNepnJQ10/+55QyYslzp
=Gzp4
-----END PGP SIGNATURE-----

--Yylu36WmvOXNoKYn--